Discussion about this post

User's avatar
Yaroslav Bulatov's avatar

Derivatives in terms of Kronecker products is a bad idea. I say this after reading Magnus/Nuedecker and using their notation for a couple of years. Turning everything into a matrix adds a "flattening" step, and your formulas will be different depending on whether you use row-vec or col-vec operator to flatten. Math convention is col-vec, but GPU tensor layout is row-vec, so you end up with "wrong order" formulas propagating into slow implementations (ie, in https://github.com/tensorflow/kfac) . Alternative is to keep the indices. Derivative w.r.t. matrix variable has 2 indices. Hessian has 4 indices. If you don't want to come up with index names, can use graphical notation as in https://github.com/thomasahle/tensorgrad

Expand full comment
Mahyar Fazlyab's avatar

I can completely relate to this as I am teaching convex optimization this semester. I was proving the concavity of logdet the other day. Everyone was completely lost even math students.

After teaching convex optimization over the past four years, I have identified a trend: students background in linear algebra is getting worse every year; and they are more inclined to know the end application before caring about the foundations.

This course is getting more and more challenging to teach! Would appreciate your suggestions.

Expand full comment
17 more comments...

No posts