Discussion about this post

Yaroslav Bulatov

Derivatives in terms of Kronecker products are a bad idea. I say this after reading Magnus/Neudecker and using their notation for a couple of years. Turning everything into a matrix adds a "flattening" step, and your formulas will differ depending on whether you use the row-vec or col-vec operator to flatten. The math convention is col-vec, but GPU tensor layout is row-vec, so you end up with "wrong order" formulas propagating into slow implementations (e.g., in https://github.com/tensorflow/kfac). The alternative is to keep the indices: a derivative with respect to a matrix variable has 2 indices, a Hessian has 4. If you don't want to come up with index names, you can use graphical notation as in https://github.com/thomasahle/tensorgrad
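
A minimal NumPy sketch of the order flip described above (my illustration, not from the comment): under column-major (col-vec) flattening the identity is vec(AXB) = (Bᵀ ⊗ A) vec(X), while under row-major (row-vec) flattening the Kronecker factors swap to (A ⊗ Bᵀ):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3))
X = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

Y = A @ X @ B

# Column-major vec (math convention): vec(AXB) = (B^T kron A) vec(X)
vec_col = lambda M: M.flatten(order="F")
assert np.allclose(vec_col(Y), np.kron(B.T, A) @ vec_col(X))

# Row-major vec (C/GPU layout): the Kronecker factors swap order,
# vec(AXB) = (A kron B^T) vec(X) -- the "wrong order" formulas above
vec_row = lambda M: M.flatten(order="C")
assert np.allclose(vec_row(Y), np.kron(A, B.T) @ vec_row(X))
```

Both assertions pass, so the same matrix product gives two different Kronecker formulas depending solely on the flattening convention.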

Alex

Amazing post. I am pretty mediocre at Linear Algebra but know it is a "magical" skill (almost like knowing how to code). The more resources, the better. Thanks!
