Discussion about this post

JP:

One comment about fitting random Fourier feature (RFF) models: you can fit them with linear scaling in the number of random features if you use an iterative method, as you noted. You mentioned stochastic gradient descent, but the optimization problem for fitting these models is often ill-conditioned for obvious reasons, and I haven't found SGD to work very well, at least not without extensive tweaking of the learning rate and learning rate schedule, which is time-consuming and kind of annoying. My favorite method for fitting RFF models for regression (and classification, if it makes sense to use LDA) is conjugate gradients with randomized Nyström preconditioning. It works like magic and converges very quickly, and you can build the randomized Nyström preconditioner with a subsampled randomized Hadamard transform (SRHT) based procedure that's quite fast. This makes RFF models very practical and fairly easy to use even when you need a large number of random features!

Ita:

Thanks!

One comment: solutions of kernel ridge regression lie in the span of the training vectors (the representer theorem), so even if you use a kernel that corresponds to an infinite-dimensional feature space, the solution effectively belongs to a finite-dimensional subspace whose dimension is bounded by the training set size N.
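To illustrate the point, a short sketch (the toy data and parameter names are my own): kernel ridge regression with a Gaussian kernel, whose feature space is infinite-dimensional, yet the fitted function is fully described by N dual coefficients, one per training point:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 3
X = rng.standard_normal((n, p))
y = np.sin(X[:, 0])

def rbf(A, B, gamma=0.5):
    # Gaussian kernel matrix; its feature map is infinite-dimensional.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Dual solve: (K + lam I) alpha = y gives N coefficients, so the solution
# f(x) = sum_i alpha_i k(x_i, x) lives in the span of the training points.
lam = 1e-2
K = rbf(X, X)
alpha = np.linalg.solve(K + lam * np.eye(n), y)

# Predicting at new points needs only kernel evaluations against the train set.
X_test = rng.standard_normal((5, p))
preds = rbf(X_test, X) @ alpha
```

Everything about the fitted model is carried by the N-vector `alpha`, which is exactly the finite-dimensional bound the comment describes.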
