Discussion about this post

Francesco Orabona

FYI, the deterministic identity you derived is a special case of Theorem 1 in https://arxiv.org/pdf/1112.1390: take the one-dimensional case with constant covariates equal to 1 and let "a" go to 0. So a similar equality holds more generally when you use online ridge regression. The very same equality between the marginal likelihood and the predictive distributions also holds for Gaussian Processes.
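The equality between the marginal likelihood and the one-step predictive distributions can be checked numerically. What follows is a minimal sketch, assuming a scalar parameter theta with Gaussian prior N(0, tau2), Gaussian observation noise with variance sigma2, and every covariate equal to 1. The function names and numbers are hypothetical illustrations, not the construction used in the cited paper. The batch marginal integrates theta out in closed form; the online version multiplies the predictive densities p(y_t | y_1..y_{t-1}). Their agreement is the chain rule of probability.

```python
import math


def log_normal_pdf(x, mean, var):
    """Log density of N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def log_marginal(ys, tau2, sigma2):
    """Batch log p(y_1..y_n) with theta integrated out.

    With all covariates equal to 1, y ~ N(0, sigma2*I + tau2*11^T).
    The determinant follows from the matrix determinant lemma and the
    quadratic form from the Sherman-Morrison inverse.
    """
    n = len(ys)
    s = sum(ys)
    sq = sum(y * y for y in ys)
    logdet = n * math.log(sigma2) + math.log(1 + n * tau2 / sigma2)
    quad = (sq - tau2 * s * s / (sigma2 + n * tau2)) / sigma2
    return -0.5 * (n * math.log(2 * math.pi) + logdet + quad)


def sum_log_predictive(ys, tau2, sigma2):
    """Online: sum over t of log p(y_t | y_1..y_{t-1})."""
    total = 0.0
    precision = 1.0 / tau2  # posterior precision of theta before any data
    weighted_sum = 0.0      # running sum of y / sigma2
    for y in ys:
        # Posterior mean of theta; this equals a 1-D ridge estimate
        # with regularization sigma2 / tau2.
        mean = weighted_sum / precision
        pred_var = sigma2 + 1.0 / precision  # noise + parameter uncertainty
        total += log_normal_pdf(y, mean, pred_var)
        precision += 1.0 / sigma2
        weighted_sum += y / sigma2
    return total


ys = [0.3, -1.2, 2.5, 0.7, 1.1]
a = log_marginal(ys, tau2=2.0, sigma2=0.5)
b = sum_log_predictive(ys, tau2=2.0, sigma2=0.5)
print(abs(a - b) < 1e-9)  # True
```

The same check extends to general covariates (online ridge regression) or to a Gaussian Process by replacing the scalar posterior update with the corresponding matrix update.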

Daniel Russo

Great post and cool identity.

One awkward point is explaining "(a) the rules stated that you had to make the same prediction for all of these bits" when you actually employ an online rule that makes evolving predictions.

I personally find the assumption of exchangeability to be a pretty intuitive match for "all missing outcomes are indistinguishable from each other, both a priori and conditioned on anything you've seen so far." Online learning papers tend to act as if this is true, but shy away from stating it outright.
