Discussion about this post

Alex Tolley:

"we always pick the model that has the lowest test error, regardless of whether we have a theoretical justification for the method. In machine learning, we are allowed to do whatever we want to make the holdout error small. "

Wouldn't this only apply to a single set of data? If data is continually being added to the holdout set, won't the "best" (minimum-error) model fluctuate, whether the ML is retrained on the total of the old and new data, or the original model is simply re-scored against the old holdout data plus the new data?
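
A concrete way to see the fluctuation the comment is asking about (my illustration, not from the post): below is a minimal NumPy sketch with two fixed, already-trained toy models, where model A has the lower true risk but the biased model B still wins the holdout comparison a noticeable fraction of the time when the holdout set is small; as the holdout grows, the ranking stabilizes. The models, noise level, and sample sizes are all made-up assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): the truth is y = sin(x) + Gaussian noise.
# Model A matches the truth exactly; model B is slightly biased,
# so A has the strictly lower *true* holdout risk.
model_a = np.sin
model_b = lambda x: 0.9 * np.sin(x) + 0.1

def mse(model, x, y):
    """Mean squared error of a fixed model on a holdout sample."""
    return float(np.mean((model(x) - y) ** 2))

def draw_holdout(n):
    """Draw a fresh holdout set of size n from the toy distribution."""
    x = rng.uniform(-3.0, 3.0, size=n)
    y = np.sin(x) + rng.normal(0.0, 0.5, size=n)
    return x, y

# How often does the *worse* model B look best, as the holdout grows?
for n in (10, 100, 10_000):
    b_wins = 0
    for _ in range(200):
        x, y = draw_holdout(n)  # both models scored on the SAME draw
        if mse(model_b, x, y) < mse(model_a, x, y):
            b_wins += 1
    print(f"holdout size {n:>6}: biased model B wins {b_wins}/200 comparisons")
```

So yes: on a small or changing holdout set the identity of the "best" model is noisy and can flip as points are added, but with enough holdout data the empirical ranking settles on the model with the lower true error, which is the usual statistical answer to the question.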
