Yes, neural network training does indeed have fascinating non-linear optimization dynamics, if you know what to look for: https://centralflows.github.io/part1/.
"we always pick the model that has the lowest test error, regardless of whether we have a theoretical justification for the method. In machine learning, we are allowed to do whatever we want to make the holdout error small. "
Wouldn't this only apply to a single, fixed set of data? If data keeps being added to the holdout set, won't the "best" (minimum-error) model fluctuate, whether the ML pipeline is rerun on the combined old and new data, or the original models are simply re-scored against the old holdout data plus the new data?
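A minimal sketch of what I mean, with everything hypothetical (two already-fitted toy models and synthetic data): the model with the lowest holdout error can change as new examples are appended to the holdout set, even though neither model is retrained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "true" relationship used to generate holdout data: y = 2x + noise.
def make_holdout(n):
    x = rng.uniform(-1, 1, n)
    y = 2 * x + rng.normal(0, 0.5, n)
    return x, y

# Two candidate models, assumed to have been fit earlier on some training set.
models = {"model_a": lambda x: 1.8 * x, "model_b": lambda x: 2.2 * x + 0.1}

def mse(model, x, y):
    return np.mean((model(x) - y) ** 2)

# Start with a small holdout set, then keep appending new data to it.
x_hold, y_hold = make_holdout(20)
for batch in range(5):
    scores = {name: mse(f, x_hold, y_hold) for name, f in models.items()}
    best = min(scores, key=scores.get)
    print(f"holdout size {len(x_hold):3d}: best = {best}, "
          + ", ".join(f"{k}={v:.3f}" for k, v in scores.items()))
    # New holdout data arrives; the ranking can flip with it.
    x_new, y_new = make_holdout(20)
    x_hold = np.concatenate([x_hold, x_new])
    y_hold = np.concatenate([y_hold, y_new])
```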