6 Comments
Feb 24 · Liked by Ben Recht

Great post, Ben! I really enjoy your clear explanations. I think I recently encountered another example of this ill-conditioning effect in my work on optimal control for fisheries management and conservation problems, where the model preferred by adaptive learning, the one that makes the most accurate predictions, still leads to a worse policy both economically and ecologically. I called this 'the forecast trap': https://doi.org/10.1111/ele.14024.
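To make the distinction concrete, here is a minimal sketch (my own, not from the linked paper; the logistic-growth model, the two hypothetical candidate fits, and the MSY-style harvest rule are illustrative assumptions) of scoring candidate models two ways: by one-step forecast error on held-out data, and by how the harvest policy each model recommends actually performs under the true dynamics.

```python
# Sketch: compare candidate models by forecast accuracy vs. realized policy value.
# All parameter values are hypothetical, chosen only to illustrate the comparison.
import numpy as np

def step(x, h, r, K):
    """Logistic growth with harvest fraction h."""
    return max(x + r * x * (1 - x / K) - h * x, 0.0)

rng = np.random.default_rng(3)
r_true, K_true = 0.4, 100.0
candidates = {"model_A": (0.5, 80.0), "model_B": (0.3, 120.0)}  # hypothetical (r, K) fits

# Held-out trajectory from the true dynamics (light harvest, small observation noise).
xs = [50.0]
for _ in range(50):
    xs.append(step(xs[-1], 0.05, r_true, K_true) + rng.normal(scale=1.0))

for name, (r_hat, K_hat) in candidates.items():
    # (a) one-step forecast error on the held-out data
    preds = [step(x, 0.05, r_hat, K_hat) for x in xs[:-1]]
    mse = np.mean([(p - x1) ** 2 for p, x1 in zip(preds, xs[1:])])

    # (b) the policy the model recommends (MSY-style harvest rate r_hat / 2),
    #     evaluated by long-run yield under the *true* dynamics
    h = r_hat / 2
    x, total_yield = 50.0, 0.0
    for _ in range(200):
        total_yield += h * x
        x = step(x, h, r_true, K_true)

    print(f"{name}: forecast MSE = {mse:.2f}, realized yield = {total_yield:.1f}")
```

The point is only the structure of the comparison: the model that wins on the first scorecard need not win on the second.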


I am looking at various climate/weather models and their associated policy recommendations. Do you think what you've been blogging about in this series of posts has straightforward implications for debates over climate change mitigation strategies?

Feb 23 · Liked by Ben Recht

This seems somewhat related to the bias-variance tradeoff, particularly Stein's paradox, where allowing some bias in your estimates in high dimensions reduces variance and lowers the overall error.
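For intuition, here is a minimal sketch (my own, not from the post) of the positive-part James-Stein estimator: shrinking the maximum-likelihood estimate toward zero introduces bias but cuts variance enough to lower total squared error once the dimension is at least three.

```python
# Sketch of Stein's paradox: in d >= 3 dimensions, a shrunken estimate of the
# mean has lower total squared error than the unbiased MLE.
import numpy as np

rng = np.random.default_rng(0)
d, n_trials = 10, 5000
theta = rng.normal(size=d)              # arbitrary true mean vector

mse_mle, mse_js = 0.0, 0.0
for _ in range(n_trials):
    x = theta + rng.normal(size=d)      # single observation, x ~ N(theta, I)
    shrink = max(0.0, 1.0 - (d - 2) / np.dot(x, x))  # positive-part James-Stein factor
    js = shrink * x
    mse_mle += np.sum((x - theta) ** 2) / n_trials
    mse_js += np.sum((js - theta) ** 2) / n_trials

print(f"MLE risk ~ {mse_mle:.2f}, James-Stein risk ~ {mse_js:.2f}")
```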

The example from your second 2020 post, where you describe certainty equivalence and its optimality under certain conditions, was helpful; I had been trying to find references on that topic since your earlier post.
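Here is a minimal sketch of certainty equivalence for LQR, with the toy system, noise levels, and costs chosen purely for illustration: fit (A, B) by least squares from one excited rollout, then design the controller as if those estimates were the true dynamics.

```python
# Sketch of certainty-equivalent LQR: estimate the dynamics, then plug the
# estimates into the Riccati equation as if they were exact.
import numpy as np
from scipy.linalg import solve_discrete_are

rng = np.random.default_rng(1)
A_true = np.array([[1.0, 0.1], [0.0, 1.0]])   # toy double integrator
B_true = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)

# Collect one rollout with random excitation.
T, x = 200, np.zeros(2)
X, U, Xn = [], [], []
for _ in range(T):
    u = rng.normal(size=1)
    xn = A_true @ x + B_true @ u + 0.01 * rng.normal(size=2)
    X.append(x); U.append(u); Xn.append(xn)
    x = xn

# Least-squares fit of [A B] from x_{t+1} ~ [A B] [x_t; u_t].
Z = np.hstack([np.array(X), np.array(U)])
Theta, *_ = np.linalg.lstsq(Z, np.array(Xn), rcond=None)
A_hat, B_hat = Theta.T[:, :2], Theta.T[:, 2:]

# Certainty equivalence: solve the Riccati equation with the estimates.
P = solve_discrete_are(A_hat, B_hat, Q, R)
K = np.linalg.solve(R + B_hat.T @ P @ B_hat, B_hat.T @ P @ A_hat)
print("certainty-equivalent gain K =", K)
```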


> You can fix this by pretending your data is bad, or you can fix this by better understanding your broken model.

There's a third option: curb your optimizer. The idea follows E. T. Jaynes: if you know you're going to end up with a suboptimal policy (because of model uncertainty, limited data, early stopping, unknown unknowns, whatever), then don't optimize too hard, which avoids the exact issue you raise here. In RL this idea goes by many names (max-entropy, energy-based, etc.), but I like to call it Bounded RL. It can be done model-based (https://link.springer.com/chapter/10.1007/978-3-642-24647-0_3) or model-free (https://royf.org/pub/pdf/Fox2016Glearning.pdf), and it is the principle behind some great algorithms like SAC.
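For concreteness, here is a minimal sketch (my own, not taken from the linked papers) of the "curb your optimizer" idea as entropy-regularized value iteration on a random toy MDP. The temperature tau controls how hard you optimize: tau near zero recovers the greedy Bellman backup, while larger tau keeps the policy closer to uniform and hedges against model error.

```python
# Sketch of soft (entropy-regularized) value iteration with a uniform action prior.
import numpy as np

n_states, n_actions, gamma, tau = 4, 2, 0.9, 0.5
rng = np.random.default_rng(2)
P = rng.dirichlet(np.ones(n_states), size=(n_states, n_actions))  # P[s, a, s']
R = rng.normal(size=(n_states, n_actions))                        # r(s, a)

V = np.zeros(n_states)
for _ in range(200):
    Q = R + gamma * P @ V                                # Q[s, a]
    V = tau * np.log(np.exp(Q / tau).mean(axis=1))       # soft backup instead of max

pi = np.exp(Q / tau)
pi /= pi.sum(axis=1, keepdims=True)                      # softmax policy, not argmax
print(np.round(pi, 2))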
