"We’ve collectively decided that the best strategies against random adversaries are those that maximize the expected value of the score. Don’t ask me why."
Obvious; the scheme proved itself by allowing Voltaire to set up a large (and successful) lottery scam.
Action selection is optimization. Feedback is parametric optimization: you can’t evaluate the solution until the parameters are known. In single-agent processes, your future actions are nested parametric optimization problems. In the LQ setting there is a closed-form solution for that, which is why LQR has a closed-form feedback policy. When it’s not LQ, it’s okay: all the nested optimization problems share the same objective, so you can collapse them into a single (nonconvex) problem. In multi-agent settings it gets crazy 🤪 parametric games with all sorts of nested shit that can’t be collapsed. Fun to think about but not really practically useful
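(For readers who haven't seen it: the closed form mentioned above comes from iterating the discrete-time Riccati recursion to a fixed point. A minimal numpy sketch, using a double-integrator plant as a hypothetical example; the system matrices and costs here are illustrative, not from the post.)

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion to a fixed point,
    then return the optimal feedback gain K (so the policy is u = -K x)."""
    P = Q.copy()
    for _ in range(iters):
        # K = (R + B' P B)^{-1} B' P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # P = Q + A' P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
    return K

# Double integrator: state = [position, velocity], control = acceleration.
dt = 0.1
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.eye(2)          # state cost
R = np.array([[1.0]])  # control cost

K = lqr_gain(A, B, Q, R)
# The closed-loop matrix A - B K should be stable (spectral radius < 1).
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

The point being: the whole nest of future optimization problems collapses into one matrix recursion, and the resulting feedback policy is just a linear map.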
I’ve never thought of the “plant” in control as being a (potentially random) adversary, but I can wrap my mind around it. Love the big-picture view covering different fields. Thanks!
Excellent, I love this type of post!
What type exactly? Hard to put my finger on it, but roughly - seeing multiple algorithms in a single framework in order to see where gaps remain.
I could not understand the figure without reading this post: https://www.argmin.net/p/predictions-and-actions-redux
(It wasn't clear what the action impact axis meant.)
Totally agree, that's my bad. I should have linked to that post rather than the table of contents. I made a small edit to this post to reference it.