I would love an explanation without the equations. I've been thinking about RL, at least in terms of the current slate of LLMs, as "optimization for a values system wherein nobody except machine learning engineers is able to select the values or has any say in why they are prioritized." Would love to know if that's an accurate description.
Yes do please
I would love an explanation without the equations. I've been thinking about RL, at least in terms of the current slate of LLMs, as "optimization for a values system wherein nobody except machine learning engineers is able to select the values or has any say in why they are prioritized." Would love to know if that's an accurate description.