"Supervised machine learning was going to revolutionize prediction/inference for politics. What happened? [They] use "intrinsic dimension" to show it's v hard to beat simple (OLS, logit) models for polisci tabular data." I'd say social scientific data in general.
They conclude with: "That is, there clearly exist problems (e.g. Salganik et al.,2020) where all predictions are low quality, and the fact that a model is simple is cold comfort given our actual aims. In those cases, better and more (unstructured) data is the answer. I guess the million dollar question is: how does that work for these kinds of questions? "If granted admission, will a person succeed in law school? If released from prison, will a person recidivate? If a depressed person isn’t hospitalized, will they commit suicide? If a person receives shock therapy, will their depression be relieved?"
I need to make sure to address this tomorrow, because it's a major point of confusion.
OLS *is* machine learning. Just because simple ML methods perform better than complex ones does not mean that simple ML methods are inferior to clinical judgment.
The Salganik example is a great one: simple methods work. most of their cases are hard to predict. but the complex methods likely fail because there is massive missingness in the data, and it's hard to design around that.
Another example I'll discuss tomorrow is recidivism. There are very simple algorithms that do a good job of predicting recidivism. They almost always outperform complex psychometric methods (like the one that got infamous because of ProPublica).
This has been circulating today: https://x.com/arthur_spirling/status/1812875064205009277
"Supervised machine learning was going to revolutionize prediction/inference for politics. What happened? [They] use "intrinsic dimension" to show it's v hard to beat simple (OLS, logit) models for polisci tabular data." I'd say social scientific data in general.
They conclude with: "That is, there clearly exist problems (e.g. Salganik et al., 2020) where all predictions are low quality, and the fact that a model is simple is cold comfort given our actual aims. In those cases, better and more (unstructured) data is the answer." I guess the million-dollar question is: how does that work for these kinds of questions? "If granted admission, will a person succeed in law school? If released from prison, will a person recidivate? If a depressed person isn’t hospitalized, will they commit suicide? If a person receives shock therapy, will their depression be relieved?"
Looking forward to the empirical evidence!
I need to make sure to address this tomorrow, because it's a major point of confusion.
OLS *is* machine learning. And the fact that simple ML methods perform better than complex ones does not mean that those simple methods are somehow inferior to clinical judgment.
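To make that concrete, here is a rough sketch of the kind of horse race being described, on synthetic data and with scikit-learn (my choice of tooling, not the paper's): a plain logit dropped into the usual fit-and-evaluate loop, with the "complex" model a one-line swap.

```python
# Minimal sketch (synthetic data): a plain logit is a supervised learner like
# any other, and the simple-vs-complex comparison is just a swap of estimator.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in for a typical tabular dataset: a couple thousand rows,
# a few dozen features, modest signal.
X, y = make_classification(n_samples=2000, n_features=30, n_informative=8,
                           random_state=0)

for name, model in [("logit", LogisticRegression(max_iter=1000)),
                    ("boosted trees", GradientBoostingClassifier(random_state=0))]:
    scores = cross_val_score(model, X, y, cv=5, scoring="roc_auc")
    print(f"{name}: mean AUC = {scores.mean():.3f}")
```

As I read the thread, the paper's point is that on typical polisci tabular data the second line of output is very hard to push meaningfully above the first.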
The Salganik example is a great one: simple methods work, most of their cases are hard to predict, and the complex methods likely fail to do any better because there is massive missingness in the data, which is hard to design around.
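For a sense of what "design around that" involves, here is a generic sketch of the usual workaround: impute, keep missingness indicators, and fit a simple model. Synthetic data, and not Salganik et al.'s pipeline; it's just the shape of the decisions you're forced to make before any model, simple or complex, can even run.

```python
# Generic illustration of one common way to handle heavy missingness:
# impute, keep "was this missing?" indicator columns, then fit a simple logit.
# (Synthetic data, not Salganik et al.'s data or method.)
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 20))
y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)
X[rng.random(X.shape) < 0.5] = np.nan   # knock out half the cells at random

baseline = make_pipeline(
    SimpleImputer(strategy="median", add_indicator=True),
    LogisticRegression(max_iter=1000),
)
print(cross_val_score(baseline, X, y, cv=5, scoring="roc_auc").mean())
```

Every choice in there (imputation strategy, indicators, what to do when whole blocks of variables are missing) is a design decision, and the flexible models don't get to skip any of it.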
Another example I'll discuss tomorrow is recidivism. There are very simple algorithms that do a good job of predicting recidivism, and they almost always outperform complex psychometric methods (like COMPAS, the one that became infamous through ProPublica's reporting).
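To give a flavor of what "very simple" means here, a sketch with hypothetical feature names and synthetic data (not any published model or real recidivism dataset): a two-variable logistic regression is roughly the level of complexity I have in mind.

```python
# Sketch of a "very simple" recidivism-style model: logistic regression on two
# features. Feature names are hypothetical and the data is synthetic.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "age": rng.integers(18, 70, size=1000),
    "prior_offenses": rng.poisson(2, size=1000),
})
# Synthetic outcome: younger people with more priors reoffend more often.
logit = -0.05 * (df["age"] - 35) + 0.4 * df["prior_offenses"] - 1.0
df["reoffend"] = rng.random(1000) < 1 / (1 + np.exp(-logit))

X_train, X_test, y_train, y_test = train_test_split(
    df[["age", "prior_offenses"]], df["reoffend"], random_state=0)
model = LogisticRegression().fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```

That's the whole model: two coefficients and an intercept, fully transparent, and it sets a surprisingly hard baseline to beat.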
Cliffhanger!
Hah, I just write more slowly than I'd like to. False suspense!