5 Comments
Victor Miller

What is the role of EDA (Exploratory Data Analysis) in all this? At a minimum it would seem that this should tell practitioners something about their model assumptions. I'm always a little suspicious of blithely assuming normality for most of these distributions.

Ben Recht

I often wonder if we'd be better off if we gave EDA more academic credibility and didn't disparage it as HARKing or "data mining." I'll try to expand on that thought in a future post.

Victor Miller

A lot of models are pulled “out of the air” (or possibly somewhere else). Model criticism has to play a role here.

Aman Desai

Super interesting article, Professor! What would be the best method for approximating the “crud” distribution? I would intuitively guess bootstrapping your data, but I’m not sure if this runs into the same problems you mentioned previously.
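
For concreteness, here is a rough sketch of what the bootstrapping idea might look like. It is not a method from the article: it assumes the "crud" distribution means the empirical distribution of pairwise Pearson correlations between a dataset's variables, it uses synthetic stand-in data, and the function names `pairwise_correlations` and `bootstrap_crud_spread` are hypothetical. Whether resampling rows like this sidesteps the problems raised in the post is exactly the open question.

```python
import numpy as np


def pairwise_correlations(data):
    """Pearson correlations for every distinct pair of columns."""
    corr = np.corrcoef(data, rowvar=False)
    upper = np.triu_indices_from(corr, k=1)  # skip the diagonal and duplicates
    return corr[upper]


def bootstrap_crud_spread(data, n_boot=200, seed=0):
    """Bootstrap the standard deviation of the pairwise correlations.

    Rows are resampled with replacement; each replicate recomputes all
    pairwise correlations and records their spread.
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    spreads = np.empty(n_boot)
    for b in range(n_boot):
        rows = rng.integers(0, n, size=n)
        spreads[b] = np.std(pairwise_correlations(data[rows]))
    return spreads


# Synthetic stand-in data (an assumption): 8 variables sharing a weak
# common factor, so every pair is slightly correlated.
rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 1))
data = 0.3 * latent + rng.normal(size=(500, 8))

observed = pairwise_correlations(data)  # one correlation per variable pair
spreads = bootstrap_crud_spread(data)   # bootstrap replicates of their spread
print(np.std(observed), np.percentile(spreads, [2.5, 97.5]))
```

The percentile interval only describes sampling variability in the spread of the observed correlations; it says nothing about whether that spread is the right reference distribution for any particular hypothesis test.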

Ben Recht

Great question. I'm thinking about it myself.
