Discussion about this post

Matt Hoffman

"For most parameter regimes of interest, the Wishart distribution, the Dirichlet distribution, the gamma distribution, the chi-square distribution, the beta distribution, and the Weibull distribution are also log-concave."

I disagree! These distributions (which are mostly just gamma distributions after one change of variables or another) are most interesting when their shape parameters are less than one. The range of parameters where these distributions are both log-concave and "interesting" (in the sense of being hard to approximate well with a Gaussian) is actually pretty small IMO.

That said, for each of these distributions, there exists a change of variables that makes it both log-concave and unconstrained (a quick numerical check for the gamma case follows the list), e.g.:

• Gamma: X ~ Gamma(α, β), Y = log(X), p(y) is log-concave

• Beta: X ~ Beta(α, β), Y = logit(X), p(y) is log-concave

• Dirichlet: Basically the same as beta, but inverting the multinomial logistic function is annoying, so I won't write it here

• Chi-Squared: Special case of gamma

• Weibull: Just a change of variables on the Gumbel, which is log-concave (and also just a change of variables on the Exponential, a special case of Gamma)
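For the gamma case this is easy to verify by hand: if Y = log(X), then log p(y) = αy - βe^y + const, whose second derivative, -βe^y, is negative for every α > 0, including α < 1. A minimal NumPy sketch checking both claims numerically:

```python
import numpy as np

# Y = log(X) for X ~ Gamma(alpha, beta):
# log p(y) = alpha*y - beta*exp(y) + const  =>  d^2/dy^2 = -beta*exp(y) < 0.
alpha, beta = 0.3, 1.0                      # shape < 1: the "interesting" regime
ys = np.linspace(-5.0, 3.0, 1001)
log_p_y = alpha * ys - beta * np.exp(ys)    # unnormalized log-density of Y
assert np.all(np.diff(log_p_y, 2) < 0)      # negative second differences: concave

# By contrast, log p(x) = (alpha-1)*log(x) - beta*x has second derivative
# (1-alpha)/x^2 > 0 when alpha < 1, so the original density is NOT log-concave.
xs = np.linspace(0.01, 5.0, 1001)
log_p_x = (alpha - 1.0) * np.log(xs) - beta * xs
assert np.any(np.diff(log_p_x, 2) > 0)
```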

So I would argue that the lack of log-concavity in these distributions arises from looking at them the wrong way. In fact, I'd go further and argue that we don't really know how to construct useful distributions _except_ by changes of variables and compounds (e.g., student-t is just a scale-mixture of normals) applied to a handful of simple, log-concave distributions.
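To make the compound point concrete, here's a small sketch of the student-t example (using the standard precision-mixture parameterization, my choice rather than anything from the post): mixing a zero-mean normal over a gamma-distributed precision reproduces a Student-t exactly.

```python
import numpy as np
from scipy import stats

# Student-t as a scale mixture of normals: if tau ~ Gamma(nu/2, rate=nu/2)
# and x | tau ~ Normal(0, 1/tau), then marginally x ~ t_nu.
rng = np.random.default_rng(0)
nu = 3.0
tau = rng.gamma(shape=nu / 2, scale=2 / nu, size=100_000)  # precision mixing
x = rng.normal(0.0, 1.0 / np.sqrt(tau))

# Compare the samples against the exact t_nu distribution.
print(stats.kstest(x, stats.t(df=nu).cdf))   # large p-value: same distribution
```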

Christopher Harshaw

Hey Ben -- long-time subscriber, second-time commenter here. Having recently been granted the label of "statistician", I can confirm that statisticians *love* maximum likelihood. But what do you do when the log-likelihood isn't concave? After all, in that case it may not be computationally feasible to find the MLE!

I recently heard Richard Samworth give a talk that I really liked. They proposed the following idea in the context of linear regression: first use your data to estimate the best convex approximation to the negative log-likelihood, then minimize that convex approximation to get a point estimate. Pretty cool -- figured you might like it too: https://arxiv.org/abs/2403.16688
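A toy, one-dimensional version of that convexify-then-optimize idea (my own stand-in objective and solver choice, not the estimator from the paper): evaluate the nonconvex objective on a grid, find its least-squares projection onto convex functions via discrete second-difference constraints, and minimize the surrogate.

```python
import numpy as np
import cvxpy as cp

# A nonconvex stand-in for a negative log-likelihood (NOT the paper's setup).
theta = np.linspace(-3.0, 3.0, 200)          # uniform grid (needed below)
nll = np.sin(3 * theta) + 0.3 * theta**2

# Best convex approximation in least squares: on a uniform grid, convexity
# of the piecewise-linear interpolant is exactly the constraint that all
# second differences are nonnegative.
g = cp.Variable(len(theta))
prob = cp.Problem(cp.Minimize(cp.sum_squares(g - nll)),
                  [g[:-2] - 2 * g[1:-1] + g[2:] >= 0])
prob.solve()

theta_hat = theta[np.argmin(g.value)]        # optimize the convex surrogate
print(theta_hat)
```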

It feels in line with the spirit of your post (or perhaps the next one), which is: when the objective isn't convex, CONVEXIFY.
