12 Comments
Matt Hoffman:

"For most parameter regimes of interest, the Wishart distribution, the Dirichlet distribution, the gamma distribution, the chi-square distribution, the beta distribution, and the Weibull distribution are also log-concave."

I disagree! These distributions (which are mostly just gamma distributions after one change of variables or another) are most interesting when their shape parameters are less than one. The range of parameters where these distributions are both log-concave and "interesting" (in the sense of being hard to approximate well with a Gaussian) is actually pretty small IMO.

That said, for each of these distributions, there exists a change of variables that makes them both log-concave and unconstrained, e.g.:

• Gamma: X ~ Gamma(α, β), Y = log(X), p(y) is log-concave

• Beta: X ~ Beta(α, β), Y = logit(X), p(y) is log-concave

• Dirichlet: Basically the same as beta, but inverting the multinomial logistic function is annoying so I won't write it out here

• Chi-Squared: Special case of gamma

• Weibull: Just a change of variables on the Gumbel, which is log-concave (and also just a change of variables on the Exponential, a special case of Gamma)

So I would argue that the lack of log-concavity in these distributions arises from looking at them the wrong way. In fact, I'd go further and argue that we don't really know how to construct useful distributions _except_ by changes of variables and compounds (e.g., student-t is just a scale-mixture of normals) applied to a handful of simple, log-concave distributions.
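
A quick numerical check of the gamma case above might look like the sketch below (assuming shape alpha = 0.3, rate 1, and unnormalized log-densities; the density of Y = log(X) picks up a Jacobian factor, giving log p(y) = alpha*y - exp(y) + const):

```python
import numpy as np

alpha = 0.3                               # shape < 1: the "interesting" regime
x = np.linspace(0.05, 5.0, 400)           # uniform grid in x
y = np.linspace(-3.0, 2.0, 400)           # uniform grid in y = log(x)

log_p_x = (alpha - 1.0) * np.log(x) - x   # unnormalized log-density of X
log_p_y = alpha * y - np.exp(y)           # unnormalized log-density of Y (Jacobian included)

# On a uniform grid, concavity shows up as nonpositive second differences.
print(np.all(np.diff(log_p_x, 2) <= 0))   # False: not log-concave in x when alpha < 1
print(np.all(np.diff(log_p_y, 2) <= 0))   # True: log-concave in y for any alpha > 0
```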

Ben Recht:

100%. This is a much better articulation of what I was trying to say.

Matt Hoffman:

Thanks! This topic is a bit of a hobby horse for me :)

Ben Recht:

What's your intuitive explanation for why log-concave distributions are easy to sample from?

Matt Hoffman:

Well, they're easy to upper-bound with piecewise-log-linear functions, which means rejection sampling is always a decent option.
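
A minimal sketch of that idea (assuming a Gamma(3, 1) target, a fixed set of three tangent points rather than the usual adaptive construction, and unnormalized densities throughout):

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 3.0                                  # shape > 1, so the density is log-concave

def logp(x):                                 # unnormalized log-density of Gamma(alpha, 1)
    return (alpha - 1.0) * np.log(x) - x

def dlogp(x):                                # its derivative
    return (alpha - 1.0) / x - 1.0

# Tangents to logp at a few abscissae; concavity guarantees each tangent lies
# above logp, so their pointwise minimum is a piecewise-log-linear envelope.
xs = np.array([0.5, 2.0, 6.0])
slopes = dlogp(xs)
intercepts = logp(xs) - slopes * xs

# Breakpoints where consecutive tangents cross; between breakpoints one tangent is active.
z = (intercepts[:-1] - intercepts[1:]) / (slopes[1:] - slopes[:-1])
lo = np.concatenate([[0.0], z])
hi = np.concatenate([z, [np.inf]])

def seg_mass(a, b, l, h):
    """Integral of exp(a*x + b) over (l, h)."""
    upper = np.exp(a * h + b) if np.isfinite(h) else 0.0
    return (upper - np.exp(a * l + b)) / a if a != 0.0 else np.exp(b) * (h - l)

masses = np.array([seg_mass(a, b, l, h) for a, b, l, h in zip(slopes, intercepts, lo, hi)])

def sample_envelope():
    """One draw from the normalized piecewise-exponential envelope."""
    k = rng.choice(len(masses), p=masses / masses.sum())
    a, l, h = slopes[k], lo[k], hi[k]
    u = rng.uniform()
    if a == 0.0:
        return l + u * (h - l)
    top = np.exp(a * h) if np.isfinite(h) else 0.0
    return np.log(u * top + (1.0 - u) * np.exp(a * l)) / a   # inverse CDF on (l, h)

def sample_target():
    """Propose from the envelope, accept with probability p(x) / envelope(x)."""
    while True:
        x = sample_envelope()
        k = np.searchsorted(z, x)
        if np.log(rng.uniform()) < logp(x) - (slopes[k] * x + intercepts[k]):
            return x

draws = np.array([sample_target() for _ in range(20_000)])
print(draws.mean(), draws.var())             # both should be close to alpha = 3
```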

And if you've got a good gamma sampler you can turn those samples into most of the things you probably want:

• exponential (set shape=1)

• chi-squared (shape=dof/2)

• normal (via Box-Muller on an exponential, or by square-rooting a single one-degree-of-freedom chi-squared and randomizing the sign)

• multivariate normal (just a linear transformation of K independent normals)

• inverse-gamma (just take the reciprocal)

• student-t (sample an appropriate inverse-gamma, take its square root, and multiply by a normal)

• Dirichlet (sample K gammas and normalize)

• Beta (special case of Dirichlet)

• negative binomial (gamma mixture of Poissons; you do need a separate Poisson sampler though)

I can't think of many distributions you can't sample from by transforming some number of gamma samples—the only exceptions that come to mind are the generalized inverse-Gaussian (a nontrivial generalization of the gamma, inverse-gamma, and inverse-Gaussian) and the von Mises-Fisher.
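
For a few of the reductions in that list, a minimal sketch (using NumPy's gamma sampler as the assumed "good gamma sampler") could look like:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Dirichlet(alpha): sample K independent gammas and normalize.
alpha = np.array([2.0, 3.0, 5.0])
g = rng.standard_gamma(alpha, size=(n, 3))
dirichlet = g / g.sum(axis=1, keepdims=True)
print(dirichlet.mean(axis=0))            # ~ alpha / alpha.sum() = [0.2, 0.3, 0.5]

# Beta(a, b): the K = 2 special case.
a, b = 2.0, 5.0
ga = rng.standard_gamma(a, size=n)
gb = rng.standard_gamma(b, size=n)
beta = ga / (ga + gb)
print(beta.mean())                       # ~ a / (a + b) = 2/7

# Student-t with nu degrees of freedom: a normal scaled by the square root of
# an inverse-gamma(nu/2, scale=nu/2) variance.
nu = 5.0
inv_gamma = (nu / 2.0) / rng.standard_gamma(nu / 2.0, size=n)
t = rng.standard_normal(n) * np.sqrt(inv_gamma)
print(t.var())                           # ~ nu / (nu - 2) = 5/3
```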

But I actually wouldn't say that the gamma distribution is particularly easy to sample from, unless you've got access to a good software implementation! There are no simple tricks like Box-Muller; good gamma samplers rely on a cascade of very clever tricks involving obscure properties of beta distributions and such. Thankfully, at this point in human history, all that complexity is abstracted away behind simple APIs with high-quality implementations.

By contrast, there have been times in my life when I've badly wanted to sample from a generalized inverse-Gaussian (which is log-concave after a logarithmic change of variables), and found it extremely annoying due to the lack of a convenient, performant sampler implementation.

Andrija:

I always thought that sampling from any given (continuous) distribution is done "straightforwardly" by simply sampling from a uniform and then mapping into the distribution's domain via the inverse-CDF trick. I thought this approach doesn't care about the log-concavity property, and only has issues if there are regions of the distribution's domain with exactly zero probability (since then the inverse CDF is not uniquely defined).
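
For a case where the inverse CDF has a closed form, the trick really is a one-liner (a sketch assuming an Exponential(rate = 2) target):

```python
import numpy as np

rng = np.random.default_rng(0)
rate = 2.0
u = rng.uniform(size=100_000)       # U ~ Uniform(0, 1)
x = -np.log1p(-u) / rate            # inverse CDF of Exponential(rate)
print(x.mean())                     # should be close to 1 / rate = 0.5
```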

As far as I can see the generalized inverse-Gaussian seems to have a well behaved CDF, so why wouldn't the above approach simply work for it too?

Matt Hoffman:

This approach seems to be begging the question a bit—if you have a good inverse-CDF implementation, then yes, you have a good sampler.

Trouble is, CDFs aren't always easy to compute (https://en.wikipedia.org/wiki/Generalized_inverse_Gaussian_distribution doesn't offer any closed form, and https://warrenweckesser.github.io/mpsci/distributions/geninvgauss.html resorts to quadrature on the PDF). And inverting them using (say) Newton's method makes the proposition significantly more expensive and less numerically stable.

"Well behaved" sadly doesn't imply "easy to compute".

Christopher Harshaw:

Hey Ben -- long-time subscriber, second-time commenter here. Having recently been granted the label of "statistician", I can confirm that statisticians *love* maximum likelihood. But what do you do if the log-likelihood is non-convex? After all, in that case it may not be computationally feasible to obtain your maximum likelihood estimate!

I recently heard Richard Samworth give a talk that I really liked. They proposed the following idea in the context of linear regression: first use your data to estimate the best convex approximation to the log-likelihood, then optimize that convex approximation to get a point estimate. Pretty cool; figured you might like it too: https://arxiv.org/abs/2403.16688

It feels in line with the spirit of your post (or perhaps the next one), which is: when the objective isn't convex, CONVEXIFY.

Ben Recht:

At some point in the course, we'll get to problems that refuse to be convexified. But overall, I agree.

I'd also add that maximum likelihood is a bizarre fever dream of Fisher that rests on the shakiest of conceptual, philosophical, and mathematical grounds and is never necessary for any data analysis.

robin:

I'm interested to read more about this. I'm working on a project involving non-convex, multimodal (restricted) maximum likelihood with variance components, and it's a struggle.

Ben Recht:

I wrote a bit about it here: https://substack.com/home/post/p-142826531

The bottom line is there is no theoretical reason you should prefer a maximum likelihood estimator to any other estimator. All of the reasoning was invented post-hoc to justify some very unrigorous arguments of Fisher.

robin:

Why it's a bizarre fever dream, I mean.
