'lp__' in Stan output

For those who aren’t familiar with Stan, it is a great tool for statistical modeling from Andrew Gelman’s group at Columbia. Stan supports both bayesian inference through MCMC sampling and maximal likelihood estimation through BFGS optimization (http://mc-stan.org/). 

It comes with a very active user group, too. 

I was puzzled by the extra term “lp__” in Stan’s output. It was described as “log density up to a constant” in the manual. Here, log density should be log likelihood of the observations conditioned on the posterior parameters: p(y | p_post). In the MCMC sampling paradigm, this could be obtained by plugging the parameter samples in each iteration into the likelihood and compute the probability of each observation. Therefore, “lp” is actually a vector in length equals to the number of observations. “lp__” in Stan’s output would be the sum of the “lp” vector, which essentially quantifies how well the model match the data, hereby useful for model comparison purposes.

To test my thought, I computed “lp__” as described above, which turned out to be very different from what Stan actually reported. What went wrong? Looking into the manual description: log density up to a constant. What does “up to a constant” mean?

After N cups of coffee, turned out Stan implicitly took off any terms in the likelihood that are independent of any model parameters. This is also called scale factor, which turns to a constant in the log scale. Taking off the scale factor does not affect the model parameters estimation given we are sampling through MCMC. However, we want to keep the likelihood function complete to get the real “lp__”, because we are computing this value from the likelihood. Also, adding N times the constant to the “lp__” by Stan should return the true “lp__”, where N equals to the number of observations. It is also immediately clear that “lp__” by Stan is not useful for model comparisons, because the constant from different models are likely not be the same.

This attached PDF illustrates how “lp__” term was computed in Stan, how it relates to the true “lp__”, and how to retrieve the real “lp__” directly within Stan, using a normal model.


Follow him on Twitter @xulong82.