Deviance information criterion (DIC) for mcmc model

bdhammel · November 7, 2019, 2:38pm

I’m still in the early stages of understanding Bayesian inference, so forgive me if I miss something obvious.

A tutorial I was following on Monte Carlo methods (in R) discussed the use of the D.I.C. to evaluate model effectiveness.

I didn’t see a DIC metric in pyro, so I was hoping someone would be willing to check my work?

My implementation for the DIC function is this:

(variable names correspond to this documentation DIC function - RDocumentation)

    def dic(model_fn, samples, **kwargs):
        pi_theta = numpyro.infer.log_likelihood(model_fn, samples, **kwargs)['y'].sum(axis=1)
        D_bar = -2 * pi_theta.mean(axis=0)
        theta_star = {k: v.mean(keepdims=True) for k, v in samples.items()}
        pi_theta_star = numpyro.infer.log_likelihood(model_fn, theta_star, **kwargs)['y'].sum()
        D_theta_star = -2 * pi_theta_star
        print('expected deviance:', D_bar)
        print('effective number of parameters:', D_bar - D_theta_star)
        print('DIC:', D_bar - D_theta_star + D_bar)

A Jupyter notebook with an example can be found here: bayesian-notes/DIC.ipynb at master · bdhammel/bayesian-notes · GitHub

In the above link, I fit two simple models against some noisy data generated from the function y=mx: A model with one coefficient and another model with two coefficients.

The penalty terms I get back are close to 1 and 2, respectively. My understanding is that these should actually be closer to 2 and 3, because the standard deviation should be included in the effective number of parameters.

I’m not sure if this is expected, because it’s the approximate number of parameters, or if there is a mistake somewhere in my work.

Specifically, I’m not entirely confident in my calculation of D_bar - I’m not sure if the expectation value works out to be a straight mean.

fehiepsi · November 8, 2019, 7:54pm

Hi @bdhammel, I think that your implementation is correct. Probably the prior of your std is too wide, so std is not an efficient parameter (heuristically). Could you try to either reduce rate parameter of InverseGamma prior or use dist.Exponential(1) prior? Then it is a good exercise to compare DICs (and other information criterions) of your models with different std priors.

bdhammel · November 9, 2019, 6:54pm

Thank you very much for looking into this.

I was intentionally trying to set the priors to be as non-informative as possible because I thought I would un-bias the model that way. But, switching to an std-prior of dist.Exponential(1) returns the expected values of 2 and 3, so it looks like this works.

For my own education, could you point me to a resource about why an exponential is an o.k. conjugate prior to use for the std? Naively looking at this: Conjugate prior - Wikipedia I thought I had to use an inverse gamma distribution if (I believe) my obs follow a normal distribution.

fehiepsi · November 10, 2019, 4:00pm

Hi @bdhammel, I don’t have much experience with conjugation. I guess it does not important unless you can exploit that information in the sampler. Looking at your data, I just think that the wide prior for std is not a good option. You can use other priors I guess as long as it is not too wide.