I’m still in the early stages of understanding Bayesian inference, so forgive me if I miss something obvious.
A tutorial I was following on Monte Carlo methods (in R) discussed using the DIC (deviance information criterion) to evaluate model effectiveness.
I didn’t see a DIC metric in NumPyro, so I was hoping someone would be willing to check my work.
My implementation of the DIC function is below (variable names follow the R documentation: DIC function - RDocumentation).
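For reference, the quantities I’m trying to compute are (as I understand the standard DIC definitions):

```latex
\bar{D} = \mathbb{E}_{\theta \mid y}\!\left[-2 \log p(y \mid \theta)\right],
\qquad
p_D = \bar{D} - D(\bar{\theta}),
\qquad
\mathrm{DIC} = \bar{D} + p_D = 2\bar{D} - D(\bar{\theta})
```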
```python
import numpyro

def dic(model_fn, samples, **kwargs):
    # log-likelihood of each posterior sample, summed over observations
    pi_theta = numpyro.infer.log_likelihood(model_fn, samples, **kwargs)['y'].sum(axis=1)
    # posterior expectation of the deviance, estimated by a straight mean over samples
    D_bar = -2 * pi_theta.mean(axis=0)
    # posterior mean of each parameter; average over the sample axis only,
    # so vector-valued parameters aren't collapsed to a scalar
    theta_star = {k: v.mean(axis=0, keepdims=True) for k, v in samples.items()}
    pi_theta_star = numpyro.infer.log_likelihood(model_fn, theta_star, **kwargs)['y'].sum()
    D_theta_star = -2 * pi_theta_star
    p_D = D_bar - D_theta_star  # effective number of parameters
    print('expected deviance:', D_bar)
    print('effective number of parameters:', p_D)
    print('DIC:', D_bar + p_D)
```
A Jupyter notebook with an example can be found here: bayesian-notes/DIC.ipynb at master · bdhammel/bayesian-notes · GitHub
In the linked notebook, I fit two simple models to some noisy data generated from the function y = mx: a model with one coefficient, and another with two coefficients.
The penalty terms I get back are close to 1 and 2, respectively. My understanding is that these should be closer to 2 and 3, because the standard deviation should also count toward the effective number of parameters.
I’m not sure whether this is expected (it is only the approximate number of parameters, after all) or whether there is a mistake somewhere in my work.
Specifically, I’m not entirely confident in my calculation of D_bar: I’m not sure whether the posterior expectation works out to be a straight mean over the MCMC samples.
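One sanity check that made me more comfortable with the straight mean: in a conjugate normal-mean toy problem (everything below is invented just for this check), the posterior expectation of the deviance has a closed form, so a plain average of the deviance over exact posterior draws can be compared against it:

```python
# Toy check: normal likelihood with known sigma, posterior N(mu_n, tau_n^2)
# for the mean. Compare the straight Monte Carlo mean of the deviance to its
# analytic expectation. All values here are made up for the check.
import numpy as np

rng = np.random.default_rng(0)
n, sigma = 100, 1.0                    # data size, known noise sd
mu_n, tau_n = 0.3, 0.2                 # assumed posterior N(mu_n, tau_n^2)
y = rng.normal(mu_n, sigma, size=n)
ybar, S = y.mean(), ((y - y.mean()) ** 2).sum()

# deviance via sufficient statistics:
# D(theta) = [S + n*(ybar - theta)^2] / sigma^2 + n*log(2*pi*sigma^2)
theta = rng.normal(mu_n, tau_n, size=200_000)   # exact posterior draws
D = (S + n * (ybar - theta) ** 2) / sigma**2 + n * np.log(2 * np.pi * sigma**2)
D_bar_mc = D.mean()                    # straight Monte Carlo mean

# closed form, using E[(ybar - theta)^2] = (ybar - mu_n)^2 + tau_n^2
D_bar_exact = (S + n * ((ybar - mu_n) ** 2 + tau_n ** 2)) / sigma**2 \
              + n * np.log(2 * np.pi * sigma**2)
print(D_bar_mc, D_bar_exact)           # agree to within Monte Carlo error
```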