I’m attempting to fit a mixture of discrete values where there is a large sample size but many repeated values. For example, in this mixture of Poissons with 10,000 observations, there are only 24 unique values:
from numpy.random import choice, poisson, seed
from numpy import array, unique
from torch import tensor
from pyro.infer import MCMC, NUTS, config_enumerate
from pyro import distributions as dist
from pyro import plate, sample, set_rng_seed, poutine
set_rng_seed(1)
seed(1)
n = 10000
alpha = array([0.4, 0.6])
lam = array([3., 10.])
z = choice(len(alpha), size=n, p=alpha)
data = poisson(lam[z], n)
values, counts = unique(data, axis=0, return_counts=True)
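The unique values and their counts are converted to tensors before being passed to the Pyro model below:

values = tensor(values).float()
counts = tensor(counts).float()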
In Stan, it is possible to modify the likelihood directly by weighting each unique value’s log-likelihood contribution by the number of times that value has been seen, e.g.
model {
  real contributions[K];
  // priors....
  for (n in 1:N) {  // N = number of unique values
    for (k in 1:K) {
      contributions[k] = log(theta[k]) + normal_lpdf(values[n] | mu[k], sigma[k]);
    }
    // weight each unique value's marginal log-likelihood by its multiplicity
    target += log_sum_exp(contributions) * count[n];
  }
}
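That is, Stan marginalizes Z for each unique value first and then weights the resulting log-likelihood by the count. A rough torch restatement of the quantity being accumulated (theta_t and lam_t here are stand-ins for the mixture weights and rates, and I use the Poisson likelihood from the simulation in place of the Normal):

import torch
# log p(value | component k) for every unique value, shape (num_unique, K)
log_pk = dist.Poisson(lam_t).log_prob(values.unsqueeze(-1))
# marginalize Z per unique value, then weight by the count
target = (counts * torch.logsumexp(theta_t.log() + log_pk, dim=-1)).sum()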
My understanding of Pyro is that you can use pyro.poutine.scale to accomplish something similar:
@config_enumerate
def model(values, counts, mu0, sigma0, pi0):
    # priors
    mu = sample("mu", dist.Normal(mu0, sigma0).to_event(1))
    theta = sample("theta", dist.Dirichlet(pi0))
    # one likelihood term per unique value, scaled by its multiplicity
    with plate("data", len(values)), poutine.scale(scale=counts):
        Z = sample("Z", dist.Categorical(theta))
        sample("obs", dist.Poisson(mu[Z]), obs=values)
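For what it’s worth, I fit it along these lines (the hyperparameter values are just placeholders):

# placeholder prior hyperparameters for K = 2 components
mu0 = tensor([2., 8.])
sigma0 = tensor([3., 3.])
pi0 = tensor([1., 1.])
kernel = NUTS(model)
mcmc = MCMC(kernel, num_samples=500, warmup_steps=500)
mcmc.run(values, counts, mu0, sigma0, pi0)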
However, I think this throws off the enumeration of Z. Because poutine.scale wraps both sample sites, the Categorical log-probability is scaled by counts as well, so each unique value contributes log_sum_exp over k of count * (log(theta[k]) + log p(value | k)) instead of Stan’s count * log_sum_exp over k of (log(theta[k]) + log p(value | k)). In effect, all repeated observations of a given value are forced to share a single Z rather than each being marginalized independently.
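A quick numeric check of the difference for a single unique value (the contributions and count here are made up):

import torch
contrib = torch.tensor([-2., -5.])      # log(theta[k]) + log p(value | k) for K = 2
c = 10.                                 # multiplicity of this value
print(c * torch.logsumexp(contrib, 0))  # marginalize then weight: about -19.51
print(torch.logsumexp(c * contrib, 0))  # scale inside the enumeration: about -20.00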
When I fit the model to data without poutine.scale, the true mu’s and theta’s are recovered well, but that does not happen with scaling. Is there any way around this issue?