# Mixture of poisson's with large sample size

I’m attempting to fit a mixture of discrete values where there is a large sample size but many repeated values. For example, in this mixture of Poisson’s with 10,000 observations, there are only 24 unique values

``````from numpy.random import choice, poisson, seed
from numpy import array, unique
from torch import tensor

from pyro.infer.mcmc.api import MCMC
from pyro.infer.mcmc import NUTS
from pyro.infer import config_enumerate
from pyro import distributions as dist
from pyro import plate, sample, set_rng_seed, poutine

set_rng_seed(1)
seed(1)

n = 1000
alpha = array([0.4, 0.6])
lam = array([3., 10.])

z = choice(len(alpha), size=n, p=alpha)
data = poisson(lam[z], n)
values, counts = unique(data, axis=0, return_counts=True)
``````

In Stan, it is possible to directly modify the likelihood by multiplying by the number of times that value has been seen, e.g.

``````model {
real contributions[K];

// priors....

for (n in 1:N) {
for (k in 1:K) {
contributions[k] = log(theta[k]) + normal_lpdf(values[n] | mu[k], sigma[k]);
}

target += log_sum_exp(contributions) * count[n];
}
}
``````

My understanding of pyro is that you can use `pyro.poutine.scale` to accomplish something similar

``````@config_enumerate
def model(values, counts, mu0, sigma0, pi0):
# priors
mu = sample("mu", dist.Normal(mu0, sigma0).to_event(1))
theta = sample("theta", dist.Dirichlet(pi0))

with plate('data'), poutine.scale(scale=counts):
Z = sample('Z', dist.Categorical(theta))
sample('obs', dist.Poisson(mu[Z]), obs=values)
``````

However, I think this throws off the enumeration of Z’s and now during enumeration, all observations of a given value are assigned Z=1, etc. When I fit the model to `data` without `poutine.scale`, the true mu’s and theta’s are recovered well, but that does not happen with scaling. Is there any way around this issue?

I think that the model I want will require a custom implementation of a Mixture of Poisson’s distribution. I started with a mixture of normals to test:

``````from numpy.random import choice, normal, seed
from numpy import array
from torch import tensor, Size

from pyro.infer.mcmc.api import MCMC
from pyro.infer.mcmc import NUTS
from pyro import distributions as dist
from pyro import plate, sample

class MixtureNormal(dist.TorchDistribution):
support = dist.Normal.support
has_rsample = False

def __init__(self, mu, sigma, theta):
self._normal = dist.Normal(mu, sigma)
self.theta = theta
super(dist.TorchDistribution, self).__init__()

def sample(self, sample_shape=Size()):
return tensor(1.)

def log_prob(self, value):
contributions = self._normal.log_prob(value) + self.theta.log()
return contributions.logsumexp(0)

def model(data, mu0, sigma0, pi0, alpha0, beta0):
# priors
mu = sample("mu", dist.Normal(mu0, sigma0).to_event(1))
theta = sample("theta", dist.Dirichlet(pi0))
sigma = sample("sigma", dist.Gamma(alpha0, beta0))

# with plate('data', len(data)):
# sample('obs', MixtureNormal(mu, sigma, theta), obs=data)
for i in range(len(data)):
sample('obs_{}'.format(i), MixtureNormal(mu, sigma, theta), obs=data[i])

if __name__ == "__main__":
n = 100
alpha = array([0.4, 0.6])
mu = array([1.8, 5.7])
sigma = 0.1

z = choice(len(alpha), size=n, p=alpha)
data = normal(mu[z], sigma, n)

kernel = NUTS(model)
mcmc = MCMC(kernel, num_samples=100, warmup_steps=100, num_chains=1)
mcmc.run(tensor(data).float(),
tensor(mu).float(),
tensor(0.5),
tensor(alpha * 10).float(),
tensor(1.), tensor(1.))

mcmc.summary()
``````

The above works, but if I remove the `for` loop in `model` and uncomment the `plate`, I get an error saying `RuntimeError: The size of tensor a (100) must match the size of tensor b (2) at non-singleton dimension 0`. I think I need to implement an `expand` method for this class, but am not clear on how to. Any suggestions?

yes `plate` uses `expand` under the hood. you might look at other pyro distributions for example `expand` methods, e.g. studentT.

1 Like

Probably the issue at this line. `logsumexp(0)` does not work with a batch of values. Probably you want something like `logsumexp(-1)`. Another issue is your sample method returns a scalar, which is not True in general. I believe you need to declare `batch_shape`, `event_shape` in the constructor and `return torch.ones(sample_shape + self.shape())` in the `sample` method.

I think you don’t need to implement `expand` method because `.expand(...)` will return an ExpandedDistribution for you (unless you want to overwrite it with some custom implementations).

Thanks @fehiepsi.

I’m having some issues getting two different priors set on the component means.

What I would like to represent with my priors is:

mu[1] ~ Normal(1.8, 0.1)
mu[2] ~ Normal(5.7, 0.1)

But when I run multiple chains, some chains end up with posteriors centered at 1.8 in component 1 mean and some centered at 5.7 component 1 mean. Any suggestions on how to fix this? Do I need some type of `plate` around the `mu`s?

For what it’s worth, it does not seem like the chains are initialized by sampling from the prior, as described in the documentation

initial_params … If not specified, parameter values will be sampled from the prior.

``````from numpy.random import choice, normal, seed
from numpy import array
from torch import tensor, Size, ones

from pyro.infer.mcmc.api import MCMC
from pyro.infer.mcmc import NUTS
from pyro import distributions as dist
from pyro import plate, sample, set_rng_seed

class MixtureNormal(dist.TorchDistribution):
support = dist.Normal.support
has_rsample = False

def __init__(self, mu, sigma, theta):
self._normal = dist.Normal(mu, sigma)
self.theta = theta
self._num_component = theta.shape[-1]

cdbs = self._normal.batch_shape[:-1]
event_shape = self._normal.event_shape
self._event_ndims = len(event_shape)

super(MixtureNormal, self).__init__(batch_shape=cdbs,
event_shape=event_shape,
validate_args=False)

def sample(self, sample_shape=Size()):
return ones(sample_shape + self.shape())

def log_prob(self, value):
contributions = self._normal.log_prob(value) + self.theta.log()

return contributions.logsumexp(-1)

def model(data, mu0, sigma0, pi0):
# priors
mu = sample("mu", dist.Normal(mu0, sigma0))
theta = sample("theta", dist.Dirichlet(pi0))
sigma = tensor(0.1)

with plate('data', len(data)):
sample('obs', MixtureNormal(mu, sigma, theta), obs=data)

seed(1)
set_rng_seed(1)

n = 100
alpha = array([0.4, 0.6])
mu = array([1.8, 5.7])
sigma = 0.1

z = choice(len(alpha), size=n, p=alpha)
data = normal(mu[z], sigma, n)

mu0 = tensor(mu).float()
sigma0 = tensor(0.1)
alpha = tensor(alpha * 100).float()

kernel = NUTS(model)
mcmc = MCMC(kernel, num_samples=10, warmup_steps=0, num_chains=4)
mcmc.run(tensor(data).float().unsqueeze(1),
mu0,
sigma0,
alpha)

print(mcmc.summary())

print(mcmc.get_samples(group_by_chain=True)['mu'].mean(1))
print(mcmc.get_samples(group_by_chain=True)['mu'])
``````
``````                mean       std    median      5.0%     95.0%     n_eff     r_hat
mu[0]      2.55      1.99      1.81      1.09      6.36      2.39      3.12
mu[1]      4.84      2.02      5.70      0.17      5.77      2.40      2.82
theta[0]      0.56      0.18      0.47      0.30      0.76      2.12     23.79
theta[1]      0.44      0.18      0.31      0.24      0.70      2.12     23.79

Number of divergences: 3
None
tensor([[1.4932, 5.8903],
[1.5514, 5.8562],
[1.4103, 5.9970],
[5.7619, 1.6005]])
tensor([[[-0.6392,  7.8258],
[ 1.2329,  5.3973],
[ 1.6752,  5.7484],
[ 1.7760,  5.7006],
[ 1.8168,  5.7074],
[ 1.8129,  5.7082],
[ 1.8145,  5.7005],
[ 1.8147,  5.7048],
[ 1.8147,  5.7048],
[ 1.8132,  5.7057]],

[[-0.1903,  7.4315],
[ 1.3376,  5.4560],
[ 1.7005,  5.7405],
[ 1.7884,  5.7023],
[ 1.8100,  5.7074],
[ 1.8130,  5.7053],
[ 1.8133,  5.7071],
[ 1.8163,  5.7027],
[ 1.8132,  5.7023],
[ 1.8122,  5.7066]],

[[-1.2729,  9.0464],
[ 1.0872,  5.2244],
[ 1.6405,  5.7740],
[ 1.7750,  5.6915],
[ 1.8069,  5.7103],
[ 1.8117,  5.7038],
[ 1.8117,  5.7038],
[ 1.8152,  5.7058],
[ 1.8146,  5.7044],
[ 1.8133,  5.7056]],

[[ 6.3648,  0.1720],
[ 5.6095,  1.4249],
[ 5.7201,  1.7247],
[ 5.7014,  1.7931],
[ 5.7057,  1.8105],
[ 5.7018,  1.8185],
[ 5.7048,  1.8165],
[ 5.7031,  1.8123],
[ 5.7049,  1.8164],
[ 5.7031,  1.8156]]])
``````

@jacobcvt12 I am not sure if using multi-chains will be helpful here. AFAIK multi-chains are mainly used to get more samples and to diagnose mixing issues. For multi-modal posteriors, if each chain covers one mode, it is still unclear on how to merge samples from those chains (for examples, assume your data concentrates on 2 groups, but one group contains 90 points, the other group contains 10 points; you run 2-chain MCMC and somehow get 1000 samples from the first chain concentrates around the first group and 1000 samples from the second chain concentrate around the second group; if you combine them, you will get equal distribution over those two groups because each of them has 1000 samples - this violates the assumption that the distribution of them should be proportional to 9-1). So to solve multi-modal posteriors, in my opinion, it is better to construct the model (e.g. using mixture models) so that it can capture those modes or using some specific MCMC algorithms (to my knowledge, using normalization flows are helpful here).

If you use pyro dev version, you can use various initialization strategies for the argument `init_strategy` of NUTS kernel. `init_to_sample` will be what you are looking for.

Thanks for suggestions @fehiepsi. I’m still confused on why the chains converge differently though. If I had an exchangeable prior on the mu’s (or even week non-exchangeable priors), I would expect that some chains might have mu[1] approx 1.8, mu[2] approx 5.7 at the posterior and other chains vice versa. But given the strong priors, I’m surprised that this is happening. My guess is that the way I am specifying the prior does not imply ordering, whereas in Stan, you can specify e.g.

``````for (k in 1:K) {
mu[k] ~ Normal(mu0[k], sigma0);
}
``````

Is there a way to do something similar in pyro?

@jacobcvt12 You are right that with the default `init_strategy`, chains might start at points far from priors. If you are familiar with Stan, then you might know that initial values in Stan are generated randomly from -2 to 2 in “unconstrained support” (see init section in Stan doc). The default `init_strategy` in Pyro is `init_to_uniform(radius=2)`, which is the same as Stan. That means, if your prior is Normal(100, 1), the initial value still belongs to (-2, 2) with the default init strategy in Pyro and Stan. In Pyro, you can use `init_strategy=init_to_sample` to get initial values from priors. I’m not sure if Stan supports such functionality.

the chains converge differently

I guess with `num_samples=10, warmup_steps=0`, the particles haven’t traveled far from initial values yet. Could you try higher `num_samples` or `warmup_steps`?

Thanks @fehiepsi. I’m not able to find documentation with examples showing how to use the different init strategies for MCMC in pyro. Any suggestions on where to look?

@jacobcvt12 I think you can find some examples here together with documentation for `init_strategy`.

1 Like