Custom Bernoulli distribution or mixture Gaussian distribution implementation

sejabs · May 8, 2018, 2:42am

Gal and Ghahramani’s work has shown that VI with Bernoulli variational distributions and dropout in deep layers have almost the same effect. Usually, people use dropout in a traditional deep learning model to approximate the posterior distributions of the weights, e.g. recurrent-dropout-experiments.
However, I want to implement the Bayesian architecture directly in deep learning problems, that is, use VI with approximate posterior distributions such as the Bernoulli distribution.
But I found that the Bernoulli distribution in PyTorch and Pyro is only distributed over 0 and 1. I want it to be distributed over 0 and a learnable real parameter. Additionally, there is no direct Gaussian mixture distribution like in Edward (Categorical(probs_tensor, distribution_1, distribution_2, …)) to easily follow Ghahramani’s work. How can I implement the above custom distributions in Pyro? Are there any detailed tutorials or solutions?
Thanks!

jpchen · May 8, 2018, 6:04am

I want it distribute on 0 and learning real parameters.

can you clarify what you mean by this?

there is no directly mixture Gaussian distribution

see the GMM tutorial

sejabs · May 8, 2018, 6:32am

Thanks for the reply!
First, the original Bernoulli distribution is about True/False, i.e. 0 and 1. In their paper, Gal and Ghahramani propose an approximate distribution similar to the Bernoulli. It chooses 0 with a certain probability, and otherwise chooses a real number (the layer weight) that needs to be learned in the VI process, if my understanding is right.
They proved that dropout in a traditional deep learning model is essentially VI with the proposed approximate posterior distributions over the layers' weights and biases.

Finally, I think the GMM tutorial may be useful for building a Gaussian mixture posterior whose means are 0 and a learnable real number. I will try it. Thank you!
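
For reference, the construction described in this post can be sketched in formulas. The notation here is chosen for illustration and is a paraphrase, not a quotation, of Gal and Ghahramani: $M_i$ is the learnable weight matrix of layer $i$, $p_i$ is the probability of keeping a unit, and $\sigma$ is a small fixed scale.

```latex
% Dropout-style variational distribution over the weights of layer i:
% each column of M_i is either kept or set to zero by a Bernoulli mask.
W_i = M_i \,\operatorname{diag}(z_i), \qquad z_{i,j} \sim \operatorname{Bernoulli}(p_i)

% Two-component Gaussian mixture version, with one mean at the learnable
% value m and the other at 0:
q(w) = p\,\mathcal{N}\!\left(w \mid m, \sigma^2 I\right)
     + (1 - p)\,\mathcal{N}\!\left(w \mid 0, \sigma^2 I\right)
```

As $\sigma \to 0$ the mixture concentrates on the two points $m$ and $0$, which is the "0 or a learnable real number" behaviour asked about above.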

fritzo · May 8, 2018, 6:23pm

It sounds like you want a zero-inflated distribution, e.g. a zero-inflated Gaussian. In Pyro you could accomplish this using two distributions:

mask = pyro.sample("mask", dist.Bernoulli(p))
nonzero = pyro.sample("nonzero", dist.Normal(loc, scale))
x = nonzero * mask

It would help to see some context where you are using the zero-inflated distribution.
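
To make the generative process behind the two-site snippet concrete, here is a minimal sketch in plain Python (standard library only, deliberately not Pyro). The function name `sample_zero_inflated_normal` and the values 0.8, 1.5 and 0.2 are hypothetical, chosen only for illustration.

```python
import random


def sample_zero_inflated_normal(keep_prob, loc, scale, rng):
    """Return 0 with probability 1 - keep_prob, else a Normal(loc, scale) draw."""
    # Bernoulli(keep_prob) mask: 1.0 keeps the value, 0.0 zeroes it out.
    mask = 1.0 if rng.random() < keep_prob else 0.0
    # Draw from the Gaussian component regardless of the mask.
    nonzero = rng.gauss(loc, scale)
    # Same combination as `x = nonzero * mask` in the snippet above.
    return nonzero * mask


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_zero_inflated_normal(0.8, 1.5, 0.2, rng) for _ in range(10_000)]

# Fraction of exact zeros should be close to 1 - keep_prob.
zero_fraction = sum(d == 0.0 for d in draws) / len(draws)

# Mean of the surviving draws should be close to loc.
nonzero_draws = [d for d in draws if d != 0.0]
nonzero_mean = sum(nonzero_draws) / len(nonzero_draws)

print(f"zero fraction: {zero_fraction:.3f}, nonzero mean: {nonzero_mean:.3f}")
```

The point of the sketch is only that the product of a 0/1 mask and a Gaussian draw has a point mass at zero plus a Gaussian bump; in Pyro the two draws would be separate sample sites as shown above.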

sejabs · May 8, 2018, 10:54pm

Thanks, @fritzo! I implemented the code like this:

loc_initial = torch.randn(param.size(), dtype=torch.float32)
# learnable location for each weight
loc_learn = pyro.param('guide_' + name, loc_initial)
# per-weight 0/1 indicator: 0 with probability 0.2, 1 with probability 0.8
assignment = pyro.sample(
    name,
    pyro.distributions.Categorical(torch.tensor([0.2, 0.8])).expand_by(param.size()))
assignment = assignment.float()
# the mean is 0 where the indicator is 0, and loc_learn otherwise
assignment = torch.mul(loc_learn, assignment)
pyro.distributions.Normal(assignment, torch.tensor(0.2))

It works, but the ELBO loss is quite large, about 4 million at the initial stage. It decreases as the program runs. Is it normal for the loss to be so large? My model has about 0.35 million learnable parameters; is it possible that the divergence between the priors and posteriors causes the large loss?
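
As a rough sanity check on the scale of such a loss, the back-of-the-envelope sketch below sums a per-parameter KL divergence over 350,000 parameters. Everything in it is an assumption for illustration: the thread does not state the prior, so a standard normal prior is assumed, and the guide is simplified to a Gaussian with scale 0.2 and randomly initialised means rather than the Categorical mask used above. The helper `kl_normal` is a hypothetical name.

```python
import math
import random


def kl_normal(mu_q, sigma_q, mu_p, sigma_p):
    """KL( N(mu_q, sigma_q^2) || N(mu_p, sigma_p^2) ) for scalar Gaussians."""
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


rng = random.Random(0)   # fixed seed for reproducibility
num_params = 350_000     # "about 0.35 million" parameters, as in the post
sigma_q = 0.2            # guide scale, matching the 0.2 used in the code above

# Assumed prior: N(0, 1). Guide means are drawn like torch.randn would draw them.
total_kl = sum(
    kl_normal(rng.gauss(0.0, 1.0), sigma_q, 0.0, 1.0) for _ in range(num_params)
)
per_param = total_kl / num_params

print(f"KL per parameter: {per_param:.3f} nats, total: {total_kl:,.0f} nats")
```

Under these assumptions the KL term alone comes to roughly 1.6 nats per parameter, so several hundred thousand nats in total before any likelihood term is added. This does not diagnose the actual model, but it suggests that a loss in the millions is not surprising when the ELBO is summed over this many randomly initialised parameters.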