Binarization of dataset for VAE

Hi everyone,

Looking at the VAE tutorial (Variational Autoencoders – Pyro Tutorials 1.8.4 documentation), it seems strange to me that the data is not binarized, even though the likelihood of the image is computed as if it were generated by a Bernoulli distribution.

# define the model p(x|z)p(z)
def model(self, x):
    # register PyTorch module `decoder` with Pyro
    pyro.module("decoder", self.decoder)
    with pyro.iarange("data", x.size(0)):
        # setup hyperparameters for prior p(z)
        z_loc = x.new_zeros(torch.Size((x.size(0), self.z_dim)))
        z_scale = x.new_ones(torch.Size((x.size(0), self.z_dim)))
        # sample from prior (value will be sampled by guide when computing the ELBO)
        z = pyro.sample("latent", dist.Normal(z_loc, z_scale).independent(1))
        # decode the latent code z
        loc_img = self.decoder.forward(z)
        # score against actual images
        pyro.sample("obs", dist.Bernoulli(loc_img).independent(1), obs=x.reshape(-1, 784))

How is the likelihood calculated when the observation is not 0 or 1? What is it actually computing?

I also noticed that digit generation results are better when the data is not binarized, even though the ELBO is better with binarization.
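
(By binarization I mean something like the following sketch, where x stands in for a batch of MNIST intensities in [0, 1]; the variable names are mine:)

import torch

x = torch.rand(256, 784)        # stand-in for a batch of MNIST intensities in [0, 1]
x_fixed = (x > 0.5).float()     # deterministic thresholding
x_sampled = torch.bernoulli(x)  # stochastic binarization, resampled each pass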

You could take a look at #529 for some context on this. It works out because of the way Bernoulli.log_prob is implemented: it computes binary_cross_entropy, and hence can be passed a continuous value. A more elegant approach would be to use a distribution-valued observation, as discussed in #988.
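
For illustration, here is a minimal sketch of that equivalence (depending on your version you may need validate_args=False, since strict validation rejects observations outside {0, 1}):

import torch
import torch.nn.functional as F
import pyro.distributions as dist

p = torch.tensor(0.7)   # decoder output, i.e. one entry of loc_img
x = torch.tensor(0.35)  # a continuous pixel intensity in [0, 1]

# Bernoulli.log_prob evaluates x * log(p) + (1 - x) * log(1 - p),
# which is exactly the negative binary cross entropy
log_p = dist.Bernoulli(p, validate_args=False).log_prob(x)
bce = F.binary_cross_entropy(p, x)
print(log_p.item(), -bce.item())  # same number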


I see, since Bernoulli.log_prob uses BCE under the hood, this works just like a regular PyTorch implementation.

About the distribution valued observation, how would that work?


See the discussion in the issue Neeraj linked above; it's a work-in-progress idea. Feel free to contribute to the discussion if you have ideas!

You could alternatively explicitly marginalize out the binarization process, which I believe is equivalent to the continuous-observation trick in the tutorial:

with pyro.iarange("pixels", 784):
    binarized = pyro.sample("binarized",
                            dist.Bernoulli(x.reshape(-1, 784)),
                            infer={'enumerate': 'parallel'})
    pyro.sample("obs", dist.Bernoulli(loc_img), obs=binarized)

...
svi = SVI(model, guide, optim, loss=TraceEnum_ELBO(max_iarange_nesting=2))
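
To make the marginalization concrete, here is a minimal numeric sketch of what enumeration computes for a single pixel (the values are made up for illustration):

import torch

x = torch.tensor(0.35)  # observed continuous pixel intensity
p = torch.tensor(0.70)  # decoder output for that pixel

# enumeration sums the joint over both values of the binary site:
#   p(obs) = sum over b in {0, 1} of Bernoulli(b; x) * Bernoulli(b; p)
marginal = x * p + (1 - x) * (1 - p)
print(marginal.log())  # log-likelihood with `binarized` summed out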

Interesting, so internally Pyro traces the binarization, which I guess would be different from having binary input data.

Why do we need max_iarange_nesting=2 in the loss function definition?