Hey guys,
I have a few questions about this tutorial:

The code for the tutorial uses sequential (variational) inference over the discrete latent variable for the number of digits per image. However, it seems to me that exact inference over this discrete latent variable on the model side (as opposed to the guide side) could easily be done. Am I incorrect in stating this? I saw another post suggesting that exact (parallel) inference over this latent variable makes no sense, but I think I disagree. (In case you're curious how this would work: you would add an output to the LSTM in the guide giving the probability that there is another digit, and then take a weighted average of the log-probabilities according to the guide's distribution over the number of digits. Essentially, the discrete latent variable stays stochastic on the model side and is fully integrated out, while being deterministic on the guide side.)
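To make the "weighted average of log-probs" idea concrete, here is a minimal sketch of what I have in mind, with placeholder numbers (none of this is from the tutorial; the variable names and toy values are my own assumptions):

```python
import torch

# Hypothetical log joint probabilities log p(x, z, n) for each candidate
# number of digits n = 0, 1, 2 (toy values, purely for illustration).
model_logprobs = torch.tensor([-5.0, -2.0, -4.0])

# Guide's distribution q(n) over the number of digits, e.g. produced by an
# extra softmax head on the guide's LSTM output (an assumption on my part,
# not something the tutorial's code does).
q_n = torch.softmax(torch.tensor([0.1, 1.5, 0.3]), dim=0)

# Exact marginalization: instead of sampling n sequentially, take the
# expectation of the model log-probability under q(n), and add the entropy
# of q(n). The discrete latent is thus integrated out rather than sampled.
expected_logprob = (q_n * model_logprobs).sum()
entropy = -(q_n * q_n.log()).sum()
elbo_discrete_part = expected_logprob + entropy
```

In a real implementation the `model_logprobs` entries would themselves involve the continuous latents for each candidate count, but the point is that the gradient with respect to the guide's count probabilities becomes exact instead of requiring a score-function estimator.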

Why are we using this prior over the z_where variable?
z_where_prior_loc = torch.tensor([3., 0., 0.])