Hi everyone, I’m new to probabilistic programming.

I recently studied with great interest the tutorial on the SS-VAE, but I have a doubt about how the guide is defined. In particular, I don't fully understand why "encoder_z" receives, besides the input x, a "guessed" y even when y is not provided (i.e., in the case of an unsupervised batch). For clarity, this is the code I'm referring to:

```
def guide(self, xs, ys=None):
    with pyro.plate("data"):
        # if the class label (the digit) is not supervised, sample
        # (and score) the digit with the variational distribution
        # q(y|x) = categorical(alpha(x))
        if ys is None:
            alpha = self.encoder_y(xs)
            ys = pyro.sample("y", dist.OneHotCategorical(alpha))
        # sample (and score) the latent handwriting-style with the variational
        # distribution q(z|x,y) = normal(loc(x,y),scale(x,y))
        loc, scale = self.encoder_z([xs, ys])
        pyro.sample("z", dist.Normal(loc, scale).to_event(1))
```

Is this required for some theoretical reason that I have not fully understood? I'm a bit confused about this point, since, as far as I understood, the original approach in Kingma et al. (2014) factorizes the full posterior q(z, y | x) as a product of a posterior on z, q(z | x), and one on y, q(y | x), i.e., it assumes that the two latent variables are conditionally independent given x.
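To make the two alternatives I'm comparing explicit (using the same notation as the code comments): the guide above corresponds to a structured factorization, while the one I had in mind is fully factorized (mean-field):

```
q(z, y | x) = q(z | x, y) * q(y | x)   # structured, as in the guide above
q(z, y | x) = q(z | x)    * q(y | x)   # mean-field, what I had in mind
```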

Otherwise, if this is not formally required, does it instead simplify the implementation in some way? With this design, "encoder_z" always receives two components to concatenate (i.e., x and the observed or guessed y), rather than having to handle both the case where it receives only x and the case where it receives both x and y. Is this the real reason behind this choice?
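To make this second hypothesis concrete, here is a minimal sketch of what I mean (plain NumPy, with made-up dimensions and a linear map standing in for the real "encoder_z" network): because ys is always filled in, either observed or sampled from q(y|x), the encoder can be a single network with a fixed input width of dim_x + dim_y, and there is no branching on whether the batch is supervised:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical dimensions (MNIST-like): image, one-hot label, latent style
DIM_X, DIM_Y, DIM_Z = 784, 10, 50

# a one-layer stand-in for encoder_z; the real model uses a neural network
W = rng.normal(size=(DIM_X + DIM_Y, 2 * DIM_Z))

def encoder_z(xs, ys):
    # encoder_z always sees the concatenation [x, y]; its input width
    # is the same whether y was observed or guessed
    inp = np.concatenate([xs, ys], axis=-1)
    out = inp @ W
    loc, log_scale = out[:, :DIM_Z], out[:, DIM_Z:]
    return loc, np.exp(log_scale)

batch = 4
xs = rng.normal(size=(batch, DIM_X))

# supervised batch: y is an observed one-hot label
ys_observed = np.eye(DIM_Y)[[3, 1, 4, 1]]

# unsupervised batch: y is a "guessed" one-hot sample from q(y|x)
ys_guessed = np.eye(DIM_Y)[rng.integers(0, DIM_Y, size=batch)]

# same code path and same shapes in both cases
for ys in (ys_observed, ys_guessed):
    loc, scale = encoder_z(xs, ys)
    assert loc.shape == (batch, DIM_Z) and scale.shape == (batch, DIM_Z)
```

In the mean-field alternative I described, encoder_z would instead take only xs, so this uniformity argument would not apply.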

Thank you so much.