I have recently been trying to understand the CVAE by reading the Pyro example "Conditional Variational Auto-encoder" (Pyro Tutorials 1.7.0 documentation).
I noticed that the generation network seems to take only z as input, rather than both z and x:
    # the output y is generated from the distribution pθ(y|x, z)
    loc = self.generation_net(zs)
Does this mean the decoder is modelling p(y|z) instead of p(y|x, z)? If so, what difference would it make if we fed both x and z to the decoder?
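
To make the second question concrete, here is a minimal PyTorch sketch of the two decoder variants as I understand them. The class names (`DecoderZOnly`, `DecoderXZ`), the layer sizes, and the sigmoid output are my own assumptions for illustration and are not taken from the tutorial's code.

```python
import torch
import torch.nn as nn


class DecoderZOnly(nn.Module):
    """Hypothetical decoder that sees only z, i.e. it parameterizes p(y | z)."""

    def __init__(self, z_dim, hidden_dim, y_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(z_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, y_dim),
            nn.Sigmoid(),  # assumed: outputs are pixel intensities in [0, 1]
        )

    def forward(self, z):
        return self.net(z)


class DecoderXZ(nn.Module):
    """Hypothetical decoder that sees x and z, i.e. it parameterizes p(y | x, z)."""

    def __init__(self, x_dim, z_dim, hidden_dim, y_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(x_dim + z_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, y_dim),
            nn.Sigmoid(),
        )

    def forward(self, x, z):
        # Concatenate the condition x with the latent z along the feature axis.
        return self.net(torch.cat([x, z], dim=-1))


torch.manual_seed(0)
x = torch.rand(4, 784)   # assumed: a batch of 4 flattened conditioning inputs
z = torch.randn(4, 2)    # assumed: 2-dimensional latent samples

loc_z = DecoderZOnly(z_dim=2, hidden_dim=500, y_dim=784)(z)
loc_xz = DecoderXZ(x_dim=784, z_dim=2, hidden_dim=500, y_dim=784)(x, z)
```

In the first variant, x can influence the output only through whatever information z carries about it; in the second, the decoder has direct access to x as well.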