The ELBO is by construction a lower bound on the log evidence. If you have a VAE with N data points, the ELBO is a sum over data points and therefore scales like N. That's the quantity that Pyro reports; it does not by default normalize the ELBO per datapoint.
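
To spell out the scaling, here is the decomposition under the usual VAE assumptions: i.i.d. data x_1, ..., x_N with one local latent z_i per datapoint. The notation is introduced here for illustration only.

```latex
\mathrm{ELBO}(x_{1:N})
  = \sum_{i=1}^{N} \mathbb{E}_{q(z_i \mid x_i)}\!\left[
      \log p(x_i \mid z_i) + \log p(z_i) - \log q(z_i \mid x_i)
    \right]
  \le \log p(x_{1:N})
```

Each of the N terms is of order one, so the sum grows like N; dividing by N gives a per-datapoint ELBO.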
You can use poutine.scale to scale the ELBO and therefore its gradients. See here