I also tried implementing the model with different sizes in Pyro, following the tutorial, and it produced really poor results on top of being slow. I even hit some bugs in later steps (see: Latent Dirichlet Allocation Model and Possible bugs: Predictive vs Subsampling, Enumerate Error). I assume the problem is with Pyro rather than with the model, because I also implemented LDA from scratch in Python (using collapsed Gibbs sampling) and it works reasonably well on the same dataset (a bunch of article abstracts). Is your problem only with performance, or also with the results?
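
For context, a collapsed Gibbs sampler for LDA can be sketched roughly as below. This is only an illustration of the general technique, not the exact code that was run on the abstracts: the function name `lda_collapsed_gibbs`, the symmetric hyperparameters `alpha` and `beta`, their default values, and the toy documents are all assumptions made for the example.

```python
import random


def lda_collapsed_gibbs(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=200, seed=0):
    """Hypothetical helper: docs is a list of documents, each a list of
    integer word ids in [0, vocab_size). alpha and beta are assumed
    symmetric Dirichlet hyperparameters."""
    rng = random.Random(seed)

    # Count tables maintained by the sampler.
    doc_topic = [[0] * n_topics for _ in docs]                # tokens in doc d assigned to topic k
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # times word w is assigned to topic k
    topic_total = [0] * n_topics                              # total tokens assigned to topic k

    # Random initial topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        z.append(z_d)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the current token from the counts.
                k = z[d][i]
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1

                # Unnormalized conditional probability of each topic
                # for this token, given all other assignments.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]

                # Resample the topic and put the token back.
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    return doc_topic, topic_word


# Toy corpus (made up): word ids 0-1 and 2-3 never co-occur.
docs = [[0, 1, 0, 1], [2, 3, 2, 3], [0, 1, 1, 0]]
doc_topic, topic_word = lda_collapsed_gibbs(docs, n_topics=2, vocab_size=4, n_iters=50)
```

The returned count tables can be normalized (after adding `alpha` and `beta`) to get per-document topic proportions and per-topic word distributions. A pure-Python loop like this is slow on a real corpus, so it is meant for comparing results against another implementation, not for benchmarking speed.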