I am looking for an implementation of LDA or SVI-LDA (Hoffman, M. D., Blei, D. M., Wang, C., & Paisley, J. W. (2013). Stochastic variational inference. Journal of Machine Learning Research, 14(1), 1303-1347) in Pyro/PyTorch. Please share if you have any implementation.
Thanks Dave for sharing. Do you consider this a state-of-the-art implementation of LDA compared to sklearn.decomposition.LatentDirichletAllocation? Can you comment on any similar methods for fitting LDA?
It depends on what you mean by “state of the art”. LDA by itself isn’t really state of the art anymore as far as topic models in general are concerned. If you like the fairly simple yet expressive nature of LDA, then sure, this is a good implementation and will scale to fairly large datasets. As for modifications: what are you trying to achieve? I need more info on your problem space before I can give more detailed advice about model structure or inference methods.
Thanks Dave for the response. I am planning to implement Supervised LDA (Blei and McAuliffe, 2010) in Pyro. I am new to Pyro so I am excited to contribute. Any guidance in this endeavor is appreciated.
Great. Luckily for you, if that’s all you want to do, then Figure 1 of the paper gives you the model structure and Section 3 covers inference. Actually, you don’t really need Pyro for this, since the paper derives analytical variational approximations. But if you want to use black-box SVI à la Pyro for some reason, it should not be hard to a) use an autoguide over all continuous rvs and marginalize out all discrete ones in the guide using poutine.block or b) perform amortized inference by constructing some or all of the transformations in the guide using the NN architecture of your choice. Does this make sense?
Thanks Dave for providing the details. I am interested in understanding and possibly implementing black-box SVI/AEVB for supervised topic models. Could you point me to tutorials or other sources for the two things you suggested: “a) use an autoguide over all continuous rvs and marginalize out all discrete ones in the guide using poutine.block or b) perform amortized inference by constructing some or all of the transformations in the guide using the NN architecture of your choice”? Please advise.