Pyro Prod-LDA prediction for train and test documents

Hi, I have implemented ProdLDA following the Pyro tutorial (Probabilistic Topic Modeling — Pyro Tutorials 1.8.1 documentation). I am trying to extract the thetas (doc-topic distributions) for the training data. Additionally, I would like to predict doc-topic distributions for test data, but I do not know how to do this. Can someone help? :smile:
Thanks in advance!

If you return thetas from the guide method, then executing the guide method will give you thetas. :slight_smile:

Hi, thank you for the quick response! Just running prodLDA.guide(docs) does indeed return the topic distributions! However, I do not know how to do it for test documents. Just changing docs to test_docs gives an error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1860x53531 and 78182x100)
Thank you very much again :blush:

I guess you can look at the error trace to see where the shapes are mismatched. Printing the shapes around the failing line will tell you what the numbers 1860, 53531, 78182 and 100 are.

I believe the problem is that the test documents do not have the same shape as the training docs.
The shape of the training documents: [3212, 78182]
The shape of the test documents: [1860, 53531]
So the inner dimensions of the matrix multiplication need to be compatible; in other words, the test documents need to be transformed to have the same number of columns.
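A small reproduction, assuming the failing multiplication is the encoder's first linear layer: it computes docs (N × V_test) times the transposed weight (V_train × hidden), so the number of documents can differ freely but the number of columns cannot. The sizes below are hypothetical stand-ins for 78182, 53531 and 100, and `check_vocab` is a hypothetical helper.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes standing in for 78182 (train vocab), 53531 (test vocab), 100 (hidden).
train_vocab, test_vocab, hidden = 12, 9, 4
fc1 = nn.Linear(train_vocab, hidden)  # plays the role of the encoder's first layer

train_docs = torch.zeros(5, train_vocab)  # 5 docs over the training vocabulary
test_docs = torch.zeros(3, test_vocab)    # 3 docs over a different vocabulary

print(fc1(train_docs).shape)  # torch.Size([5, 4])

try:
    fc1(test_docs)  # 9 columns vs. 12 expected inputs: the same kind of error as above
except RuntimeError as err:
    print("RuntimeError:", err)


def check_vocab(docs, layer):
    """Fail early with a clearer message than the matmul error."""
    if docs.shape[1] != layer.in_features:
        raise ValueError(
            f"docs have {docs.shape[1]} columns, but the model expects "
            f"{layer.in_features} (the training vocabulary size)"
        )


check_vocab(train_docs, fc1)  # passes silently
```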

I guess it is a data problem. You need to use the same vocabulary for the train docs and the test docs.
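A minimal sketch of building the vocabulary from the training docs only and reusing it for the test docs, so both count matrices have the same width. Words that never appear in training are simply dropped. `build_vocab` and `to_bow` are hypothetical helpers; the inputs are assumed to be already tokenized.

```python
from collections import Counter


def build_vocab(tokenized_docs):
    """Map each word seen in the training docs to a column index."""
    vocab = {}
    for doc in tokenized_docs:
        for word in doc:
            if word not in vocab:
                vocab[word] = len(vocab)
    return vocab


def to_bow(tokenized_docs, vocab):
    """Count matrix with one column per training word; unseen words are ignored."""
    rows = []
    for doc in tokenized_docs:
        counts = Counter(w for w in doc if w in vocab)
        row = [0] * len(vocab)
        for word, n in counts.items():
            row[vocab[word]] = n
        rows.append(row)
    return rows


train = [["topic", "model", "pyro"], ["pyro", "guide", "guide"]]
test = [["guide", "unseen", "pyro"]]

vocab = build_vocab(train)         # {'topic': 0, 'model': 1, 'pyro': 2, 'guide': 3}
train_bow = to_bow(train, vocab)   # [[1, 1, 1, 0], [0, 0, 1, 2]]
test_bow = to_bow(test, vocab)     # same width as train_bow
print(test_bow)  # [[0, 0, 1, 1]]
```

If the training docs were vectorized with scikit-learn's `CountVectorizer`, the equivalent is to call `transform` (not `fit_transform`) on the test texts with the vectorizer that was already fitted on the training texts.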

I figured it out! For test documents, the whole model method needs to be run again. So by changing the model method to return theta and running prodLDA.model(test_docs), you get the topic distributions for unseen documents!

Thank you very much for your help!

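Hmm, the model only specifies the prior for logtheta: in the tutorial it is drawn from a standard Normal regardless of which words a document contains (the docs only set the batch size), so returning theta from the model gives random prior draws, not topic distributions for those documents. I believe that you need to run the guide, whose encoder maps each document to its own distribution over logtheta, on test docs built with the training vocabulary.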
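A minimal sketch of predicting thetas for unseen documents through the guide's encoder, assuming the encoder is already trained and the test docs use the training vocabulary. `Encoder` is a hypothetical simplified stand-in (with dropout, so eval mode matters) and `predict_theta` is a hypothetical helper: instead of sampling, it takes the softmax of the logtheta location as a deterministic point estimate.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class Encoder(nn.Module):
    """Hypothetical simplified encoder returning (logtheta loc, logtheta scale)."""

    def __init__(self, vocab_size, num_topics, hidden, dropout=0.2):
        super().__init__()
        self.drop = nn.Dropout(dropout)
        self.fc1 = nn.Linear(vocab_size, hidden)
        self.fcmu = nn.Linear(hidden, num_topics)
        self.fclv = nn.Linear(hidden, num_topics)

    def forward(self, inputs):
        h = self.drop(F.softplus(self.fc1(inputs)))
        return self.fcmu(h), (0.5 * self.fclv(h)).exp()


@torch.no_grad()
def predict_theta(encoder, docs):
    """Point estimate of doc-topic distributions: softmax of the logtheta location."""
    encoder.eval()  # turns off dropout (and makes BatchNorm use running stats, if present)
    logtheta_loc, _ = encoder(docs)
    return F.softmax(logtheta_loc, dim=-1)


torch.manual_seed(0)
encoder = Encoder(vocab_size=10, num_topics=3, hidden=8)  # assume this was trained
test_docs = torch.randint(0, 3, (2, 10)).float()  # columns = training vocabulary size
theta_test = predict_theta(encoder, test_docs)
print(theta_test.shape)  # torch.Size([2, 3])
```

The softmax of the location is not exactly the posterior mean of theta; averaging the softmax over several sampled logthetas from the guide is an alternative if that distinction matters.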