Pyro Prod-LDA prediction for train and test documents

nastasija · June 24, 2022, 8:05am

Hi, I have implemented ProdLDA following the Pyro tutorial (Probabilistic Topic Modeling — Pyro Tutorials 1.8.4 documentation). I am trying to extract the thetas (document-topic distributions) for the training data. I would also like to predict document-topic distributions for test data, but I do not know how to do this. Can someone help?
Thanks in advance!

fehiepsi · June 24, 2022, 10:56am

If you return theta from the guide method, then calling the guide will give you the thetas.

nastasija · June 24, 2022, 12:27pm

Hi, thank you for the quick response! Running prodLDA.guide(docs) does indeed return the topic distributions! However, I do not know how to do it for test documents. Just changing docs to test_docs gives an error:
RuntimeError: mat1 and mat2 shapes cannot be multiplied (1860x53531 and 78182x100)
Thank you very much again!

fehiepsi · June 24, 2022, 2:46pm

I guess you can look at the error traceback to see where the shapes are mismatched. Printing the tensor shapes around the failing line will tell you what the numbers 1860, 53531, 78182 and 100 are.
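A small sketch of that kind of shape check, reproducing the same class of error with toy sizes. `fc1` is a hypothetical stand-in for the encoder's first linear layer, and `shapes_match` is a made-up helper; in the thread, `in_features` would correspond to the training vocabulary size and `out_features` to the hidden size.

```python
import torch
import torch.nn as nn

# Hypothetical toy sizes: in the thread, in_features would be the training
# vocabulary size (78182) and out_features the hidden size (100).
train_vocab, hidden = 10, 4
fc1 = nn.Linear(train_vocab, hidden)  # stand-in for the encoder's first layer

train_docs = torch.rand(6, train_vocab)  # 6 docs, columns = training vocabulary
test_docs = torch.rand(3, 7)             # 3 docs vectorized with their own 7-word vocabulary


def shapes_match(x, layer):
    # Print the two sizes the matrix multiply inside nn.Linear must reconcile
    print("input shape:", tuple(x.shape), "| layer expects last dim:", layer.in_features)
    return x.shape[-1] == layer.in_features


ok_train = shapes_match(train_docs, fc1)
ok_test = shapes_match(test_docs, fc1)

err_msg = None
try:
    fc1(test_docs)
except RuntimeError as err:
    err_msg = str(err)  # same kind of "mat1 and mat2 shapes cannot be multiplied" error
print(err_msg)
```

Read left to right, the message pairs (documents x input columns) with (expected columns x hidden units), which is how 1860, 53531, 78182 and 100 map onto the thread's data.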

nastasija · June 24, 2022, 3:04pm

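I believe the problem is that the test documents do not have the same shape as the training documents.
The shape of the training documents: [3212, 78182]
The shape of the test documents: [1860, 53531]
The inner dimensions of the two matrices need to agree: the encoder's first layer expects 78182 columns (the training vocabulary size) and maps them to 100 hidden units, but the test matrix has only 53531 columns. In other words, the test matrix needs to be transformed to the training shape.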

fehiepsi · June 24, 2022, 4:08pm

I guess it is a data problem. You need to use the same vocabulary (the same word-to-column mapping) for the train docs and the test docs.
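A plain-Python sketch of what "the same vocabulary" means in practice: the word-to-column mapping is built once from the training documents and reused for the test documents, so both count matrices get the same number of columns. `build_vocab`, `to_counts` and the toy documents are hypothetical; dropping unseen test words is one simple choice, assumed here.

```python
from collections import Counter


def build_vocab(train_tokens):
    # word -> column index, fixed once from the training documents only
    vocab = {}
    for doc in train_tokens:
        for w in doc:
            if w not in vocab:
                vocab[w] = len(vocab)
    return vocab


def to_counts(tokenized_docs, vocab):
    # one row per document, len(vocab) columns; words unseen in training are dropped
    rows = []
    for doc in tokenized_docs:
        row = [0] * len(vocab)
        for w, c in Counter(doc).items():
            idx = vocab.get(w)
            if idx is not None:
                row[idx] += c
        rows.append(row)
    return rows


train = [["pyro", "topic", "model"], ["topic", "topic", "guide"]]
test = [["guide", "theta", "topic"]]  # "theta" never appeared in training

vocab = build_vocab(train)
train_counts = to_counts(train, vocab)
test_counts = to_counts(test, vocab)  # same width as train_counts
```

If the count matrices come from scikit-learn's `CountVectorizer`, the equivalent is to call `fit_transform` on the training texts and only `transform` (not `fit_transform`) on the test texts.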

nastasija · June 24, 2022, 5:26pm

I figured it out! For test documents, the whole model method needs to be run again. So by changing the model method to return theta and running prodLDA.model(test_docs), you get the topic distributions for unseen documents!

Thank you very much for your help!

fehiepsi · June 24, 2022, 5:38pm

Hmm, the model only specifies the prior for logtheta, so the theta it returns does not depend on the content of the test documents. I believe that you need to run the guide.