LDA: documents with different sizes

@martinjankowiak, thanks again… answering your questions:

  1. Using a softplus (and even a sigmoid) activation worked to enforce positivity! (See the sketch after this list.)
  2. ClippedAdam makes the topics collapse: it generates the same top words for all topics.
  3. I don’t know how to use poutine.scale, and it turned out not to be needed…
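In case it helps anyone else hitting the positivity issue from item 1, here is a minimal sketch of the softplus approach; the encoder architecture and names are illustrative assumptions, not my actual code:

```python
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Hypothetical amortized encoder: maps a bag-of-words vector to
    Dirichlet concentration parameters, which must be strictly positive."""
    def __init__(self, vocab_size, num_topics, hidden=100):
        super().__init__()
        self.fc = nn.Linear(vocab_size, hidden)
        self.out = nn.Linear(hidden, num_topics)

    def forward(self, docs):
        h = F.relu(self.fc(docs))
        # softplus maps raw outputs to (0, inf), enforcing positivity;
        # the small constant keeps concentrations away from zero
        return F.softplus(self.out(h)) + 1e-3
```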

Now, to the most important part: I fixed the loss formula so both approaches would compute it the same way. However, the loss of the ProdLDA port to Pyro is still 10x higher than that of the pure PyTorch implementation. I tried lots of things, but could not get the Pyro version to improve.
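In case the constant-factor gap rings a bell for anyone: one thing worth checking when comparing losses like this is sum-vs.-mean normalization, since Pyro’s Trace_ELBO returns a loss summed over all sample sites, while a hand-rolled PyTorch loss often averages over the batch. A sketch of the normalization (`model`, `guide`, and `docs` are placeholders):

```python
import pyro
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

# Hypothetical setup: `model`/`guide` are the ProdLDA model and guide,
# `docs` is a [num_docs, vocab_size] bag-of-words tensor.
svi = SVI(model, guide, Adam({"lr": 1e-3}), loss=Trace_ELBO())

loss = svi.step(docs)
# Trace_ELBO sums log-probabilities over every sample site, so the
# returned loss grows with the number of documents (and tokens).
# To compare against an implementation that averages, normalize first:
per_doc_loss = loss / docs.shape[0]
per_token_loss = loss / docs.sum()  # if the other code averages per token
```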

Eyeballing the top words for each topic: although the topics generated by the Pyro port make some sense, they are still worse than those from the pure PyTorch implementation. I don’t know why this is happening.

But most importantly: the topics generated by this implementation of 2017’s ProdLDA are not as good as the ones generated by basic LDA with the mean-field approximation from Blei/Ng/Jordan’s seminal 2003 paper! So, following the thought discussed above with @eb8680_2 (i.e. Pyro must have a nice LDA tutorial to support the “universal” in “deep universal probabilistic programming”), I will change the implementation to mean-field variational inference (as in the seminal 2003 paper), and hopefully it will work. It will be a more introductory tutorial, but it will fulfill its purpose… (A rough sketch of the model I have in mind is below.)
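For concreteness, here is a minimal sketch of the kind of plain LDA model I mean, written in Pyro; hyperparameters and site names are illustrative, and a real version would need a mean-field guide plus an enumeration-aware ELBO (or marginalized topic assignments), not shown here. The per-document plate also naturally handles documents of different sizes, which is what this thread started with:

```python
import torch
import pyro
import pyro.distributions as dist

def lda_model(data, num_topics=8, vocab_size=1000):
    # data: a list of LongTensors of word indices, one per document
    # (documents may have different lengths).
    # Global topic-word distributions: beta_k ~ Dirichlet
    with pyro.plate("topics", num_topics):
        topic_words = pyro.sample(
            "topic_words", dist.Dirichlet(torch.ones(vocab_size) / vocab_size)
        )
    for d, doc in enumerate(data):
        # Per-document topic proportions: theta_d ~ Dirichlet(alpha)
        theta = pyro.sample(f"theta_{d}", dist.Dirichlet(torch.ones(num_topics)))
        with pyro.plate(f"words_{d}", len(doc)):
            # Per-word topic assignment, then the observed word
            z = pyro.sample(f"z_{d}", dist.Categorical(theta))
            pyro.sample(f"w_{d}", dist.Categorical(topic_words[z]), obs=doc)
```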

Cheers
