Confusion about the LDA example phrasing "histogram of word counts"

samghelms · February 24, 2019, 11:35pm

What tutorial are you running: LDA

The LDA tutorial states that “Whereas the model in this example treats documents as vectors of categorical variables (vectors of word ids), it is usually more efficient to treat documents as bags of words (histograms of word counts).”

Yet, as far as I understand it, LDA treats documents as vectors of categorical variables, not histograms of word counts. When the author of this example talks about histograms of word counts, are they referring to a topic model with Gaussian assumptions, like latent semantic indexing? The model in the example seems to match the probabilistic graphical model for LDA perfectly well, so I don’t see how a “histogram of word counts” could match the LDA model better.

Furthermore, the guide function treats the word count vectors as a histogram. Doesn’t this mean that Pyro is already making use of the histogram-of-word-counts idea to approximate LDA?
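
To make concrete what I mean by the two representations, here is a minimal sketch in plain Python. It is not taken from the tutorial: the toy document, the vocabulary size, and names such as `word_ids`, `counts`, and `word_probs` are all made up for illustration, and `word_probs` is just an arbitrary fixed word distribution rather than anything inferred by LDA.

```python
import math
from collections import Counter

# Hypothetical toy document over a 5-word vocabulary (illustration only).
vocab_size = 5
word_ids = [0, 3, 3, 1, 0, 3]  # document as a vector of categorical word ids

# The same document as a bag of words: counts[v] = occurrences of word v.
id_counts = Counter(word_ids)
counts = [id_counts.get(v, 0) for v in range(vocab_size)]

# An arbitrary fixed word distribution for this document (assumed; sums to 1).
word_probs = [0.3, 0.2, 0.1, 0.3, 0.1]

# Log-likelihood with one categorical term per word position.
loglik_ids = sum(math.log(word_probs[w]) for w in word_ids)

# Log-likelihood from the histogram: one term per vocabulary entry,
# weighted by how often that entry occurs in the document.
loglik_counts = sum(c * math.log(p) for c, p in zip(counts, word_probs) if c > 0)

print(counts)  # [2, 1, 0, 3, 0]
print(math.isclose(loglik_ids, loglik_counts))
```

For a fixed word distribution, the two sums are the same terms grouped differently, so at least in this simplified setting the histogram carries the same information as the vector of word ids. What I can't tell is whether this is all the tutorial means by "bags of words".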