Questions about LDA example code

whatever60 · June 6, 2023, 6:57pm

Hi Pyro developers,

Thank you for maintaining this amazing project and the thorough documentation.

However, I do have two questions about the example code of amortized Latent Dirichlet Allocation.

pyro-ppl/pyro/blob/ae124d51a9e88068b177656f430e33a41c7a39d4/examples/lda.py#LL96C1-L102C6


      
          def parametrized_guide(predictor, data, args, batch_size=None):
              # Use a conjugate guide for global variables.
              topic_weights_posterior = pyro.param(
                  "topic_weights_posterior",
                  lambda: torch.ones(args.num_topics),
                  constraint=constraints.positive,
              )

The comment at line 97 (repo tag 1.8.5) says it is using a conjugate guide, but I don't think it is conjugate: the posterior and the prior simply belong to the same family (in this case both Gamma or both Dirichlet), and neither the Gamma nor the Dirichlet is conjugate to itself.

pyro-ppl/pyro/blob/ae124d51a9e88068b177656f430e33a41c7a39d4/examples/lda.py#LL78C1-L94C1


      
          def make_predictor(args):
              layer_sizes = (
                  [args.num_words]
                  + [int(s) for s in args.layer_sizes.split("-")]
                  + [args.num_topics]
              )
              logging.info("Creating MLP with sizes {}".format(layer_sizes))
              layers = []
              for in_size, out_size in zip(layer_sizes, layer_sizes[1:]):
                  layer = nn.Linear(in_size, out_size)
                  layer.weight.data.normal_(0, 0.001)
                  layer.bias.data.normal_(0, 0.001)
                  layers.append(layer)
                  layers.append(nn.Sigmoid())
              layers.append(nn.Softmax(dim=-1))
              return nn.Sequential(*layers)

From the way the neural network is constructed, it seems to take a word-count histogram as input without normalizing it. The code certainly works, but it is not common practice for a neural net to take arbitrarily large integers as input. Is there a rationale for this? Is it just to stay consistent with the reference?

What tutorial are you running?
Amortized Latent Dirichlet Allocation
What version of Pyro are you using?
1.8.5
Please link or paste relevant code, and steps to reproduce.
Linked above.

Thank you,
Whatever60

martinjankowiak · June 7, 2023, 7:32pm

the dirichlet is conjugate to the multinomial likelihood. i guess topic_weights isn’t quite conjugate because it uses the gamma distribution, but note the close relation between gamma and dirichlet (see e.g. “Random variate generation” here). in any case the precise conjugacy relation isn’t particularly relevant here since we’re doing stochastic variational inference anyway.
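to make the two relations above concrete, here is a minimal sketch in plain Python (standard library only, not Pyro). the helper names `dirichlet_posterior` and `sample_dirichlet_via_gamma` and the numbers are made up for illustration and do not appear in `examples/lda.py`. the first helper shows the textbook conjugate update for a Dirichlet prior with multinomial counts; the second shows that normalizing independent Gamma(alpha_i, 1) draws gives a Dirichlet(alpha) sample.

```python
import random


def dirichlet_posterior(alpha, counts):
    """Conjugate update: Dirichlet(alpha) prior plus multinomial counts
    gives a Dirichlet(alpha + counts) posterior."""
    return [a + c for a, c in zip(alpha, counts)]


def sample_dirichlet_via_gamma(alpha, rng):
    """Draw from Dirichlet(alpha) by normalizing independent
    Gamma(alpha_i, 1) draws so that they sum to one."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


rng = random.Random(0)      # fixed seed so the sketch is repeatable
alpha = [1.0, 2.0, 3.0]     # illustrative concentration parameters
counts = [4, 0, 1]          # illustrative observed category counts

posterior_alpha = dirichlet_posterior(alpha, counts)

# Monte Carlo check: the mean of Dirichlet(alpha) is alpha / sum(alpha).
draws = [sample_dirichlet_via_gamma(alpha, rng) for _ in range(20000)]
empirical_mean = [sum(d[i] for d in draws) / len(draws) for i in range(len(alpha))]
expected_mean = [a / sum(alpha) for a in alpha]
```

with enough draws, `empirical_mean` should sit close to `expected_mean`, which is the sense in which a Gamma-based parametrization is "almost" a Dirichlet one.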
regarding the second question: this is just a tutorial. there is no claim that anything is done optimally. there are presumably millions of neural network architectures that could work here, some better, some worse.
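if you want to try normalized inputs, two common options are per-document proportions and a log1p transform. the sketch below is plain Python with hypothetical helper names (`normalize_counts`, `log1p_counts`); it is not what the tutorial does, and whether either variant actually improves the LDA example is untested here.

```python
import math


def normalize_counts(counts):
    """Turn a word-count histogram into proportions that sum to one.
    An empty document (all zeros) is returned as all zeros."""
    total = sum(counts)
    if total == 0:
        return [0.0] * len(counts)
    return [c / total for c in counts]


def log1p_counts(counts):
    """Compress large counts with log(1 + c), keeping zeros at zero."""
    return [math.log1p(c) for c in counts]


doc = [0, 3, 1, 0, 6]                  # illustrative word counts for one document
proportions = normalize_counts(doc)    # scale-free input
log_scaled = log1p_counts(doc)         # dampened but still count-aware input
```

in the tutorial's setting, a transform like this would be applied to the count tensor just before it is passed to the predictor; proportions throw away document length, while log1p keeps a dampened version of it.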