Support for building Bayesian Non-parametric Models in Pyro

shashankg7 · October 16, 2019, 4:57pm

Hello, I am new to the PPL world. I came across TensorFlow Probability and was giving it a try, but the dependency mess in the TF ecosystem is a turn-off, so I am planning to switch to Pyro for my needs.

Basically, I am interested in building Hierarchical Dirichlet Process-based models for text mining/IR.

Pardon my lack of knowledge about Pyro, but can we build Bayesian non-parametric models in Pyro (similar to the Fitting_DPMM_Using_pSGLD.ipynb notebook in the tensorflow/probability GitHub repository)?

Any leads on this would be highly appreciated.

Thanks,

eb8680_2 · October 17, 2019, 9:57pm

Hi, the model in your linked example is a truncated approximation to a Dirichlet process mixture model, which is what people tend to do in practice. You should be able to port the model in the notebook to Pyro fairly easily, since the distributions have counterparts with very similar APIs in Pyro and torch.distributions. See the Gaussian mixture model tutorial for a guide to building mixture models in Pyro using Pyro’s discrete variable enumeration machinery.

Note that you would need to use Pyro's NUTS MCMC implementation or variational inference unless you also want to implement the algorithm used in the example (pSGLD, a preconditioned variant of SGLD).

shashankg7 · October 18, 2019, 6:45am

Thanks @eb8680_2 for your prompt reply.

I will read up on the tutorials you mentioned and try to port that example to Pyro.

A follow-up question: are there any examples of BNP (Dirichlet process-based) models in Pyro?

shashankg7 · October 18, 2019, 10:58am

Also, are there any plans to add SGLD to Pyro?

If not, I would like to contribute towards it.

Please let me know how to get started and where to start looking.

Thanks

eb8680_2 · October 18, 2019, 6:28pm

Are there any examples of BNP (Dirichlet process-based) models in Pyro?

You can find a bunch of examples and tutorials on our website (Getting Started With Pyro: Tutorials, How-to Guides and Examples). There are no Dirichlet process examples yet (see this issue), but the Gaussian mixture model tutorial I linked to above is very similar to a truncated Dirichlet process mixture model, and you should be able to port the model in the TFP notebook to Pyro almost line for line.
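To illustrate how small the gap is, here are two common truncated priors on T mixture weights: a symmetric Dirichlet(alpha / T) prior, as a finite mixture model would use (its T to infinity limit is a Dirichlet process mixture with concentration alpha), and truncated stick-breaking. The function names, alpha = 1 and T = 20 are illustrative assumptions; this is not code from the tutorial or the TFP notebook.

```python
import torch
from torch.distributions import Beta, Dirichlet


def finite_dirichlet_weights(alpha: float, T: int) -> torch.Tensor:
    # Finite-mixture prior: w ~ Dirichlet(alpha / T, ..., alpha / T).
    return Dirichlet(torch.full((T,), alpha / T)).sample()


def stick_breaking_weights(alpha: float, T: int) -> torch.Tensor:
    # Truncated stick-breaking: v_k ~ Beta(1, alpha) for k < T - 1,
    # w_k = v_k * prod_{j<k} (1 - v_j), and the last weight takes the leftover stick.
    v = Beta(torch.tensor(1.0), torch.tensor(alpha)).sample((T - 1,))
    leftover = torch.cat([torch.ones(1), torch.cumprod(1 - v, dim=0)])
    return torch.cat([v, torch.ones(1)]) * leftover


torch.manual_seed(0)
for name, sampler in [("Dirichlet(alpha/T)", finite_dirichlet_weights),
                      ("stick-breaking", stick_breaking_weights)]:
    w = sampler(1.0, 20)
    # With alpha = 1, typically only a few of the 20 components get noticeable mass.
    print(f"{name}: {(w > 0.01).sum().item()} of 20 components have weight > 0.01")
```

Swapping one weight prior for the other is essentially the only change needed in a finite mixture model to get a truncated DP mixture.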

If you decide to port that example and make some progress, feel free to open a PR and ask for help or code review.

Are there any plans to add SGLD to Pyro?

Not currently, no; see this issue for some discussion of SGLD's poor performance in practice. If you're interested in adding it, you could start with the implementation here, which was linked to in that issue. I believe the Pyro interface you would need to implement is MCMCKernel, in particular its sample method. See the HMC kernel's sample method for a reference MCMC kernel implementation.
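For reference, the core SGLD update itself is short. The sketch below is plain PyTorch rather than a Pyro MCMCKernel subclass, uses a fixed step size with no preconditioning (so it is not the pSGLD variant used in the TFP notebook), and checks itself on a conjugate Normal model whose exact posterior mean is known. The sgld function, its arguments and the toy model are illustrative assumptions.

```python
import torch
from torch.distributions import Normal


def sgld(log_prior, log_lik, data, theta0, step_size=1e-4, batch_size=32,
         num_steps=5000, seed=0):
    """Plain SGLD with a fixed step size (no preconditioning, no MH correction).

    Each step: theta <- theta + (eps / 2) * g + Normal(0, eps), where g is the
    minibatch estimate of grad[log p(theta) + sum_n log p(x_n | theta)].
    """
    gen = torch.Generator().manual_seed(seed)
    n_data = data.shape[0]
    theta = theta0.clone().requires_grad_(True)
    samples = []
    for _ in range(num_steps):
        idx = torch.randint(0, n_data, (batch_size,), generator=gen)
        # Rescale the minibatch log-likelihood so it estimates the full-data sum.
        log_post = log_prior(theta) + (n_data / batch_size) * log_lik(theta, data[idx])
        (grad,) = torch.autograd.grad(log_post, theta)
        with torch.no_grad():
            noise = torch.randn(theta.shape, generator=gen) * step_size ** 0.5
            theta += 0.5 * step_size * grad + noise
        samples.append(theta.detach().clone())
    return torch.stack(samples)


# Toy conjugate check: x_n ~ Normal(mu, 1) with prior mu ~ Normal(0, 10).
torch.manual_seed(0)
data = 2.0 + torch.randn(1000)
samples = sgld(
    log_prior=lambda mu: Normal(0.0, 10.0).log_prob(mu).sum(),
    log_lik=lambda mu, x: Normal(mu, 1.0).log_prob(x).sum(),
    data=data,
    theta0=torch.zeros(1),
)
exact_mean = data.sum() / (len(data) + 1.0 / 10.0 ** 2)
print("SGLD mean:", samples[1000:].mean().item(), "exact:", exact_mean.item())
```

Note that the minibatch gradient noise inflates the sampled variance beyond the true posterior variance unless the step size is decreased, which is one practical drawback of plain SGLD.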

@neerajprad, thinking about this, it seems like we could use a tutorial on implementing new MCMC kernels. I'll open a GitHub issue.

shashankg7 · October 19, 2019, 11:47am

Thanks.

I will try to port the TFP DPMM example to Pyro and will open a PR.

shashankg7 · October 20, 2019, 12:20pm

@eb8680_2, a slightly off-topic question.

Could you please recommend some introductory material on the Dirichlet process, including its inference equations? Most of the texts I have found use fairly advanced maths, which I find hard to grasp.

eb8680_2 · October 20, 2019, 9:06pm

Could you please recommend some introductory material on the Dirichlet process, including its inference equations? Most of the texts I have found use fairly advanced maths, which I find hard to grasp.

I think the truncation-based approach in the TFP example is probably the most accessible introduction, and the right way to work with such models in practice. Most of the advanced math you see is about extending mean-field coordinate ascent variational inference or collapsed Gibbs sampling to the case of infinitely many mixture components. As long as you truncate and marginalize over the mixture assignments, as in the TFP example or Pyro's mixture model example, you can use any off-the-shelf MCMC or variational inference algorithm and infer the number of clusters by counting the unique cluster assignments under the posterior, as the TFP example does.
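One way to do that counting, given posterior draws of the mixture weights and component locations (from NUTS or a trained guide, say), is to sample an assignment for every point from its conditional posterior and count the distinct labels per draw. This sketch assumes a 1-D Gaussian mixture with a known unit scale and draws stored as (S, T) tensors; count_clusters and the hand-made "draws" are hypothetical, not part of Pyro or TFP.

```python
import torch
from torch.distributions import Categorical, Normal


def count_clusters(data, weights, locs, scale=1.0):
    """Number of distinct clusters used by one sampled assignment per posterior draw.

    data: (N,) observations; weights, locs: (S, T) posterior draws for a
    truncated 1-D Gaussian mixture with known observation scale.
    """
    # log p(z_n = k | x_n, draw s) up to a constant: log w_k + log Normal(x_n | loc_k, scale)
    log_resp = weights.log().unsqueeze(1) + Normal(locs.unsqueeze(1), scale).log_prob(
        data.view(1, -1, 1))  # shape (S, N, T)
    z = Categorical(logits=log_resp).sample()  # (S, N): one assignment per point per draw
    return torch.tensor([zs.unique().numel() for zs in z])


# Hand-made "posterior draws": 3 occupied components plus 2 far-away spare ones.
torch.manual_seed(0)
data = torch.cat([torch.randn(50) - 10.0, torch.randn(50), torch.randn(50) + 10.0])
weights = torch.tensor([0.3, 0.3, 0.3, 0.05, 0.05]).expand(4, 5)
locs = torch.tensor([-10.0, 0.0, 10.0, 50.0, 60.0]).expand(4, 5)
counts = count_clusters(data, weights, locs)
print("clusters per posterior draw:", counts.tolist())
```

A histogram of these counts over many draws approximates the posterior over the number of occupied clusters.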

Unless you expect that the number of clusters in your data could be very large, approaching the number of datapoints, inference in the untruncated nonparametric version is mostly of academic interest. If you're still specifically interested in that version, this tutorial paper seems like a reasonable introduction to Bayesian nonparametric models in general, with many links in section 4 to papers on different inference approaches.
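To put a number on "very large": under a DP(alpha) prior, the expected number of clusters among n points is sum_{i=1}^{n} alpha / (alpha + i - 1), which grows only logarithmically in n. For alpha = 1 it is the harmonic number H_n, about 9.8 for n = 10,000, so under the prior a truncation level far below the number of datapoints is usually enough unless alpha is large. The helper name below is hypothetical.

```python
def expected_num_clusters(alpha: float, n: int) -> float:
    # Under a DP(alpha) prior (equivalently the Chinese restaurant process), the i-th
    # data point starts a new cluster with probability alpha / (alpha + i - 1).
    return sum(alpha / (alpha + i - 1) for i in range(1, n + 1))


for alpha in (0.5, 1.0, 5.0):
    for n in (100, 10_000, 100_000):
        print(f"alpha={alpha}, n={n}: E[#clusters] = {expected_num_clusters(alpha, n):.1f}")
```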