Can I fit a "TrueSkill"-type model in Pyro? (factor graph with expectation propagation)

I was trying to understand what types of graphical models I can fit with Pyro. After reading through Chris Bishop’s book on pattern recognition and machine learning, as well as Gelman’s book on Bayesian data analysis, I have a basic sense of the range of models out there. But I am not sure what kinds of models can be fit with Pyro.

For example, it seems like basic Bayesian regression and hierarchical Bayesian models are very well suited to Pyro, especially because Pyro can do approximate variational inference. Pyro can also fit models like variational autoencoders.

However, I was hoping that someone could explain whether Pyro can handle more complex models that rely on message-passing algorithms like the sum-product algorithm or expectation propagation (EP). What software is used to run those types of models, and at scale? A good example is the TrueSkill system developed by Microsoft, visualized below (sorry for the quality of the screen capture). This is a model that is usually fit with EP. Can we fit a model like this in Pyro, or is there some better tool?

I did see a reference to this type of model in this tutorial:

https://pytorch.org/tutorials/beginner/nlp/advanced_tutorial.html

Are there additional tutorials like this for Pyro? What would be the correct tutorials/documentation to look at to see how to implement TrueSkill in Pyro?


Hi @krishnab,

good question, TrueSkill is an interesting model. I think it’s important to separate modeling from inference: TrueSkill is a model, and it is implemented in Infer.NET, which provides EP inference. You could implement a TrueSkill model in Pyro and try another form of inference. While Pyro doesn’t implement EP or any distributed inference algorithms, I believe a good choice would be SVI with a guide such as AutoLowRankMultivariateNormal. You can read about these in the variational inference tutorials, Part I and Part II. If your data is really big you might need to subsample, in which case you could write a custom EasyGuide, following the EasyGuide tutorial.

Note that to encode factor graphs in Pyro, you can use observe statements pyro.sample(..., obs=...) to encode factors, e.g. the “factor_xy” site below:

import pyro
import pyro.distributions as dist

x = pyro.sample("x", dist.Normal(0., 1.))
y = pyro.sample("y", dist.Normal(0., 1.))
# soft factor tying x to y: adds log N(x - y; 0, 1) to the joint density
pyro.sample("factor_xy", dist.Normal(0., 1.), obs=x - y)

@fritzo Hey, thanks for responding to my question. Yep, I appreciate your feedback, as I am still new to the details of variational methods for graphical models. I was working through the pyro tutorials, so I should get the hang of the interface :). For the TrueSkill model, I was thinking of using a more hierarchical or structured variational inference approach, meaning not making the mean-field assumption on the variational distribution. So would I just configure the guide manually to characterize my custom factorized distribution?

I am hoping to just create some small demos for myself to compare the results from message passing versus variational inference. But then I can write some demos to compare mean-field variational inference versus hierarchical variational inference, so I am sure I will have more questions :).

Great, I’d love to see results of inference for TrueSkill models, and would love to see what those models end up looking like in Pyro. Note that whereas AutoNormal is a mean-field guide, AutoLowRankMultivariateNormal can additionally capture long-range correlations, and might end up being more accurate than EP. Good luck!


@fritzo Out of curiosity, do you know why implementations of PPLs that support message-passing algorithms like EP seem to be very limited? For example, the only one I have ever found is Infer.NET. Would it be possible for Pyro to support it, or does that class of algorithm require a different underlying architecture that would be hard to implement using Pyro’s effect handlers?

@jeffmax One caveat is that I believe Pyro does implement some message passing for discrete distributions. There are some examples in the tutorial Inference with Discrete Latent Variables — Pyro Tutorials 1.8.4 documentation.

But in general I am finding the same thing as you, namely that message passing is not common in most PPL packages. Part of the reason seems to be that exact inference in general graphical models is NP-hard, so exact message passing is tractable only on small or tree-structured graphs. Hence the practical choices seem to be either MCMC or variational methods for approximate inference.

I think the other issue is that the PyTorch backend is fundamentally geared towards gradients: MCMC uses gradients in HMC or NUTS, and VI uses gradients in its optimization steps. I don’t know EP well enough to say whether gradients fit naturally into the algorithm.

That is my two cents, but I would still like to see a more complete discussion of this question. I would love to hear @fritzo’s thoughts.


@jeffmax I do not know why EP-backed PPLs are rare; I’d be interested to hear @eb8680_2’s opinion. To give some pointers, here are the places Pyro currently uses message passing:

  • TraceEnum_ELBO and other discrete enumeration tools perform exact message passing, requiring only one or two passes.
  • DiscreteHMM and GaussianHMM perform exact parallel-scan message passing, requiring only one or two passes.
  • pyro.contrib.tracking.assignment provides a number of soft assignment solvers based on loopy belief propagation; empirically these converge in a handful of iterations.
  • Pyro’s future Funsor backend provides a moment_matching interpretation for low-level EP computations, but does not perform EP iteration.
  • pyro.contrib.epidemiology has a helper set_relaxed_distributions(), similar to Funsor’s moment_matching interpretation, that reinterprets Binomial and BetaBinomial distributions as moment-matched normals.
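For intuition about the first two bullets, the “one pass” of exact message passing on a chain-structured model like DiscreteHMM is just the forward algorithm. Here is a toy hand-rolled version in plain NumPy (made-up numbers, not Pyro’s actual implementation), where each message is a vector over hidden states:

```python
import numpy as np

# Toy 2-state HMM: transition and emission probabilities (made up).
init = np.array([0.6, 0.4])
trans = np.array([[0.7, 0.3],
                  [0.2, 0.8]])
emit = np.array([[0.9, 0.1],   # P(obs | state=0)
                 [0.3, 0.7]])  # P(obs | state=1)
obs = [0, 0, 1]

# Forward pass: each time step sends a message (a vector over states)
# to the next; a single sweep yields the exact likelihood.
msg = init * emit[:, obs[0]]
for t in obs[1:]:
    msg = (msg @ trans) * emit[:, t]
likelihood = msg.sum()  # exact P(obs), no iteration needed
print(likelihood)
```

GaussianHMM does the analogous computation with Gaussian messages, and parallel-scan tricks let both run in logarithmic depth over the sequence.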

I’d also love to see EP implemented in Pyro. While I agree with @krishnab that PyTorch is very AD-focused, it is also a solid system for vectorized distributed mathematics, and I think EP could be a reasonably competitive inference algorithm for some models, especially now that we have a HorovodOptimizer. I’d be happy to help anyone implement EP in Pyro and to publish a TrueSkill tutorial. If anyone wants to start implementing EP, please create a feature request issue; I’ll help by populating it with low-level tasks and pointers into the code. I’m happy to chat over Zoom to get anyone started.


There are lots of other PPLs that implement some form of message-passing, notably Gibbs sampling in BUGS. I’m not an expert on EP, but I suppose the main reason EP in particular is less widely used is that, compared to Monte Carlo and optimization-based algorithms, it’s less well understood both theoretically and practically, with no convergence guarantees or universally accepted scaling/debugging strategies.

This review paper is a nice introduction to the EP literature. One flavor of EP that might be easier to implement relatively generally in Pyro is the quadrature-based version described in this paper, but I couldn’t say how well it would perform compared to the very different and highly optimized variant in Infer.NET.
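To make the quadrature idea concrete, here is a toy version of the moment-matching step that EP repeats for each factor; this is my own illustration, not code from either paper. It projects the tilted distribution p(x) ∝ N(x; 0, 1)·σ(x) onto a Gaussian by computing its moments with Gauss-Hermite quadrature:

```python
import numpy as np

# Gauss-Hermite nodes/weights for integrals against exp(-t^2).
nodes, weights = np.polynomial.hermite.hermgauss(50)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Change of variables x = sqrt(2) * t turns an expectation under
# N(0, 1) into a weighted sum over the quadrature nodes.
x = np.sqrt(2.0) * nodes
w = weights / np.sqrt(np.pi)   # weights for N(0, 1) expectations

Z = np.sum(w * sigmoid(x))                          # normalizer
mean = np.sum(w * x * sigmoid(x)) / Z               # matched mean
var = np.sum(w * x**2 * sigmoid(x)) / Z - mean**2   # matched variance
print(Z, mean, var)
```

A full EP implementation would wrap this in a loop: divide the approximate factor out of the posterior to form a cavity, moment-match the tilted distribution as above, and update that factor, iterating over factors until convergence.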


Thanks for the helpful replies and the link to the papers. I don’t think I am at a point in my understanding of the fundamentals where I could realistically attempt an implementation, but I would like to get there.

@fritzo @eb8680_2 thanks for chiming in. Yeah, I would love to help with an EP implementation and TrueSkill tutorial. I am definitely not well-versed in EP, but I can help out if there is someone to ask questions of. I remember that Chris Bishop used TrueSkill as a selling point for graphical models in his machine learning summer school videos (Graphical Models 1 - Christopher Bishop - MLSS 2013 Tübingen - YouTube), so TrueSkill would make a really nice tutorial, since we already have a video describing the model structure and background.

@jeffmax like I said, I am not an expert, or even intermediate, in message passing. I have done some tutorials on the sum-product algorithm and belief propagation, so that should get me started, since EP is a continuous analogue of BP, right? If @fritzo or @eb8680_2 know of any existing EP code implementations, that would help. I also need to understand how to do graph introspection in PyTorch, meaning how Pyro/PyTorch represents the relationships between variables so that we can identify the nodes between which messages are passed. So @jeffmax, if you are interested we could work together, but no pressure.

@fritzo and @eb8680_2 I am trying to get my own package built and released this week, so I can start looking at this next week. In the meantime, I can start reviewing some of the articles on message passing and EP. Is there a good way to track this activity?

@krishnab definitely willing to work together. My experience is also limited to exercises implementing portions of BP.

Is there a good way to track this activity?

Within the Pyro team, we often start speculative new feature development with a design doc, a Google doc that contains a description of the problem to be solved, any relevant background material, a high-level description of the proposed solution plus discussion of any especially intricate or important design details, and an optional work plan. You can see some of our past design docs collected here.

Design docs are usually a better venue for collaborative discussion than GitHub issues for the initial stages of a project, and you can always start and share a stub with nothing more than a short problem description and a reading list.

Before you start thinking about the details of a prospective EP implementation, however, I would echo the earlier suggestion here of starting by implementing a TrueSkill model and trying to get it working with SVI and an autoguide. That will help you get more familiar with Pyro and with the TrueSkill model itself, and you can use it in any tutorial you end up writing.

@eb8680_2 I totally agree with your suggestions; they sound very sensible. I am looking at the Google docs you mentioned, and the template is very helpful. I will get started on the template this weekend. I can share it with you all, or submit a pull request to the repo with the document link.

I agree that starting with an SVI approach to TrueSkill will be nice. It will be easier to implement and will give us a benchmark to compare against, both in terms of accuracy and computation time. I also need to review the TrueSkill papers to refresh my own memory.

Cool. Looking forward to getting started.