Can I use Pyro to build a factor graph model?

I am facing a computer vision problem, and I want to build a conditional random field (CRF) model to solve it.
I want something like PyTorch to do the heavy lifting; for example, building CNNs with PyTorch is very easy now.

I was happy to see that Pyro is built on top of PyTorch. As I'm new to graphical models and probabilistic programming, I found that the Pyro docs seem to be all about directed graphical models. I want to know whether Pyro can model undirected graphical models such as CRFs.

Hi, this is a rather general question; do you have a more specific example in mind? Have you looked at the PyTorch tutorial on CRFs for sequence tagging?

To answer your question directly, it’s certainly possible in principle, but some inference algorithms (e.g. here) and representations (e.g. here) for undirected models may be more suitable for Pyro than others. If you want to represent your model directly as a large factor graph and do inference with belief propagation or message passing (which is probably the scenario you have in mind), you’re likely better off implementing it yourself as in the above tutorial.
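To make the message-passing suggestion concrete, here is a minimal sum-product sketch in plain NumPy (a toy two-variable factor graph of my own invention, not Pyro code): the marginal of one variable is obtained by passing a message through the pairwise factor, then checked against brute-force enumeration of the joint.

```python
import numpy as np

# Toy factor graph: two binary variables x1, x2 with unary potentials
# phi1, phi2 and a single pairwise potential psi[x1, x2].
phi1 = np.array([0.7, 0.3])
phi2 = np.array([0.4, 0.6])
psi = np.array([[0.9, 0.1],
                [0.2, 0.8]])

# Sum-product message from x1 through psi to x2:
#   m(x2) = sum_{x1} phi1(x1) * psi(x1, x2)
m_1_to_2 = phi1 @ psi

# Belief (unnormalized marginal) at x2, then normalize.
belief_x2 = phi2 * m_1_to_2
marginal_x2 = belief_x2 / belief_x2.sum()

# Sanity check against brute-force enumeration of the joint table.
joint = phi1[:, None] * psi * phi2[None, :]
brute_x2 = joint.sum(axis=0) / joint.sum()
assert np.allclose(marginal_x2, brute_x2)
```

On a tree-structured graph the same message computation, repeated along every edge, yields exact marginals; on loopy graphs it becomes (approximate) loopy belief propagation.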

Thank you @eb8680 for your informative reply. I think implementing some basic algorithms from scratch is necessary for me right now.

@nicedi I’m using a factor graph model for a problem I’m working on. The model code looks very clean: I use a Pyro observe statement for each factor:

import pyro
import pyro.distributions as dist

def model(data):
    x = pyro.sample("x", my_uniform_prior)
    # each obs statement scores a function of x under Normal(0, 1),
    # adding that factor's log-density to the model's log-joint
    pyro.sample("factor_1", dist.Normal(0., 1.), obs=some_function_of(x))
    pyro.sample("factor_2", dist.Normal(0., 1.), obs=another_function_of(x))

Then in my guide I hand-implement a message passing algorithm to predict a maximum a posteriori x, sampling it from a Delta distribution:

def guide(data):
    # MAP prediction wrapped in a Delta so SVI treats it as a point mass
    x = pyro.sample("x", dist.Delta(custom_predictor(data)))

This works fine in pyro.infer.SVI since I can still learn other parts of the model using variational inference.

The one conceptual downside is that I can no longer sample from the prior by running model(data). This isn’t so bad, though: it would be conceptually straightforward to interpret the model’s obs statements as acceptance criteria and draw from the prior using rejection sampling, or to draw samples using importance sampling. In practice, I never need to generate data from the prior.
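The importance-sampling idea can be sketched in plain Python (this is not the Pyro API; the uniform prior and single Normal factor below are made up for illustration): draw from the prior as the proposal, and weight each draw by the likelihood the factor/obs statements would contribute.

```python
import math
import random

random.seed(0)

def normal_logpdf(v, mean, std):
    # log-density of Normal(mean, std) at v
    return -0.5 * ((v - mean) / std) ** 2 - math.log(std * math.sqrt(2 * math.pi))

# Hypothetical model: x ~ Uniform(0, 1) prior, one factor scoring an
# observation y = 0.5 under Normal(x, 0.2).
y = 0.5
samples = [random.random() for _ in range(20000)]          # prior = proposal
log_weights = [normal_logpdf(y, x, 0.2) for x in samples]  # factor log-likelihoods

# Self-normalized importance weights (subtract the max for stability).
m = max(log_weights)
weights = [math.exp(lw - m) for lw in log_weights]
total = sum(weights)
posterior_mean = sum(w * x for w, x in zip(weights, samples)) / total
# by symmetry of this toy posterior, posterior_mean should land near 0.5
```

The same weighting is what Pyro's importance-sampling machinery does under the hood when the guide is the prior.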


@nicedi, what was your final conclusion here? Did you end up implementing inference in an undirected graphical model yourself?

@others, what is the support in Pyro for undirected models?

Thank you @fritzo, I admire your clear understanding of Pyro’s terminology and the concepts behind it. After posting the question, I thought deeply about CRFs and Bayesian methods for several days. However, my mindset is still stuck in the traditional machine learning realm (loss functions, backpropagation, etc.), and I can’t yet appreciate the essence of probabilistic programming. I hope that when I’m better prepared, I can do something cool following your advice.

@kakate Over the past few days, I implemented a basic CRF model to denoise binary images. The model is trained with a gradient-based approach. Here is the code (GitHub - nicedi/CRF_denoising: Build a CRF model using Chainer for binary image denoising.).
My final conclusion is that handling the partition function is the most challenging part of using CRFs. I used the PyMaxflow package for inference.
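For readers without the repo or PyMaxflow at hand, here is a toy sketch of the kind of energy a binary denoising CRF minimizes, using simple iterated conditional modes (ICM) instead of graph cuts. This is my own illustration, not the linked code.

```python
import numpy as np

def crf_energy(labels, noisy, unary_w=1.0, pairwise_w=0.5):
    # Unary term: labels should agree with the observed noisy pixels.
    unary = -unary_w * np.sum(labels * noisy)
    # Pairwise Potts-style term: right/down neighbouring labels should agree.
    pairwise = -pairwise_w * (np.sum(labels[:, :-1] * labels[:, 1:])
                              + np.sum(labels[:-1, :] * labels[1:, :]))
    return unary + pairwise

def icm_denoise(noisy, sweeps=3):
    # Greedy coordinate descent: set each pixel to the label of lower energy.
    labels = noisy.copy()
    h, w = labels.shape
    for _ in range(sweeps):
        for i in range(h):
            for j in range(w):
                best, best_e = labels[i, j], None
                for candidate in (-1, 1):
                    labels[i, j] = candidate
                    e = crf_energy(labels, noisy)
                    if best_e is None or e < best_e:
                        best, best_e = candidate, e
                labels[i, j] = best
    return labels

clean = np.ones((6, 6), dtype=int)
noisy = clean.copy()
noisy[2, 2] = -1   # two isolated flipped pixels as "noise"
noisy[4, 4] = -1
denoised = icm_denoise(noisy)
```

Graph cuts (as in PyMaxflow) find the global minimum of this submodular energy exactly, while ICM only finds a local minimum, but the energy being minimized is the same shape.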
For probabilistic programming, I recently read the book Probabilistic Programming in Python using PyMC3 and tried PyMC3. I agree with Judea Pearl’s critique of the current development of AI: I think the AI we develop in the future should be able to reason, not merely recognize patterns.

@fritzo, do you by chance have a more complete example of how you implemented a factor graph using Pyro? Thank you!


@fritzo, I am looking to do probabilistic modeling of factor graphs, in particular the models from the various chapters of the book Model-Based Machine Learning. So consider the following factor graph:

How would I use Pyro to model this? The multiplication is the issue. The latent variables are weight and score; the observed value is featureValue. I assume that the joint probability is:

p(score, featureValue, weight) = p(weight) * p(featureValue) * p(score | featureValue, weight)
= p(weight) * p(featureValue) * delta(score - featureValue * weight)

Here is what I am planning to try for the model:

w = pyro.sample("weight", dist.Normal(0., 1.))
f = pyro.sample("feature", dist.Normal(0., 1.), obs=featureValue)
s = pyro.sample("score", dist.Delta(???))

Question: what should "s" be? How do I deal with a deterministic factor?
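Not an authoritative answer, but one common reading of the delta factor in the joint above is that score is simply a deterministic function of its parents, so forward sampling the graph just computes it. A plain-Python sketch of that generative process (in Pyro, the analogous move would be to compute score directly, or wrap the computed value in dist.Delta as the equation suggests):

```python
import random

random.seed(1)

def forward_sample(feature_value):
    """Ancestral sample of the toy factor graph:
    weight ~ Normal(0, 1); the delta factor collapses to
    score = feature_value * weight (a deterministic assignment)."""
    weight = random.gauss(0.0, 1.0)
    score = feature_value * weight  # delta(score - featureValue * weight)
    return weight, score

w, s = forward_sample(2.0)
# the delta factor guarantees s == 2.0 * w exactly
```

The deterministic factor contributes no randomness of its own; it only ties score to its parents' values.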

Here is another question that has come up in relation to this:

What is the difference between:

  1. pyro.sample("z", dist.Delta(5.))

  2. pyro.sample("z", dist.Normal(0., 1.), obs=5.)

In both cases, the result is 5. Why can’t Delta be used for observations? Thanks.
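One way to see the difference: both statements yield the value 5, but they contribute different log-probabilities to the trace. Delta puts all its mass on its atom, so its log-density there is log 1 = 0, whereas Normal(0, 1) scores 5 very poorly, and that score is exactly what an obs statement adds to the model's log-joint. A quick pure-Python check of those two numbers (mirroring Pyro's arithmetic, not calling it):

```python
import math

# log-density of Normal(0, 1) at the observed value 5:
# this is the score the obs statement contributes to the log-joint.
normal_logp_at_5 = -0.5 * 5.0 ** 2 - 0.5 * math.log(2 * math.pi)

# log-density of Delta(5.) at 5: all mass sits on the atom,
# so the log-density there is log(1) = 0.
delta_logp_at_5 = 0.0

# normal_logp_at_5 is roughly -13.42: the Normal obs penalizes the
# value heavily, while the Delta sample contributes nothing.
```

So the two statements agree on the sampled value but disagree on how strongly the trace is penalized, which is what inference algorithms actually use.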