Simple Bayesian Network Example?

I’m new to Pyro (and PPLs), so bear with me.

I’m trying to figure out how to implement a simple Bayesian network (without neural networks). I’ve searched Google and these forums and found a few threads that come close, but none of the code in them actually runs without errors. Each example seems to need some modification, but I can’t manage to hack it together.

Let’s say I want to implement a BN with three variables, A, B and C. A is categorical with 3 possible values. B is continuous between 0 and 1 and has a beta distribution. C is boolean. A and B are independent and are parents of C.

What’s the simplest way to implement this model in Pyro? What does the guide function look like?

What do you mean by “implement”? What do you want to do with the model? Sample from it, or do inference? Are the model parameters fixed or do you want to estimate them? If the latter, how (ML, MAP, SVI, MCMC)? If anything other than ML, what are the priors over the model parameters? How does C depend on A and B?

Have you read the introductory tutorials? For discrete variables, you’ll also want to look at the enumeration tutorial and the Gaussian mixture tutorial. And for that, it’ll probably be necessary to have at least a look at the tensor shapes tutorial.

(Please don’t misunderstand; I’m sympathetic to your confusion. I also only started using Pyro recently, am certainly not an expert, and am still confused about things a lot of the time. I also think some of the introductory materials are not as intuitive as they could be.)

Hi, I am also new to Pyro and at the same stage as @perceptron.

As @perceptron said, there are some examples of Bayesian networks (BNs) on the forum, but most of them don’t run without errors and they are hard for Pyro newbies to interpret. It’s really important to fully understand Pyro’s basic syntax (pyro.plate, pyro.sample and pyro.param), which I haven’t managed yet.

I wonder whether anyone knows if Pyro includes any structure and parameter learning for BNs. For now, I think Pyro doesn’t include functions for these and they have to be coded manually. I am also really confused by Pyro’s syntax for defining a model (e.g., defining the model structure with pyro.param, and enumerated sampling for discrete BNs) and for doing inference (e.g., log_prob).

I have gone through all the materials @ewipe mentioned but still get errors. I’d really appreciate it if anyone could give a specific, concrete example.

I’d like to do both prediction and inference with the model. I’d like to estimate the parameters with SVI.
I’ve read the introductory tutorials, but they seem to focus on other types of examples, e.g. regression. I could spend a bit more time with the GMM tutorial and tensor shapes tutorial than I have.
I’ve found I learn best from working through a code example and working backwards. I’ve done that with code from some of the tutorials and forum posts (and the Pyro code for the Statistical Rethinking book), but implementing even a simple Bayesian network that estimates parameters from a dataset, and then doing some model checking with the posterior predictive distribution, seems to be a bit beyond my current knowledge.

I just tried to set up a simple example notebook based on your description. You can find it here, including some explanatory comments on what’s happening: https://github.com/e-pet/notebooks/blob/main/BN.ipynb

Does that help a little bit? Feel free to ask any questions.

Again, I’m not a pyro expert - maybe someone more knowledgeable than me can suggest improvements if I’m doing stuff weirdly.

Hey @ewipe – thanks so much for making this notebook. I spent a couple hours reading and playing with your code and made a notebook of my own: https://github.com/tobyrmanders/notebooks/blob/main/pyro_bn_simple.ipynb.
I realized from your code that I needed to take another step back to an even simpler example. This time there are only two variables, A and B. A is categorical with two possible values, and B is continuous between 0 and 1.
My real problem is closer to this setup, although I’m not sure exactly how to model it. Basically I have a dataset of genetic variants labeled benign or pathogenic (or not labeled), and this corresponds to B in my head. Upstream of this RV there are causal features, some categorical and some continuous between 0 and 1. My ultimate goal is to estimate the probability that a given variant is pathogenic given assignments of the other RVs. I’d also like to report on the posterior density for this node, for example credible intervals or HPDI. Since the labels are all boolean, however, it’s unclear to me whether my target variable should be binomial, or whether it needs to be continuous to access the density I describe above.
Anyway – back to the notebook – I have several points of confusion:

  • Substituting sample() with param() as in other examples in these forums yields unexpected samples from the parameterized model, but using sample() works as expected.
  • SVI fails no matter how I try to set it up. I read the shapes documentation, but my understanding and use of to_event() are still limited at best.

Thanks again – looking forward to your comments!

Hi @ewipe,

Thank you for your notebook! Sorry to bother you; I might ask some naive questions about defining the model:

  1. Why do we need to specify the prior distributions of A, B and C with pyro.sample (weights, C_weights, etc.)? Can we specify them as something like torch.ones(number_of_states_of_variable) if we don’t know the prior distributions, and let the model learn from the dataset? Sorry, I don’t really know how to initialise these variables.

  2. When do we need pyro.param, and when don’t we? It really confuses me because some examples use pyro.param to define variables and then pyro.sample to learn the parameters, and some don’t. I can’t tell the usages of pyro.sample and pyro.param apart.

  3. Could you please give an example of a simple discrete Bayesian network where each node is categorical and has only one parent? I tried to modify your notebook but got errors defining the model.

Thank you so much for your help!

Hi @perceptron, happy the notebook helped. A few comments on your questions:

  • Yes, your problem sounds like you would have a Bernoulli-distributed variable (binary label “benign” or “pathogenic”), where the likelihood depends on the parent variables (exactly like the variable “C” in my notebook). Basically, whenever you have a binary variable, it is Bernoulli distributed. You will have to assume some model for how the likelihood depends on the parent variables, and then try to learn the parameters of that dependency model. That is pretty much exactly what I also did in the notebook.
  • If I understand correctly, you are also interested in quantifying the uncertainty of that (binomial) likelihood estimate for a given set of parent observations? That is actually something I was also very interested in recently. Sadly, I came to the conclusion that this is fundamentally impossible to achieve for binary outcomes. (Anything other than a single, binary outcome would help.)
  • The practical fundamental difference between sample() and param() statements is that the former will allow you to identify a posterior distribution over this quantity, whereas the latter will only give you a point estimate (but might make estimation easier). The two basically coincide if you use sample() with a delta distribution. Without seeing the specifics, it’s hard (for me, at least) to say what exactly went wrong when you transitioned from one to the other.
  • A very crude comment regarding my very crude understanding of shapes and to_event(): when you specify a distribution at some point in your model, ask yourself: if I call .sample() on this distribution, will the shape of the resulting thing depend on the batch size / the number of data samples the model was called with? If not, then it probably describes a single instance of a random variable (which may be multi-dimensional), and calling .to_event() might be a good idea to make sure it is understood that way. As an even cruder rule of thumb, things outside plates tend to go with .to_event(), whereas things inside plates don’t. :wink:

Two immediate comments regarding your notebook, without having gone through it in detail:

  • Currently, you only have observations of the variable A. Without also observing B, it will be impossible to learn anything about the relationship between the two.
  • I believe at least your first (impressively lengthy :smiley:) error message is due to the fact that “N” is a vector for some reason. I don’t currently see how / where you pass it, but you might simply want to set N=len(a_obs) within your model and not pass it as an argument?

Hi @Jingwen, no worries, you’re not disturbing anyone. :wink:

If there is some parameter in your model that you want to learn, you basically have two options: a) you specify it as a param() and learn a point estimate of it, or b) you specify it as a sample(), define its prior distribution, and estimate its full posterior distribution (not just a point estimate).

Say you have an observation x and assume it is Gaussian distributed, and you would like to identify the parameters of that Gaussian. Option a) would be to specify mean and variance as param()s, whereas option b) would be to assume some prior distribution over both and use sample() statements. Option a) corresponds to maximum likelihood estimation and only gives you point estimates, option b) is full posterior inference and gives you uncertainty quantification. (You might want to read up on hierarchical Bayesian modeling in this context.)

Regarding your last point, I just added a simple purely discrete example to my notebook (same link as above). Notice that if everything is discrete, variational inference doesn’t make a lot of sense, as far as I know, since you basically just have to sum over variables in specific ways and everything can be represented in probability tables. TBH I don’t know whether pyro would be the preferred tool for that? I would imagine that there are more specialized and maybe easier to use tools available, but I don’t really know any specific ones. You might want to have a look at, e.g., https://pgmpy.org/?
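For a purely discrete network, "summing over variables" can be done directly on probability tables. A small sketch in plain PyTorch for the A, B → C network from the first post, with made-up tables and with B treated as binary here (in the original question B was continuous), computing P(C = 1) and the posteriors P(A | C = 1) and P(B | C = 1) by Bayes' rule:

```python
import torch

# Made-up probability tables (assumptions for illustration)
p_a = torch.tensor([0.2, 0.5, 0.3])        # P(A), A in {0, 1, 2}
p_b = torch.tensor([0.6, 0.4])             # P(B), B in {0, 1}
p_c1_given_ab = torch.tensor([[0.1, 0.7],  # P(C=1 | A=a, B=b): rows a, columns b
                              [0.3, 0.8],
                              [0.5, 0.9]])

# A and B are independent, so P(A=a, B=b, C=1) = P(a) * P(b) * P(C=1 | a, b)
joint_c1 = p_a[:, None] * p_b[None, :] * p_c1_given_ab
p_c1 = joint_c1.sum()                       # P(C=1); 0.516 for these numbers
p_a_given_c1 = joint_c1.sum(dim=1) / p_c1   # sum out B, then normalize
p_b_given_c1 = joint_c1.sum(dim=0) / p_c1   # sum out A, then normalize
print(p_c1, p_a_given_c1, p_b_given_c1)
```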

Ah, helpful examples! I just scanned the discrete BN example but haven’t gone into depth yet. Thank you so much for the explanations! I will have a look at those materials again asap :).

Actually, I am porting what I have written with pgmpy to Pyro. I find Pyro a great, well-structured PPL, but at the same time I don’t think it’s approachable enough for newbies; at least, it’s hard for me to start coding BNs with Pyro in a short time. But thanks to the forum, I can finally understand some of it.

@ewipe Thanks for another very helpful response. I’ve simplified my model much further and written a notebook that works through the prior predictive distribution, the variational distribution and a posterior predictive check. However, some things are still not clicking for me:

  • In this notebook I have a single discrete parent variable and a boolean child. As mentioned above, I want to estimate (just the epistemic) uncertainty about the boolean variable’s likelihood. Despite your well-reasoned post, my notebook… seems… to do this. In this case the Bernoulli parameter comes from a beta distribution. As far as you can tell, am I approaching or interpreting this incorrectly?
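A sketch of that kind of epistemic uncertainty for a single discrete parent A with K values and a boolean child C, assuming an independent Beta(1, 1) prior on each P(C = 1 | A = k) and synthetic data. Because the Beta prior is conjugate to the Bernoulli likelihood, the exact posterior for each k is Beta(1 + successes, 1 + failures), which can serve as a reference for what an SVI fit should approximately recover. The intervals below are equal-tailed, not HPDI, and describe uncertainty about the parameter P(C = 1 | A = k) itself.

```python
import torch
from torch.distributions import Beta

torch.manual_seed(0)
K, N = 3, 300
true_p = torch.tensor([0.1, 0.5, 0.8])  # made-up P(C = 1 | A = k)
a_obs = torch.randint(0, K, (N,))
c_obs = torch.bernoulli(true_p[a_obs])

# Conjugate update of an assumed Beta(1, 1) prior, separately for each parent value
successes = torch.zeros(K).index_add_(0, a_obs, c_obs)
totals = torch.bincount(a_obs, minlength=K).float()
posterior = Beta(1.0 + successes, 1.0 + totals - successes)

# 90% equal-tailed credible intervals from posterior draws
draws = posterior.sample((4000,))  # shape (4000, K)
lo, hi = torch.quantile(draws, torch.tensor([0.05, 0.95]), dim=0)
for k in range(K):
    print(f"A={k}: mean {posterior.mean[k]:.3f}, 90% CI [{lo[k]:.3f}, {hi[k]:.3f}]")
```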

  • I’d like to set up some posterior predictive checks, as in the notebook, such as performing M replicates of N samples and seeing where the actual observations fall. I do this successfully in the notebook, but trying to use plate notation to speed it up fails due to the shape of site B. I tried all the enumeration and to_event() tricks from your notebook in various permutations and reread the tensor shapes and enumeration pages here, but alas. I think the shape of A should always be the same as the shape of B. Any thoughts?