Why is obs the labels in the model of a classification task in a Bayesian Neural Network?

I am new to Pyro and trying to implement classification on the MNIST dataset using a Bayesian Neural Network. In my model, I have this line:

pyro.sample("obs", Categorical(logits=lhat), obs=y_data)

Based on my understanding, this function samples from a conditional distribution given an observation, which is what the inference algorithm does.

My question is: why is obs in this case y_data, not x_data? My understanding is that we do inference given the observation, which is x_data.

Any intuitive explanation is appreciated!

Hi @weikb, I think these intro tutorials (part 1, part 2) will give you an intuition for what a Pyro model looks like. Hopefully they will answer your question. 🙂

Hi @fehiepsi, thanks for your reply and the references. I have actually gone through the tutorials you mentioned. I guess my question is quite specific to a classification task. Although the tutorials are helpful, it is still unclear to me why pyro.sample("obs", Categorical(logits=lhat), obs=y_data) has obs set to y_data instead of x_data. In my understanding, the observation should be x_data rather than y_data when we do inference.

It depends on the problem, I guess. If x is a random variable (I guess it is an image in your case), then you can use

# each image is a tensor of binarized pixels with values 0 or 1
pyro.sample('x', dist.Bernoulli(probs), obs=x_data)

In your problem, I think y is a Categorical random variable over the categories 0, 1, 2, 3, 4, 5, 6, 7, 8, 9; so the statement

pyro.sample("obs", Categorical(logits=lhat), obs=y_data)

means that y_data follows a Categorical distribution with logits=lhat. It has nothing to do with x_data. If you used x_data there, that statement would mean that x_data follows a Categorical distribution with logits=lhat. Is your x_data the image or the category?
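To make that concrete, here is a stdlib-only sketch (with toy logits and a toy label, not the Pyro API) of the quantity that passing obs=y_data makes the sample statement contribute during inference: the log-probability of the observed label under Categorical(logits=lhat).

```python
import math

def categorical_log_prob(logits, y):
    # log p(y) under Categorical(logits): logits[y] - logsumexp(logits)
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return logits[y] - log_z

# toy network output over the 10 digit classes for one image
lhat = [0.1, 2.0, -1.0, 0.5, 0.0, 0.3, -0.2, 1.1, 0.4, -0.5]
y_data = 1  # the observed label for that image

# the term an observed sample site adds to the model's log-joint
log_lik = categorical_log_prob(lhat, y_data)
```

Inference then adjusts the network weights (which produce lhat) to make this term large for the observed labels; the image x_data only enters through lhat.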

This basically has to do with the difference between generative and discriminative modeling. A vanilla classifier is a discriminative model in which only the labels y are governed by probability distributions; the x's are something you condition on. At no point do you try to model p(x). So yes, the x's are "observed" too, but they are not observed random variables; only y is an observed random variable.
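As a stdlib-only sketch of that distinction (toy numbers and hypothetical helper names, not any Pyro API): the discriminative log-joint scores only the label given x, while a generative one would also score x itself.

```python
import math

def log_softmax_at(logits, y):
    # log p(y | x) under Categorical(logits)
    m = max(logits)
    return logits[y] - (m + math.log(sum(math.exp(l - m) for l in logits)))

def bernoulli_log_prob(p, x):
    # log p(x) for a single binarized pixel x in {0, 1}
    return math.log(p) if x == 1 else math.log(1 - p)

x = [1, 0, 1]            # toy binarized "image"
y = 2                    # observed label
lhat = [0.0, 1.0, 2.0]   # classifier output for this x

# discriminative: only y is a random variable; x is merely conditioned on
disc = log_softmax_at(lhat, y)

# generative: the pixels are random variables too, so p(x) is modeled as well
pixel_probs = [0.8, 0.2, 0.6]  # toy per-pixel Bernoulli probabilities
gen = disc + sum(bernoulli_log_prob(p, xi) for p, xi in zip(pixel_probs, x))
```

In Pyro terms, the extra pixel terms are what a statement like pyro.sample('x', dist.Bernoulli(probs), obs=x_data) would add; a plain classifier simply never writes that statement.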

Thanks for your reply and explanation. I understand your argument, and it makes sense to me. To summarize: y_data is the 'obs' because y_data is the observed random variable, while x_data is observed but is not a random variable. I guess this can also be seen from the graphical model of a classifier, where the only observed random variable is the labels. Is this what you meant?