Explicitness in model building (partly in comparison to pymc3)

heya! i’m very new to pyro, but i’ve some experience with pymc3. sorry about the long post, but to compensate, my questions are probably very simple :slight_smile:

i found this tutorial, which has a noisy-scale example similar to the one in the official docs and defines a model like this:

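```python
import pyro
import pyro.distributions as pyrodist

def model(observations):
    weight_prior = pyrodist.Uniform(0.0, 2.0)
    weight = pyro.sample("weight1", weight_prior)
    my_dist = pyrodist.Normal(weight, 0.1)
    # one sample site per observation, each conditioned on its observed value
    for i, observation in enumerate(observations):
        measurement = pyro.sample(f'obs_{i}', my_dist, obs=observation)
```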

now, i kind of get their thinking in looping through the observations and conditioning on each one individually, but it also feels really explicit compared to pymc3.

so i tried doing it the way i would in pymc3, like so:

```python
import pyro
import pyro.distributions as pyrodists

def model3(observations):
    weight = pyro.sample('weight', pyrodists.Uniform(0, 2))
    measurement = pyro.sample("measurement", pyrodists.Normal(weight, 0.1), obs=observations)
```

this worked nicely. so my first question is: is there some reason for the explicit per-observation loop in the tutorial?

i’m asking because my feeling so far is that pyro is more explicit / needs more detailed code than pymc3, for example in that you have to define a kernel for mcmc. but it may also just be that all the tutorials and examples i’ve seen so far happen to be very explicit?

secondly, about pyro.condition: i went to the official docs and found that my model above was similar to the example there, namely

```python
import pyro
import pyro.distributions as dist

def scale(guess):
    weight = pyro.sample("weight", dist.Normal(guess, 1.0))
    return pyro.sample("measurement", dist.Normal(weight, 0.75))
```

and then you can wrap that function like this:

```python
def deferred_conditioned_scale(measurement, guess):
    return pyro.condition(scale, data={"measurement": measurement})(guess)
```

my question here is whether i’m correct in thinking that this “deferred” way of writing the function lets us have a generative model for simulations (scale in this case) and then simply wrap it with the deferred function in order to do inference?

if so, how? when i try to run inference using the deferred function, it complains that i haven’t passed the guess parameter:

```python
kernel4 = pyro.infer.NUTS(deferred_conditioned_scale)
mcmc4 = pyro.infer.MCMC(kernel4, num_samples=2000)
mcmc4.run(obs)
```

```
...
deferred_conditioned_scale() missing 1 required positional argument: 'guess'
```

my experience with pymc3 is that i write simulation code in regular numpy and then rewrite basically the same function in theano so pymc3 can do its thing. so being able to wrap the model with a deferred function would be great!

i really like that you can iterate on code and check results as you go in pyro; pymc3/theano can be very frustrating to debug if your model doesn’t compile for whatever reason.

There are a lot of parts to this question. I am going to answer some of them and you will figure the rest out, or you will ask clarifying questions.

  1. That first tutorial you linked isn’t great and I don’t recommend following it. Read the examples instead. About the first actual question you asked – basically, “do I have to loop through all of the observations one by one?” You can, but you don’t have to – and explicitly looping through the observations like that is actually bad practice. The observations are iid by hypothesis, and the loop creates a separate sample site for every data point, which is slow and does not vectorize. pyro has a dedicated construct for conditionally independent observations called pyro.plate. Read about it here. (You should read and understand everything on that page, honestly.) It is called plate because it plays the same role as a plate in commonly used graphical-model notation. Here is how you can rewrite that first model using plate:
```python
import pyro
import pyro.distributions as dist

def model(observations, size):
    weight = pyro.sample(
        "weight",
        dist.Uniform(0, 2.0)
    )
    # all observations are conditionally independent given weight
    with pyro.plate("obs_plate", size):
        measurements = pyro.sample(
            "measurements",
            dist.Normal(weight, 0.1),
            obs=observations
        )
    return measurements

# pyro models are generative: with obs=None the measurements are sampled
size = 5
data = model(None, size)
print(data)
```

```
output$ tensor([0.2829, 0.1395, 0.1871, 0.1848, 0.4556])
```

Even though you didn’t ask, I am reflexively inclined to warn you about using uniform priors, as this model does. This is usually a really, really bad idea because it means that weight has probability zero of being less than 0 or greater than 2, and no amount of data can move the posterior outside that interval. That is almost certainly not the case in any real-world model.

  2. About pyro.poutine.condition: this is equivalent to setting obs=<some values> in your model instead of obs=None as I have done above. When obs=None, the variable is not observed. The reason your example is not working has nothing to do with pyro, but rather with your python: you have only passed a single argument to your deferred_conditioned_scale function, which expects two arguments. When you call mcmc.run(*args, **kwargs), the model is called underneath with those same *args and **kwargs. If you just call mcmc4.run(obs, guess) then you’d be good to go.

Does this help?


thanks for your reply, i really appreciate it!

sorry about (2) with the missing argument, that was a real :woman_facepalming: by me after hours of being confused!

so, if i understand you correctly, given the noisy-scale example where we want to find the most likely weight of an object from several observations, are the following three models equivalent?

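```python
import torch
import pyro
import pyro.distributions as pyrodists

def model1(obs):
    weight = pyro.sample("weight", pyrodists.Uniform(0, 2))
    measurement = pyro.sample("measurement", pyrodists.Normal(weight, 0.1), obs=obs)

model1(obs=None)  # returns None, since model1 has no return statement
```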

```python
# define data ourselves or import or whatever
data = torch.tensor([0.74, 0.98, 0.66, 0.75, 0.84, 0.74])

def generate_measurement(real_weight):
    weight = pyro.sample("weight", pyrodists.Normal(real_weight, 1.0))
    return pyro.sample("measurement", pyrodists.Normal(weight, 0.75))

def model2(measurement, prior_guess):
    return pyro.condition(generate_measurement, data={"measurement": measurement})(prior_guess)

# stack the five scalar draws into a single 1-D tensor
data = torch.stack([generate_measurement(2) for _ in range(5)])
print(data)
```

```
...
output: tensor([3.5285, 0.9415, 4.7915, 3.3073, 1.6606])
```
```python
def model3(obs, size, real_weight):
    weight = pyro.sample(
        "weight",
        pyrodists.Normal(real_weight, 1.0)
    )
    with pyro.plate("obs_plate", size):
        measurements = pyro.sample(
            "measurements",
            pyrodists.Normal(weight, 0.1),
            obs=obs
        )
    return measurements

data = model3(None, 5, 2)
```

```
...
output: tensor([1.8360, 1.7191, 1.9774, 1.8542, 1.9060])
```

with the exceptions that:

  • model1 can’t generate data and can thus only be used as an inference model (i used a uniform prior because, assuming we are on earth and i know the object weighs less than my 2 liter juicebox, the weight has probability 1 of being between 0 and 2 kg :slight_smile: )
  • the model2 style allows breaking up the code, for example to use more intuitive variable names for generation vs inference. is there some other (dis)advantage to this style?
  • model3 allows a single function for both generation and inference; is that the recommended way to do things?


I think that model3 is “recommended”, yes. There is nothing wrong with having a model that you only use for inference, e.g., model1, but it might be convenient for you to generate data from the model as well.

The battle about uniform priors won’t be won or lost here. Nonetheless I do encourage you to think deeply about the assertion that you “know” something, e.g., that an object weighs less than 2kg. This is not on topic for the pyro forum, but I would like to share a brief anecdote from the world of quantitative finance with you: until recently, many commodities traders “knew” that the price of most commonly traded commodities would not be negative. In fact, they were so sure of this that their trading algorithms – some of which even involved Bayesian statistics implemented via probabilistic programming languages – had hard lower cutoffs on price at zero. These same traders learned a rather interesting lesson when WTI oil futures traded at around -$37 on NYMEX. Some folks lost their shirts that day because of what priors they chose for their models (and, indeed, their choice to make their models fundamentally multiplicative instead of allowing for the possibility of additive behavior); these choices aren’t academic but rather can have far-reaching consequences.

amazing, thanks for being so patient with me, this clears some of my confusion with the documentation!

about the priors; fair enough, i see your point :slight_smile: