Confusion regarding batch_shapes, event_shapes, independence

Dear all,

I am quite new to Pyro and I am struggling a bit to understand the relationship between event_shapes and batch_shapes, dependence and independence, and how this information is shared between different objects in Pyro.

I have read the “Tensor shapes in Pyro” article as well as “Reasoning about shapes and probability” and the relevant doc entries (e.g. for pyro.plate). While I was mostly able to nod along to these articles, there still seems to be some fundamental misunderstanding on my part.

I would be grateful if you could help me out by telling me why the following minimalistic examples behave the way they do. I am relatively comfortable with maths and probability but have little experience with Python's context manager functionality, which is why my examples try to sidestep pyro.plate().

Q1: Are batch_shape and event_shape not passed to inference and sufficient to declare independence?

As I understood it, event_shape and batch_shape are designations of dependence and independence respectively, and as such are passed to the inference algorithms. They designate which parts of the data should later be treated as a single realization of a multivariate distribution and which dimensions index independent realizations. So I thought I should be able to do the following:

  1. Get some data in a specific shape (e.g. [10,2])

  2. Build model that produces observed output of shape [10,2]

  3. Declare dependence/independence by specifying the shapes.

  4. Invoke an autoguide and run SVI

However, this is not the case; the following code behaves unexpectedly (to me) in its second example:

"""
    1. Imports and definitions
"""

# i) Imports

import numpy as np
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from pprint import pprint


# ii) Data generation

mu_true = 0
sigma_true = 1
batch_shape = [10,2]
x_data = torch.tensor(np.random.normal(mu_true,sigma_true, batch_shape)) # all of them independent
test_input = torch.tensor(1)


# iii) Training function

def train(model, guide, x_data):
    pyro.clear_param_store()
    svi = SVI(model, guide, Adam({"lr": 0.005}), loss=Trace_ELBO())
    num_steps = 1000
    for step in range(num_steps):
        loss = svi.step(x_data)
        if step % 100 == 0:
            print("Step: ", step, "Loss: ", loss)
    print("Optimization terminated. Results follow")
    for name, value in pyro.get_param_store().items():
        print(name, pyro.param(name).data.cpu().numpy())
        

# iv) Analyze and plot

def analyze(model, trace_input):
    model_trace = pyro.poutine.trace(model).get_trace(trace_input)
    model_trace.compute_log_prob()
    pprint(model_trace.nodes)
    print(model_trace.format_shapes())
        

"""
    2. Two stochastic models
"""

# 2 different versions: i) to_event(2) to declare dependence (event_shape = [10,2])
#                       ii) expand([10,2]) to declare independence (batch_shape = [10,2])
#
# i) Version 1: works as I expect it.
# Define model & guide. We do not use any pyro.plate statements. Since we use
# to_event(2), the 2 rightmost dimensions are treated as a single event.
# d contains [10,2] independent copies of the Normal, but they are considered
# one event -> d.batch_shape is () and d.event_shape is [10,2].
pyro.clear_param_store()
def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0))
    d = dist.Normal(mu,1).expand([10,2]).to_event(2)
    x = pyro.sample("x",d, obs = x_obs)
    assert d.batch_shape == torch.Size([])
    assert d.event_shape == torch.Size([10, 2])
    return x
guide = pyro.infer.autoguide.AutoNormal(model)
# When looking at the trace, we find everything as expected: event_shape of [10,2]
# and log_prob is a single number. The guide is empty since no latents exist.
analyze(model, x_data)
analyze(guide, ())
model()
guide()
# The optimization to fit the parameter mu terminates and seems to deliver reasonable
# results.
train(model,guide,x_data)
#
# ii) Version 2: doesn't work as I expect it.
# Define model & guide. We do not use any pyro.plate statements. Since we use
# expand([10,2]), the distribution d contains [10,2] independent copies of the
# Normal distribution. This is also recorded for later use in the properties of
# the distribution d -> d.batch_shape is [10,2] and d.event_shape is ().
pyro.clear_param_store()
def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0))
    d = dist.Normal(mu,1).expand([10,2])
    x = pyro.sample("x",d, obs = x_obs)
    assert d.batch_shape == torch.Size([10, 2])
    assert d.event_shape == torch.Size([])
    return x
guide = pyro.infer.autoguide.AutoNormal(model)
# When looking at the trace, we find everything as expected: batch_shape of [10,2]
# and log_prob is also of shape [10,2]. The guide is empty since no latents exist.
analyze(model, x_data)
analyze(guide, ())
model()
guide()
# The optimization, however, does not work. It raises a ValueError saying that
# input of shape [] was expected, even though the batch_shape and the log_prob
# shape in the model are shown as [10,2].
train(model,guide,x_data)

Can someone explain to me why the second example does not work? To me it looks like the batch_shape and log_prob shapes are properly defined in the model, so SVI should be able to make use of the [10,2]-shaped data. Where am I wrong? Or is the whole premise wrong, and do I always need a pyro.plate to declare independence?

If the latter is the case, I have a follow-up question related to the subsequent code

Q2: How does the pyro.sample() statement handle the observations from x_data during SVI in the scenario with pyro.plates (see below)? In particular, I want to know how, inside the plate context, I could pass other variables y on which x might depend in a statement akin to x[i,j] = f(y[i,j]), with x and y both being data passed to SVI.
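To make the question concrete, here is a rough sketch of the kind of model I have in mind (f and y are just placeholders I made up here; y would be data of shape [10,2] like x_data):

def model(y, x_obs = None):
    mu = pyro.param("mean", torch.tensor(2.0))
    f = lambda y: 0.5 * y   # placeholder deterministic transform
    with pyro.plate("plate_1", size = 2, dim = -1):
        with pyro.plate("plate_2", size = 10, dim = -2):
            # the intent: each x[i,j] depends on y[i,j] through f
            x = pyro.sample("x", dist.Normal(mu + f(y), 1), obs = x_obs)
    return x

# presumably one would then call svi.step(y, x_data) during training?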

"""
    3. Third stochastic model
"""


# iii) Version 3: works partly as I expect it.
# Define model & guide and use pyro.plate statements to declare independence.
# Since we use two plates, the sample site "x" ends up with batch_shape [10,2]
# and event_shape ().
pyro.clear_param_store()
def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0))
    d = dist.Normal(mu,1)
    with pyro.plate("plate_1", size = 2, dim = -1):
        with pyro.plate("plate_2", size = 10, dim = -2):
            x = pyro.sample("x",d, obs = x_obs)
    return x
guide = pyro.infer.autoguide.AutoNormal(model)
# When looking at the trace, we find everything as expected: batch_shape of [10,2]
# and log_prob is also of shape [10,2]. 
analyze(model, x_data)
analyze(guide, ())
model()
guide()
# The optimization now works but seems to exhibit slower convergence, even though
# I would have assumed convergence to be faster due to conditional independence
# reducing the variance of the gradient estimator.
train(model,guide,x_data)

Lastly, thanks a lot for taking the time to read & help!

at some point we added stricter checks so that you either need to use to_event or plates.

e.g. this model works fine:

def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0))
    with pyro.plate('myplate', 10):
        d = dist.Normal(mu,1).expand([2]).to_event(1)
        x = pyro.sample("x",d, obs = x_obs)
        assert d.batch_shape == torch.Size([])  # plate auto expands the batch dimension
        assert d.event_shape == torch.Size([2])
        return x

the check is performed here

also note that for stochastic variational inference with a mean field variational guide conditional independence information doesn’t give you anything unless you’re doing data subsampling (mini-batching). the purely-to-event results should be the same as the plate-only results.
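for reference, a minimal sketch of what row-wise subsampling could look like in your third model (subsample_size = 5 is arbitrary; the plate rescales the log probabilities accordingly):

def model(x_obs = None):
    mu = pyro.param("mean", torch.tensor(2.0))
    with pyro.plate("plate_1", size = 2, dim = -1):
        with pyro.plate("plate_2", size = 10, subsample_size = 5, dim = -2) as idx:
            # only a random subset of 5 of the 10 rows enters each SVI step
            x = pyro.sample("x", dist.Normal(mu, 1),
                            obs = x_obs[idx] if x_obs is not None else None)
    return x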

Dear @martinjankowiak ,

thanks a lot for your answer; that clears things up quite a bit for me. Having everything necessarily declared by either to_event or plates was the piece of the puzzle that was missing for me. The convergence issue that I raised was a mistake on my part, thanks for pointing it out. From looking at that code and exploring pyro.poutine.trace a bit more, I drew some conclusions, but I am unsure if they are correct. It would be much appreciated if someone could tell me whether the following three statements are correct:

1. plate(), expand(), and passing distribution parameters with nontrivial shapes when constructing a distribution all have the same impact on the shapes, but plates also register independence

The difference is visible in the 'cond_indep_stack' entry of the trace node associated with the variable. When defining a model using a distribution d = dist.Normal(0,1).expand([10,2]), the entry trace.nodes['x']['cond_indep_stack'] is empty, where 'x' is the name of the variable and trace = pyro.poutine.trace(model).get_trace(). This means no independence is recorded. If the model is built with plates, the entry is populated with the plate info, and this lets SVI know which shapes to expect; roughly how I checked this is shown below.
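For what it's worth, this is how I inspected it (model and x_data as in my first post):

trace = pyro.poutine.trace(model).get_trace(x_data)
# empty tuple () for the expand-only version, one CondIndepStackFrame per plate otherwise
print(trace.nodes["x"]["cond_indep_stack"])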

2. Dependence and independence are a consequence of both construction and declaration. Independence comes from construction via plate() and dependence from declaration via to_event()

I first thought that it would be sufficient to construct a correctly shaped distribution d (e.g. with shape [10,2,5]) and then afterwards declare the rightmost dimension dependent via to_event(1) and the two leftmost dimensions independent via a plate construction.

But this seems not to be the case. To achieve what I want in this situation, I would need to provide the following model definition:

def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0))
    d = dist.Normal(mu,1).expand([5]).to_event(1)
    with pyro.plate("plate_1", size = 2, dim = -1):
        with pyro.plate("plate_2", size = 10, dim = -2):
            x = pyro.sample("x",d, obs = x_obs)
    return x

i.e. I need to declare the rightmost dim dependent with to_event(1) after creation, but the two leftmost dimensions need to be constructed as independent via plates (instead of being declared independent after creation).

3. In the above model definition, we have the following shapes: when the model is called to perform forward modelling via model(), each x will be a vector of shape [5] at each sample statement. The plates accumulate all the x's together, and the result is also named x. When performing inference using SVI, Pyro keeps this construction in mind and therefore accepts as input an x_obs with a shape of [10,2,5].

The individual x's of shape [5], as sampled during x = pyro.sample("x", d, obs=x_obs), are of little importance, since the plate construction ensures that every output of the model and every data input to SVI has shape [10,2,5]; see the quick check below.
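A quick check of what I mean (using the model definition above):

x_sim = model()
print(x_sim.shape)   # should be torch.Size([10, 2, 5]) if my understanding is correct
print(pyro.poutine.trace(model).get_trace(x_sim).format_shapes())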

Lastly, are there recommendations on how to explore a model with respect to its shapes/independence structure and make sure that these structures are well declared and suitable for SVI? How do you all perform model checking in this regard?

All the best & thanks a lot.

  1. that sounds right

  2. i don’t really follow what distinction you’re trying to make. both should be fine as long as the broadcasting is fine. e.g. this is ok

def model(x_obs = None):
    mu = pyro.param("mean",torch.tensor(2.0)).expand(10, 2, 5)
    d = dist.Normal(mu,1).to_event(1)
    with pyro.plate("plate_1", size = 2, dim = -1):
        with pyro.plate("plate_2", size = 10, dim = -2):
            x = pyro.sample("x",d, obs = x_obs)
    return x
  3. i don’t understand what ‘of little importance’ signifies or what you’re driving at.

in general i suggest carefully reading through various example code in the repo. i think this tutorial is particularly useful since it starts with a simple model and builds towards a more complex one with multiple plates

Dear @martinjankowiak ,

thanks a lot for the reference to the tutorial. The code that you gave in conjunction with bullet point nr. 2 was exactly what I was looking for. I feel much more comfortable with the construction rules now. My bullet point nr. 3 expressed a bit of confusion as to what happens to a sample statement inside a plate w.r.t. its shape, since that shape gets modified by the plate's automatic expand. But that has mostly cleared up.

BTW, I think I would have benefitted quite a bit from an intermediate tutorial that sits between the tensor shapes tutorial and the Bayesian regression tutorial and focuses on showcasing classic tasks like linear regression in minimalistic Pyro syntax. Do you think there might be wider demand for a tutorial like that? I could try to condense my notes into something like this, although it would probably take some time and require review from someone more knowledgeable.

@spacewhales yes we’re always eager to have more tutorials! especially those aimed at beginner/intermediate users, where we probably have the biggest need. we’re always happy to help review tutorial pull requests and help get those merged.