Shape mismatch when adapting the Forecasting I and III tutorials

I’m trying to adapt Model2 from the Forecasting III tutorial to a sports setting. I’ve changed the model to a simple linear regression on one covariate, similar to Model1 through Model3 in Forecasting I. For training, covariates has size [32, 31, 15, 1] and zero_data has size [32, 1, 15, 1], which corresponds to [number of offense teams, number of defense teams, time, 1]. The following code runs fine for training:

import pyro
import pyro.distributions as dist
from pyro.contrib.forecast import ForecastingModel

class Model1(ForecastingModel):
    # We then implement the .model() method. Since this is a generative model, it shouldn't
    # look at data; however it is convenient to see the shape of data we're supposed to
    # generate, so this inputs a zeros_like(data) tensor instead of the actual data.
    def model(self, zero_data, covariates):
        no_teams, _, duration, _ = zero_data.size()
        _, no_def, _, _ = covariates.size()

        offense_plate = pyro.plate("offense", no_teams, dim=-4)
        defense_plate = pyro.plate("defense", no_def, dim=-3)

        # The first part of the model is a probabilistic program to create a prediction.
        # We use the zero_data as a template for the shape of the prediction.
        with offense_plate:
            bias = pyro.sample("bias", dist.Normal(hyper_param_a, hyper_param_b))
            
        with defense_plate:
            weight = pyro.sample("weight", dist.Normal(0, 0.1))
        

        prediction = bias + (weight * covariates).sum(-3, keepdim=True)
        # The prediction should have the same shape as zero_data (duration, obs_dim),
        # but may have additional sample dimensions on the left.

        
        assert prediction.shape[-4:] == zero_data.shape[-4:]       
        
        # The next part of the model creates a likelihood or noise distribution.
        # Again we'll be Bayesian and write this as a probabilistic program with
        # priors over parameters.
        
        with offense_plate:
            noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5))
        
        noise_dist = dist.Normal(0, noise_scale)

        # The final step is to call the .predict() method.
        with offense_plate:
            self.predict(noise_dist, prediction)

But when running

samples = forecaster(data[...,T0:T1,:], covariates, num_samples=20)

I get the following error:

ValueError: Shape mismatch inside plate('offense') at site residual dim -4, 32 vs 20

Any help would be greatly appreciated!

Pyro version ‘1.3.1’

Does the shape of covariates change from train to test?

I use

forecaster = Forecaster(Model1(), data[...,T0:T1,:], covariates[...,T0:T1,:], learning_rate=0.1, num_steps=1)

for train and the full covariates for test, but nothing else changes. (num_steps is set to 1 for debugging.)

Hi @Archai,
Could you print more shape information, including the full error traceback, and maybe print() the .shape of all results of pyro.sample statements?

Can you explain why your “offense” and “defense” plates have dims -4, -3 rather than -3, -2? IIRC the dimensions in pyro.contrib.forecast are:

  • dim -1 is an event dimension
  • all dims to the left of it are batch dims
  • dim -2 is the time dim, declared as pyro.plate("time", ..., dim=-1) (note this is dim=-1 because it is the first batch dimension; nevertheless zero_data.size(-2) == time_plate.size)
  • dims -3 and left are user-defined plates. These should be pyro.plate("my_plate", ..., dim=-2), pyro.plate("another_plate", ..., dim=-3), …
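As a sanity check of that offset, here is a minimal sketch (plain Python, nothing Pyro-specific; the helper name is made up): because dim -1 of the data is the event dim and plates count batch dims only, each data tensor dim maps to the plate dim one to its right.

```python
def plate_dim(data_dim):
    # Plates index batch dims only; the event dim (-1) is excluded,
    # so every data tensor dim shifts by one when declaring a plate.
    return data_dim + 1

assert plate_dim(-2) == -1  # the time plate is declared with dim=-1
assert plate_dim(-3) == -2  # first user-defined plate
assert plate_dim(-4) == -3  # second user-defined plate
```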

Another suspicious site is the noise_scale, which has shape (no_teams, 1, 1, 1). I believe you’ll need to ensure this has event shape:

with offense_plate:
    noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5).expand([1]).to_event(1))

Again that is because all pyro.contrib.forecast models assume .event_dim == 1. IIRC there is some fancy logic to automatically convert scalar non-batched distributions like LogNormal(-5, 5) to correctly-shaped distributions, but since you’re batching with offense_plate I believe you’ll need to get the event dimension correct.
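As a sketch of what .expand([1]).to_event(1) does to the shapes, here is the plain torch.distributions analogue, where Pyro’s .to_event(1) corresponds to wrapping in Independent (the plate size 32 is taken from the example above):

```python
import torch
import torch.distributions as td

# Pyro's LogNormal(-5, 5).expand([1]).to_event(1) corresponds, in plain
# torch.distributions terms, to expanding and then wrapping in Independent:
base = td.LogNormal(torch.tensor(-5.0), torch.tensor(5.0))
d = td.Independent(base.expand(torch.Size([1])), 1)
assert d.batch_shape == torch.Size([])   # no batch dims yet
assert d.event_shape == torch.Size([1])  # event_dim == 1, as forecast models assume

# Inside pyro.plate("offense", 32, dim=-3), the batch shape would be expanded
# to (32, 1, 1), so samples get shape (32, 1, 1, 1) including the event dim:
expanded = d.expand(torch.Size([32, 1, 1]))
assert expanded.sample().shape == (32, 1, 1, 1)
```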

Hi @fritzo,

Thank you for your very helpful response! I managed to get everything working (I think) with the following code after following your comments. The problem was exactly that plate dimension indexing does not equal tensor size indexing, because plates index batch dimensions only.

# First we need some boilerplate to create a class and define a .model() method.
import pyro
import pyro.distributions as dist
from pyro.contrib.forecast import ForecastingModel

class Model1(ForecastingModel):
    # We then implement the .model() method. Since this is a generative model, it shouldn't
    # look at data; however it is convenient to see the shape of data we're supposed to
    # generate, so this inputs a zeros_like(data) tensor instead of the actual data.
    def model(self, zero_data, covariates):
        no_teams, _, duration, _ = zero_data.size()
        _, no_def, _, _ = covariates.size()

        offense_plate = pyro.plate("offense", no_teams, dim=-3)
        defense_plate = pyro.plate("defense", no_def, dim=-2)

        # The first part of the model is a probabilistic program to create a prediction.
        # We use the zero_data as a template for the shape of the prediction.
        with offense_plate:
            bias = pyro.sample("bias", dist.Normal(hyper_param_a, 5).expand([1]).to_event(1))
            
        with defense_plate:
            weight = pyro.sample("weight", dist.Normal(0, 0.1).expand([1]).to_event(1))
        
        prediction = bias + (weight * covariates).sum(-3, keepdim=True)
        # The prediction should have the same shape as zero_data (duration, obs_dim),
        # but may have additional sample dimensions on the left.
        
        assert prediction.shape[-4:] == zero_data.shape[-4:]       
        
        with offense_plate:
            noise_scale = pyro.sample("noise_scale", dist.LogNormal(-5, 5))
        
        noise_scale = noise_scale.unsqueeze(-1) # size [32, 1, 1, 1]
        noise_dist = dist.Normal(0, noise_scale).to_event(1) # size [32, 1, 1 | 1] i.e. [batch_dim | event_dim]

        with offense_plate:
            self.predict(noise_dist, prediction)
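A quick shape check of the noise_scale handling above (plain torch, with the shapes taken from this model; a sketch rather than the full model):

```python
import torch

noise_scale = torch.rand(32, 1, 1)       # as sampled inside offense_plate (dim=-3)
noise_scale = noise_scale.unsqueeze(-1)  # (32, 1, 1, 1): append the event dim
prediction = torch.zeros(32, 1, 15, 1)   # same shape as zero_data

# Normal(0, noise_scale) broadcasts against the prediction over the time dim:
assert (noise_scale * prediction).shape == (32, 1, 15, 1)
```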

A few follow up questions/comments:

  • What is the best way to debug event and batch shape issues inside a ForecastingModel class? Ideally I think I should use the following, but I’m not sure exactly how to get it to work with a class instead of a model function.

    trace = poutine.trace(Model1).get_trace()
    trace.compute_log_prob() # optional, but allows printing of log_prob shapes
    print(trace.format_shapes())

  • I’m a little unclear how to decide what plates self.predict() should be in.

Thanks again!

What is the best way to debug event and batch shape issues inside a ForecastingModel class?

I like your idea of tracing. To make it work with a model class I think you can simply create an instance:

model = MyForecastingModel(...)
trace = poutine.trace(model).get_trace(data, covariates)
print(trace.format_shapes())

I also often simply add `print(f"DEBUG {x.shape}")` statements in the model code.

I’m a little unclear how to decide what plates self.predict() should be in.

I believe self.predict() should be in any plate over which data is batched. That means: not the event dim (-1), not the time dim (-2), but every other nontrivial dimension of data (-3, -4, …).
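A small sketch of that rule applied to the model above, where zero_data has shape (no_teams, 1, duration, obs_dim) (plain Python, illustrative only):

```python
zero_data_shape = (32, 1, 15, 1)  # (no_teams, 1, duration, obs_dim)

# Data dims left of the time dim (-2) and event dim (-1) that are nontrivial
# each need a plate around self.predict():
batched_dims = [d for d in range(-len(zero_data_shape), -2)
                if zero_data_shape[d] > 1]
assert batched_dims == [-4]  # only the offense dim, so predict() sits in offense_plate
```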