Inference on test data - simple Bayesian regression

Hi,
It’s my first Pyro example (apologies if the question is too simple), and I am having trouble trying out the model on a fresh test sample.

Data

I am working with NCAA® basketball games played between Division I women’s teams in 2017.

Model

I am assuming that each team has a latent power score drawn from N(0, 1), and I model the score difference between the home and away teams as score ~ N(home team power − away team power, 10).

import torch
import pyro
from pyro.distributions import Normal

def normal_winner_model(home_index, away_index, score):
    num_teams = teams_2017.shape[0]  # teams_2017 is a global table of the 2017 teams
    num_games = home_index.shape[0]
    mu = torch.zeros(num_teams, 1)
    sigma = torch.ones(num_teams, 1)
    # latent per-team power scores, each ~ N(0, 1)
    prior_power = pyro.sample("prior_power", Normal(mu, sigma))

    # one-hot masks mapping each game to its home/away team
    maskhome = torch.zeros(num_teams, num_games, dtype=torch.float) \
        .scatter_(0, home_index[None, :], 1.)
    maskaway = torch.zeros(num_teams, num_games, dtype=torch.float) \
        .scatter_(0, away_index[None, :], 1.)
    # expected score difference per game: home power minus away power
    score_mu = prior_power.transpose(0, 1).matmul(maskhome - maskaway).squeeze()

    with pyro.plate("data"):
        pyro.sample("score", Normal(score_mu, 10.), obs=score)

Inference

I am using a MultivariateNormal guide

guide = AutoMultivariateNormal(normal_winner_model)

and perform SVI inference with the Adam optimizer.
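For reference, the training setup looks roughly like this (a sketch; the learning rate and step count are placeholders):

from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoMultivariateNormal  # pyro.contrib.autoguide in older releases
from pyro.optim import Adam

guide = AutoMultivariateNormal(normal_winner_model)
svi = SVI(normal_winner_model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())

for step in range(2000):  # placeholder number of steps
    loss = svi.step(home_index, away_index, score)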

Testing the model

As I understand it, I need to replay the guide's prior_power sample into the model and feed the frozen model the test data:

preds = []
for _ in range(1000):
    guide_trace = pyro.poutine.trace(guide).get_trace(hloc_test, aloc_test, None)
    # assuming that the original model took in data as (x1, x2, y) where y is observed
    lifted_model = pyro.poutine.replay(normal_winner_model, guide_trace)
    preds.append(lifted_model(hloc_test, aloc_test, None))

However, all the preds are None. What am I missing here?

Thanks

Hi @noam, I think you are on the right track using the low-level poutine API for predictions. Let me explain what each line of your code does, so it is easier to figure out what is missing:

First,

lifted_model = pyro.poutine.replay(normal_winner_model, guide_trace)

rewrites normal_winner_model so that the returned value of each sample statement is obtained from the trace guide_trace.

When you call lifted_model with inputs (hloc_test, aloc_test, None), it runs the stochastic function with the replay effect described above. Because there is no return statement in normal_winner_model, the call returns None, which is why all your preds are None.

What we can do here is record a trace when applying lifted_model to the test inputs (I hope it plays nicely with the pyro.plate("data") statement). For example,

pred_trace = poutine.trace(lifted_model).get_trace(hloc_test, aloc_test, None)
preds.append(pred_trace.nodes["score"]["value"])
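Putting it all together, the prediction loop would look roughly like this (a sketch, keeping your variable names):

from pyro import poutine

preds = []
for _ in range(1000):
    # draw one posterior sample of prior_power from the guide
    guide_trace = poutine.trace(guide).get_trace(hloc_test, aloc_test, None)
    # replay the model against that sample on the test data
    lifted_model = poutine.replay(normal_winner_model, trace=guide_trace)
    # record the model's own trace and read off the sampled scores
    pred_trace = poutine.trace(lifted_model).get_trace(hloc_test, aloc_test, None)
    preds.append(pred_trace.nodes["score"]["value"])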

Instead of using the low-level API as above, you can use TracePosterior or TracePredictive (but those utilities are being refactored for the next Pyro release).
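(For what it's worth, newer Pyro releases ship a pyro.infer.Predictive utility that wraps this pattern; a minimal sketch, assuming a recent version:)

from pyro.infer import Predictive

predictive = Predictive(normal_winner_model, guide=guide,
                        num_samples=1000, return_sites=("score",))
samples = predictive(hloc_test, aloc_test, None)["score"]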


Sorry for the late response…
Thanks, worked like a charm!