Shape Mismatch in predicting new data point

learner · April 19, 2023, 6:09pm

For anyone facing this: after spending some quality time with the Python debugger, I realized that my samples from the posterior (not the posterior predictive) included samples of the data site, obs. I think that, combined with the fact that I had not specified anything in return_sites when I tried to sample from the posterior predictive, caused Pyro to believe I was for some reason trying to sample obs, which resulted in the dimension mismatch.

This is the correct way to sample predictions on held-out data:

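import numpy as np
import torch
from pyro.infer import Predictive

# model, guide, predictors, pheno and coefs are assumed to be defined already
# Keep only the latent sites, so the posterior samples do not include 'obs'
predictive = Predictive(model, guide=guide, num_samples=100, return_sites=['g0', 'covs'])
samples = predictive(predictors, pheno)

# Simulated held-out data (oob_data is not used below)
oob_predictors = torch.rand([5, 3])
oob_data = torch.bernoulli(torch.sigmoid(0.2 + torch.matmul(oob_predictors, coefs)))

# Reuse the latent samples and pass None in place of the observations
oob_predictive = Predictive(model, posterior_samples=samples, return_sites=['_RETURN'])
oob_preds = oob_predictive(oob_predictors, None)
np.mean(oob_preds['_RETURN'].numpy(), axis=0)
## array([0.15, 0.15, 0.41, 0.28, 0.19], dtype=float32)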

(Note the return_sites specification in both Predictive() objects, and the use of None where I had previously passed pheno.)
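
If you already have a samples dictionary that includes obs and would rather not re-run the first Predictive, I think dropping that entry before reusing the samples should have the same effect. A minimal sketch, which I have not run against the model above: drop_sites is a hypothetical helper (it is not part of Pyro), and the dictionary below uses plain lists as a stand-in for real tensors.

```python
def drop_sites(samples, unwanted=("obs",)):
    """Return a copy of a samples dict without the named sites."""
    return {name: value for name, value in samples.items() if name not in unwanted}


# Stand-in for a samples dict that still contains the observed site.
# Real samples would be tensors; lists keep this sketch dependency-free.
samples = {
    "g0": [0.1, 0.3],
    "covs": [[0.5, 1.0, -0.2], [0.4, 0.9, -0.1]],
    "obs": [[1, 0], [0, 0]],
}

latent_only = drop_sites(samples)
print(sorted(latent_only))  # ['covs', 'g0']
```

The filtered dictionary could then be passed as posterior_samples=latent_only in place of samples.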