I am working with a relatively simple extension of the model from the GMM tutorial and am running into problems with tensor dimensions. Calling the model directly (model(X, y)) works, and evaluating TraceEnum_ELBO works, but calling Predictive works only when the number of test samples matches the number of training samples. I do not use any squeeze, to_event, or .T calls, or anything else that might mess with the dimensions; it's all very basic.
Here’s the fully reproducible example:
import torch
import pyro
import pyro.distributions as dist
from pyro import poutine
from pyro.infer import TraceEnum_ELBO, config_enumerate, Predictive
from pyro.infer.autoguide import AutoDiagonalNormal
# Notice the different number of samples (10 vs 5) in the training / test "datasets"
x_train = torch.randn(10)
y_train = torch.bernoulli(torch.ones(10)*0.5)
x_test = torch.randn(5)
y_test = torch.bernoulli(torch.ones(5)*0.5)
@config_enumerate
def model_LR_aleatoric(X, y, num_mixture_components=3):
    beta_coef = pyro.sample("beta", dist.Normal(0.0, 5.0))  # size []
    risk_logits = X * beta_coef  # broadcasting to [len(X),] here
    weights_dist = dist.Dirichlet(0.5 * torch.ones(num_mixture_components))
    weights = pyro.sample('weights', weights_dist)
    # location and scale parameters of the 3 noise components
    with pyro.plate('components', num_mixture_components):
        locs = pyro.sample('locs', dist.Normal(0., 0.5))
        scales = pyro.sample('scales', dist.LogNormal(0., 0.5))
    # Actual observation model
    with pyro.plate("data", y.shape[0]):
        # which noise component to sample from
        assignment = pyro.sample('assignment', dist.Categorical(weights))
        risk_noise_dist = dist.Normal(locs[assignment], scales[assignment])
        risk_noise = pyro.sample('risk_noise', risk_noise_dist)
        actual_risk_logits = pyro.deterministic("actual_risk_logits", risk_logits + risk_noise)
        outcome_dist = dist.Bernoulli(logits=actual_risk_logits)
        observation = pyro.sample("obs", outcome_dist, obs=y)

guide_LR_aleatoric = AutoDiagonalNormal(poutine.block(model_LR_aleatoric, hide=['assignment']))

pyro.clear_param_store()

model_LR_aleatoric(x_train, y_train)  # works fine

elbo = TraceEnum_ELBO(max_plate_nesting=1)
elbo.loss(model_LR_aleatoric, guide_LR_aleatoric, x_train, y_train)  # works fine

predictive = Predictive(model=model_LR_aleatoric, guide=guide_LR_aleatoric, num_samples=4)
samples = predictive(x_train, y_train)  # works fine
samples = predictive(x_test, y_test)    # fails
The error message the last command fails with is "The size of tensor a (5) must match the size of tensor b (10) at non-singleton dimension 0", raised in the actual_risk_logits = ... line.
So, of course, I started tracing the shapes of the different variables throughout the various calls.
During the first model call, I find these sizes, which all seem to make sense:
ic| X.shape: torch.Size([10])
ic| y.shape: torch.Size([10])
ic| weights_dist.batch_shape: torch.Size([])
ic| weights_dist.event_shape: torch.Size([3])
ic| weights.shape: torch.Size([3])
ic| locs.shape: torch.Size([3])
ic| assignment.shape: torch.Size([10])
ic| locs[assignment].shape: torch.Size([10])
ic| scales[assignment].shape: torch.Size([10])
ic| risk_noise_dist.batch_shape: torch.Size([10])
ic| risk_noise_dist.event_shape: torch.Size([])
ic| risk_logits.shape: torch.Size([10])
ic| risk_noise.shape: torch.Size([10])
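(Side note on methodology: the ic| lines come from icecream's ic() calls that I temporarily placed inside the model body. The same per-site information can also be gathered in one go from a trace; a minimal sketch, reusing the model and imports defined above:)

# Print the shapes of all sample sites from a single model run in one place.
# (Sketch only; the output format differs from the ic| lines above.)
trace = poutine.trace(model_LR_aleatoric).get_trace(x_train, y_train)
trace.compute_log_prob()  # optional: also lists log_prob shapes
print(trace.format_shapes())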
During the elbo.loss call, exactly as outlined here, I find that the model is called twice: in the first call (by the guide), shapes are as above, whereas in the second call we have assignment.shape: torch.Size([3, 1]) due to the enumeration (and likewise for locs[assignment].shape and scales[assignment].shape). So far, so good, I guess?
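(If it helps, the enumerated shapes can also be inspected outside of elbo.loss by adapting the recipe from the tensor-shapes tutorial; this is only a sketch, and first_available_dim=-2 is my assumption given max_plate_nesting=1:)

# Replay the model against a guide trace with enumeration switched on, then print shapes.
# (Sketch adapted from the tensor-shapes tutorial; not part of the failing code path.)
guide_trace = poutine.trace(guide_LR_aleatoric).get_trace(x_train, y_train)
enum_model = poutine.enum(model_LR_aleatoric, first_available_dim=-2)
model_trace = poutine.trace(
    poutine.replay(enum_model, trace=guide_trace)
).get_trace(x_train, y_train)
print(model_trace.format_shapes())  # expecting 'assignment' to show the enumerated shape [3, 1]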
During the call predictive(x_train, y_train), I find that the model is called six times, even though num_samples is four (can someone explain the discrepancy?). During the first two calls, shapes are exactly as outlined above, whereas during the latter four calls I have
ic| weights_dist.batch_shape: torch.Size([])
ic| weights_dist.event_shape: torch.Size([3])
ic| weights.shape: torch.Size([1, 3])
ic| locs.shape: torch.Size([3])
… can someone explain where the additional 1-dimension for the weights comes from?
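(For completeness, the shapes of whatever Predictive returns can also be checked directly on the working call; a trivial sketch:)

# Print the shape of every site returned by the (working) Predictive call.
samples = predictive(x_train, y_train)
for name, value in samples.items():
    print(name, tuple(value.shape))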
And, finally, during the call predictive(x_test, y_test), the first two model calls are again like above (except that all 10s are replaced by 5s, the size of the test dataset). In the next model call, we then have these sizes, right before the error mentioned above occurs:
ic| X.shape: torch.Size([5])
ic| y.shape: torch.Size([5])
ic| weights_dist.batch_shape: torch.Size([])
ic| weights_dist.event_shape: torch.Size([3])
ic| weights.shape: torch.Size([1, 3])
ic| locs.shape: torch.Size([3])
ic| assignment.shape: torch.Size([5])
ic| locs[assignment].shape: torch.Size([5])
ic| scales[assignment].shape: torch.Size([5])
ic| risk_noise_dist.batch_shape: torch.Size([5])
ic| risk_noise_dist.event_shape: torch.Size([])
ic| risk_logits.shape: torch.Size([5])
ic| risk_noise.shape: torch.Size([10])
It is completely opaque to me where the [10] in the last line comes from when everything else has shape [5]…?!
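(One check that might be relevant, though I am not sure it is the right place to look: drawing directly from the guide with the test inputs and inspecting the per-datapoint site; a sketch:)

# Call the fitted AutoDiagonalNormal guide directly with the *test* data; it returns a
# dict of sampled latent values. Is the size-10 value for 'risk_noise' coming from here?
guide_draw = guide_LR_aleatoric(x_test, y_test)
print(guide_draw["risk_noise"].shape)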
Any help is greatly appreciated! I know it takes significant time to sift through these complex problem descriptions.
(I am aware that there are other current threads on very similar topics (1, 2), but I did not see how the solutions discussed there would solve my problem or apply to my setting.)