Hey! I've built a Forecaster object based on the hierarchical time series model tutorial, using the subsampling method explained there to batch-train the model, and this works well.
When it comes to prediction, however, the data are very large and my kernel often dies. To remedy this, I've tried to implement a batch-prediction method that iterates over one of the hierarchical levels as follows:
```python
# preprocessed data shape: torch.Size([49, 18316, 14, 1])
# the model takes two hierarchical variables of size 49 and 18316
complete_data = preprocess(df)

# define batch generator over the second hierarchical dimension
def batch_gen(dataset, batch_size=1000):
    n_samples = dataset.shape[1]
    indices = np.arange(n_samples)
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        batch_idx = indices[start:end]
        yield dataset[:, batch_idx, :, :]
```
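As a sanity check, the generator itself slices as expected; here's a small self-contained run on a dummy numpy array with the same layout (the shape is made up and smaller than my real data; it behaves identically on a torch tensor):

```python
import numpy as np

def batch_gen(dataset, batch_size=1000):
    # iterate over the second (hierarchical) dimension in fixed-size chunks
    n_samples = dataset.shape[1]
    indices = np.arange(n_samples)
    for start in range(0, n_samples, batch_size):
        end = min(start + batch_size, n_samples)
        yield dataset[:, indices[start:end], :, :]

# dummy array with the same layout as the preprocessed data, but only
# 2500 entries in the second hierarchical dimension
dummy = np.zeros((49, 2500, 14, 1))
shapes = [b.shape for b in batch_gen(dummy)]
print(shapes)  # two full batches of 1000 plus a final partial batch of 500
```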
```python
# batch predict
batch_loader = batch_gen(complete_data)
for batch in batch_loader:
    covariates = torch.zeros(batch.size(-2), 0)  # empty covariates
    samples = forecaster(
        batch[..., T0:T1, :],
        covariates[T0:T2],
        num_samples=100,
    )
    samples.clamp_(min=0)
    p5, p50, p95 = quantile(samples[..., 0], (0.05, 0.5, 0.95))
    crps = eval_crps(samples, batch[..., T1:T2, :])
    print(f"CRPS: {crps}")
```
However, this returns an `AssertionError` thrown by the line

```python
assert model_value.size(dim) > guide_value.size(dim)
```

in `/anaconda3/lib/python3.7/site-packages/pyro/contrib/forecast/util.py`.
From what I gather, I can't feed a model that was trained on a given number of hierarchical levels data with a different number of levels; in this case, I'm feeding batches with 1000 levels in the second hierarchical variable to a model whose guide was trained on 18316 levels. Is this correct? If you happen to know of any other ways of batching predictions, or of making predictions on subsamples of the hierarchical levels that the model was trained on, that would be hugely appreciated! Thanks in advance!
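In case it helps frame the question: one workaround I've been experimenting with is to keep the full hierarchy in every call and instead draw the posterior samples in smaller chunks, concatenating them afterwards (each call draws fresh guide samples, so the concatenated draws should still be independent). A sketch is below; `predict_in_sample_chunks` and the stub forecaster are made-up names, and the shapes are purely illustrative:

```python
import torch

def predict_in_sample_chunks(forecaster, data, covariates,
                             num_samples=100, chunk=10):
    # assumes forecaster(data, covariates, num_samples=n) returns a tensor
    # whose leading dimension is the sample dimension, as
    # pyro.contrib.forecast.Forecaster does
    chunks = []
    remaining = num_samples
    while remaining > 0:
        n = min(chunk, remaining)
        samples = forecaster(data, covariates, num_samples=n)
        chunks.append(samples.cpu())  # move each chunk off the GPU right away
        remaining -= n
    return torch.cat(chunks, dim=0)

# hypothetical stub standing in for a trained Forecaster, just to check shapes
def stub_forecaster(data, covariates, num_samples):
    return torch.zeros(num_samples, 49, 100, 7, 1)

samples = predict_in_sample_chunks(stub_forecaster, None, None,
                                   num_samples=25, chunk=10)
print(samples.shape)  # torch.Size([25, 49, 100, 7, 1])
```

This caps peak memory at one chunk of sample paths at a time, though it doesn't help if even a single sample path over all 18316 series is too large.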