Hi everyone,
I implemented a sort of Deep Markov Model (as in your tutorial) and used MaskedDistribution
to cope with sequences of different lengths within a batch. Here is the relevant code snippet:
for t in range(1, T_max + 1):
    k = pyro.sample(
        "obs_x_%d" % t,
        dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t]),
        obs=one_hot(target[:, t - 1], self.embedding.num_embeddings),
    )
Suppose that T_max is the maximum sequence length in the batch (e.g. 41), probs
is a three-dimensional tensor of shape [batch_size, max_seq_length, cat_probs] = [16, 41, 40], mini_batch_mask
is a two-dimensional boolean tensor of shape [batch_size, max_seq_length] = [16, 41], and finally target
is a two-dimensional tensor of shape [16, 40].
From the code above, I expected the shape of dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t])
to be [16, 40], but it comes out as [16, 16, 40] instead.
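For what it's worth, I suspect the extra dimension comes from broadcasting: the mask slice mini_batch_mask[:, t - 1 : t] has shape [16, 1], and broadcasting that against the distribution's batch shape [16] yields [16, 16]. A quick check of that broadcast (using numpy purely to illustrate the broadcasting rule, not the Pyro API):

```python
import numpy as np

# Distribution batch shape vs. the sliced mask shape.
batch_shape = (16,)     # OneHotCategorical over probs[:, t - 1, :]
mask_shape = (16, 1)    # mini_batch_mask[:, t - 1 : t] keeps the time dim

# Broadcasting (16,) against (16, 1) gives (16, 16),
# which would explain the unexpected leading dimensions.
print(np.broadcast_shapes(batch_shape, mask_shape))  # (16, 16)

# Dropping the time dimension instead of slicing it would keep shapes aligned:
mask_shape_1d = (16,)   # e.g. mini_batch_mask[:, t - 1]
print(np.broadcast_shapes(batch_shape, mask_shape_1d))  # (16,)
```

If that reasoning is right, indexing the mask with [:, t - 1] rather than [:, t - 1 : t] should give the shape I expected, but I would appreciate confirmation.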
Am I wrong? My goal is to prevent, at each time step, the padding symbol (0) from “polluting” (to borrow the wording of your tutorial) the model computation.
Thank you in advance.