I implemented a sort-of Deep Markov Model (as in your tutorial) and used the
MaskedDistribution to cope with sequences of different lengths in the batch. This is the code snippet:
```python
for t in range(1, T_max + 1):
    k = pyro.sample(
        "obs_x_%d" % t,
        dist.OneHotCategorical(probs[:, t - 1, :])   # probs slice has shape [16, 40]
        .mask(mini_batch_mask[:, t - 1 : t]),        # mask slice has shape [16, 1]
        obs=one_hot(target[:, t - 1], self.embedding.num_embeddings),
    )
```
Let me suppose that:

- `T_max` is the maximum sequence length in the batch (e.g. 41),
- `probs` is a three-dimensional tensor of shape `[batch_size, max_seq_length, cat_probs] = [16, 41, 40]`,
- `mini_batch_mask` is a two-dimensional boolean tensor of shape `[batch_size, max_seq_length] = [16, 41]`, and finally
- `target` is a two-dimensional tensor of class indices of shape `[16, 40]`.
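For reference, here is a minimal self-contained snippet that reproduces the shapes I am describing (the probabilities and the mask are random stand-ins for my model's output, so only the shapes matter):

```python
import torch
import pyro.distributions as dist

batch_size, T_max, num_cats = 16, 41, 40

# random stand-ins for the model's output and the batch mask
probs = torch.softmax(torch.randn(batch_size, T_max, num_cats), dim=-1)
mini_batch_mask = torch.ones(batch_size, T_max, dtype=torch.bool)

t = 1
d = dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t])
print(d.batch_shape, d.event_shape)  # torch.Size([16, 16]) torch.Size([40])
print(d.shape())                     # torch.Size([16, 16, 40])
```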
From the code above, I would have expected the shape of `dist.OneHotCategorical(probs[:, t - 1, :]).mask(mini_batch_mask[:, t - 1 : t])` to be `[16, 40]`, but it turns out to be `[16, 16, 40]` instead.
Am I wrong? My goal is to prevent, step by step, the padding symbol (0) from "polluting" (to use the same words as in your tutorial) the model computation.
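To make the goal concrete, here is a sketch of the behaviour I am after at a single step `t` (hypothetical shapes and values; note the 1-D mask of shape `[16]`, which does give me the shape I expected above, though I am not sure it is the intended usage):

```python
import torch
import torch.nn.functional as F
import pyro.distributions as dist

batch_size, num_cats = 16, 40
probs_t = torch.softmax(torch.randn(batch_size, num_cats), dim=-1)
mask_t = torch.tensor([True] * 10 + [False] * 6)       # 6 sequences already padded at step t
target_t = torch.zeros(batch_size, dtype=torch.long)   # padding symbol is 0

d = dist.OneHotCategorical(probs_t).mask(mask_t)       # 1-D mask of shape [16]
lp = d.log_prob(F.one_hot(target_t, num_cats).float())
print(d.shape())  # torch.Size([16, 40]) -- the shape I expected
print(lp[10:])    # six zeros: padded steps contribute nothing to the ELBO
```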
Thank you in advance