I have been trying to figure out how to use AutoRegressiveNN for implementing a MAF-like model.
In its implementation, mask_encoding is initialized by default as
self.mask_encoding = 1 + torch.multinomial(torch.full((input_dim - 1,), 1 / (input_dim - 1)),
If I'm not wrong, self.mask_encoding indicates the number of input dimensions each hidden unit can depend on.
I don't see how this initialization can guarantee the autoregressive property: it is possible that multiple elements of this vector end up equal to input_dim, which would guarantee nonzero elements in the upper triangle of the mask.
(For this reasoning I am assuming no input permutation and a suitable permutation of the hidden units, so that the corresponding elements of mask_encoding are monotonically increasing.)
Did I miss something or does this implementation indeed not guarantee an autoregressive layer?
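
To make the question concrete, here is a minimal sketch of how the concern could be checked empirically. It rests on several assumptions that are not in the source: that the truncated multinomial call continues with num_samples=hidden_dim and replacement=True, and that the masks follow the usual MADE rule (a hidden unit with encoding m sees inputs 1..m, and output d sees hidden units with m < d). The helper names sample_mask_encoding and made_masks are hypothetical and are not part of Pyro. Note that torch.multinomial returns category indices starting at 0, so the range of the sampled values is worth inspecting as well.

```python
import torch


def sample_mask_encoding(input_dim, hidden_dim, seed=0):
    # Assumption: the truncated call continues with
    # num_samples=hidden_dim, replacement=True.
    gen = torch.Generator().manual_seed(seed)
    probs = torch.full((input_dim - 1,), 1.0 / (input_dim - 1))
    # torch.multinomial returns indices into probs; the leading "1 +" shifts them up by one.
    return 1 + torch.multinomial(
        probs, num_samples=hidden_dim, replacement=True, generator=gen
    )


def made_masks(input_dim, mask_encoding):
    # Assumption: MADE-style masks, not necessarily Pyro's exact code.
    degrees = torch.arange(1, input_dim + 1)
    # Input -> hidden, shape (hidden, input): unit k sees input d iff d <= m_k.
    m_in = (mask_encoding.unsqueeze(-1) >= degrees.unsqueeze(0)).float()
    # Hidden -> output, shape (input, hidden): output d sees unit k iff d > m_k.
    m_out = (degrees.unsqueeze(-1) > mask_encoding.unsqueeze(0)).float()
    return m_in, m_out


input_dim, hidden_dim = 5, 64
enc = sample_mask_encoding(input_dim, hidden_dim)
m_in, m_out = made_masks(input_dim, enc)

# connectivity[i, j] > 0 iff output i can depend on input j through some hidden unit.
connectivity = m_out @ m_in
# Autoregressive means no nonzero entries on or above the diagonal.
is_autoregressive = bool((torch.triu(connectivity) == 0).all())

print("encoding range:", enc.min().item(), enc.max().item())
print("autoregressive:", is_autoregressive)
```

If the real mask construction differs from the rule assumed here, the same connectivity check should still apply to whatever masks the layer actually builds.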