AutoRegressiveNN's default arguments

lwalkling · June 23, 2018, 10:04pm

Hi,

I have been trying to figure out how to use AutoRegressiveNN for implementing a MAF-like model.

In it’s implementation, the mask_encoding is by default initialized as

self.mask_encoding = 1 + torch.multinomial(torch.full((input_dim - 1,), 1 / (input_dim - 1)),
                                           num_samples=hidden_dim, replacement=True)

If I’m not wrong, self.mask_encoding indicates the number of input dimensions each hidden unit can depend on.
I don’t see how this initialization can guarantee the autoregressive property: It is possible that multiple elements of this vector end up as input_dim, which would guarantee elements in the upper half of the mask.

(For this reasoning I am assuming no input permutation and a suitable permutation of the hidden units so that the corresponding elements of mask_encoding are monotonously increasing)

Did I miss something or does this implementation indeed not guarantee an autoregressive layer?

martinjankowiak · July 5, 2018, 10:47pm

@lwalkling
afaik as i know the implementation is fine. there is even a unit test that checks that the jacobian is indeed triangular:

github.com

pyro-ppl/pyro/blob/eed62768acb24d9467a72f66cd8c831af5af1923/tests/distributions/test_iaf.py

from __future__ import absolute_import, division, print_function

from unittest import TestCase

import numpy as np
import pytest
import torch

import pyro.distributions as dist
from pyro.distributions.iaf import InverseAutoregressiveFlow
from pyro.nn import AutoRegressiveNN

pytestmark = pytest.mark.init(rng_seed=123)


class InverseAutoregressiveFlowTests(TestCase):
    def setUp(self):
        self.epsilon = 1.0e-6

    def _test_jacobian(self, input_dim, hidden_dim):

This file has been truncated. show original

mask_encoding is what’s called m(k) in the MADE paper (https://arxiv.org/pdf/1502.03509.pdf)

it is used to create the actual masks M_W and M_V (equations 8 and 9 in the paper), which are what controls the connectivity pattern. i hope that clarifies what’s happening in the code.