I am reading the Pyro tutorial on normalizing flows (Normalizing Flows - Introduction (Part 1) — Pyro Tutorials 1.8.4 documentation) and I would like to better understand how the examples work under the hood. Specifically, I am referring to the architecture of the network used to obtain the marginal distributions in the concentric circles example. There, the base distribution (in the latent space) is a standard normal and the flow is a rational spline:
import torch
import pyro.distributions as dist
import pyro.distributions.transforms as T

base_dist = dist.Normal(torch.zeros(2), torch.ones(2))
spline_transform = T.Spline(2, count_bins=16)
flow_dist = dist.TransformedDistribution(base_dist, [spline_transform])
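To try to see this myself, I listed the transform's learnable parameters (if I understand correctly, T.Spline subclasses torch.nn.Module, so named_parameters() should work):

for name, p in spline_transform.named_parameters():
    print(name, tuple(p.shape))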
According to the tutorial, the knots of the spline and their derivatives are parameters that can be learnt, e.g. through stochastic gradient descent on a maximum likelihood objective. The tutorial shows how to do that:
%%time
steps = 1 if smoke_test else 1001
dataset = torch.tensor(X, dtype=torch.float)
optimizer = torch.optim.Adam(spline_transform.parameters(), lr=1e-2)
for step in range(steps):
    optimizer.zero_grad()
    # negative log-likelihood of the data under the flow
    loss = -flow_dist.log_prob(dataset).mean()
    loss.backward()
    optimizer.step()
    # invalidate cached values after the parameters change
    flow_dist.clear_cache()
    if step % 200 == 0:
        print('step: {}, loss: {}'.format(step, loss.item()))
Finally, the tutorial shows how to sample from the learned distribution in order to obtain new data:
X_flow = flow_dist.sample(torch.Size([1000,])).detach().numpy()
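To compare the samples against the training data, something like the following should work (a minimal sketch, assuming matplotlib is available and X holds the original circles data):

import matplotlib.pyplot as plt
plt.scatter(X[:, 0], X[:, 1], s=5, alpha=0.5, label='data')
plt.scatter(X_flow[:, 0], X_flow[:, 1], s=5, alpha=0.5, label='flow samples')
plt.legend()
plt.show()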
I would like to know what the architecture of the NN used to learn those parameters is, and whether there is a (possibly simple) way to modify this architecture (e.g. to add or remove layers).
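For instance, is composing several transforms the intended way to "add layers"? This is a sketch of what I have in mind (untested beyond my reading of the docs; TransformedDistribution accepts a list of transforms, and nn.ModuleList is just a convenient way to collect their parameters for the optimizer):

transforms = [T.Spline(2, count_bins=16) for _ in range(3)]
flow_dist = dist.TransformedDistribution(base_dist, transforms)

# gather the parameters of all transforms into one optimizer
modules = torch.nn.ModuleList(transforms)
optimizer = torch.optim.Adam(modules.parameters(), lr=1e-2)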
More generally, I would like to adapt these simple examples to the univariate case of learning the density of time series data.
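For concreteness, this is roughly how I would try to set up the univariate version (a sketch, assuming for now that the observations can be treated as i.i.d. samples; series stands for a hypothetical 1-D numpy array of time-series values):

series_tensor = torch.tensor(series, dtype=torch.float).reshape(-1, 1)

base_dist = dist.Normal(torch.zeros(1), torch.ones(1))
spline_transform = T.Spline(1, count_bins=16)
flow_dist = dist.TransformedDistribution(base_dist, [spline_transform])
# then train exactly as above, with dataset replaced by series_tensor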