How do the bijective transformations for normalizing flows work in Pyro?

I’m looking for a high-level explanation of the structure of three of the transformations (1D is fine):

  1. Spline (linear and/or quadratic)
  2. Householder
  3. Discrete Cosine

It’s not clear to me:

  1. How the transformations are structured
  2. How many parameters each of them has
  3. How the backpropagation is done

I’m assuming that because they’re bijective, the input size and output size must be the same, but beyond that I’m kind of lost…

If any of the Devs could explain this to me I’d greatly appreciate it :slight_smile:

Cheers,

-Stefan

I suggest looking at some of the linked references:

Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow.

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.
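
For the Householder case the structure is small enough to write out by hand. The following is a minimal pure-Python sketch of a single Householder reflection, not Pyro's implementation; the function name `householder` and the example vectors are made up for illustration. It shows that the output has the same size as the input, that one reflection carries one learnable number per dimension (the vector `v`), and that the map is its own inverse.

```python
import math

def householder(x, v):
    """Reflect x across the hyperplane orthogonal to v: y = x - 2 v (v.x) / (v.v)."""
    vv = sum(vi * vi for vi in v)
    vx = sum(vi * xi for vi, xi in zip(v, x))
    scale = 2.0 * vx / vv
    return [xi - scale * vi for xi, vi in zip(x, v)]

# Hypothetical values: v plays the role of the learnable parameters
# (one number per input dimension), x is a single data point.
v = [1.0, 2.0, -1.0]
x = [0.5, -1.0, 2.0]

y = householder(x, v)        # forward pass, same length as x
x_back = householder(y, v)   # applying the same reflection again inverts it

norm_x = math.sqrt(sum(xi * xi for xi in x))
norm_y = math.sqrt(sum(yi * yi for yi in y))
print(y, x_back, norm_x, norm_y)
```

Because a reflection has determinant -1, the log absolute determinant of its Jacobian is 0, so a Householder layer changes the direction of the data without adding a volume term to the log-density.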

I read the splines paper which was quite helpful. Thanks!

As a follow-up, do you know of any other references or resources that show how the cost function is calculated for unsupervised learning models like this one?

Specifically to this spline transformation:

I understand the forward pass through the spline, I understand how the spline g_{\mathbf{\theta}}(x) is built from a set of connected piecewise functions, and I understand what its parameters are.

What I don’t understand is how they are determining how well the forward pass did, i.e. how the cost function is being calculated after the forward pass is completed.

Thanks again for the help :slight_smile:

there are lots of ways to use normalizing flows, see e.g. this review

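Probably the most vanilla usage is density estimation, in which case maximum likelihood estimation (MLE) is usually used: basically, maximize the mean of log p(x_i) across your set of data points {x_i}. Here log p(x) comes from the change-of-variables formula: the base log-density of the point z that x maps back to, plus log |det dz/dx|.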
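
As a rough illustration of that objective, here is a minimal pure-Python sketch using the simplest possible 1D flow, an affine map z = (x - mu) / sigma with a standard normal base. This is an assumption chosen for readability, not the spline from the paper and not Pyro code; the names `log_prob` and `mean_nll`, the synthetic data, and the hand-written gradients are all illustrative. The cost is the mean negative log-likelihood, and the parameters are updated by plain gradient descent on it.

```python
import math
import random

def log_prob(x, mu, log_sigma):
    """log p(x) for the flow z = (x - mu) / sigma with a standard normal base."""
    z = (x - mu) * math.exp(-log_sigma)
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)
    log_det = -log_sigma  # log |dz/dx| for the affine map
    return log_base + log_det

def mean_nll(data, mu, log_sigma):
    """The cost function: mean negative log-likelihood over the data set."""
    return -sum(log_prob(x, mu, log_sigma) for x in data) / len(data)

random.seed(0)
data = [random.gauss(2.0, 0.5) for _ in range(500)]  # synthetic data

mu, log_sigma = 0.0, 0.0  # flow parameters, initialised to the identity map
lr = 0.1
initial_loss = mean_nll(data, mu, log_sigma)

for _ in range(500):
    g_mu = g_ls = 0.0
    for x in data:
        z = (x - mu) * math.exp(-log_sigma)
        g_mu += -z * math.exp(-log_sigma)  # d(-log p)/d mu
        g_ls += 1.0 - z * z                # d(-log p)/d log_sigma
    mu -= lr * g_mu / len(data)
    log_sigma -= lr * g_ls / len(data)

final_loss = mean_nll(data, mu, log_sigma)
print(initial_loss, final_loss, mu, math.exp(log_sigma))
```

In a library such as Pyro the gradients are not written by hand: the transform parameters are ordinary PyTorch parameters, so calling `backward()` on the same loss lets autograd differentiate through both the base log-density and the log-determinant term.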

Okay, so I went on a crazy deep dive and all of this is muuuuch clearer now. The general structure resembling an affine transformation was incredibly important background. (For anyone else who’s struggling here, this guy made a great video that provides a clear understanding of the architecture behind a generative NF model: Generative Modeling - Normalizing Flows - YouTube)

Last question! What is the NN that is used inside of the spline call? The paper you referenced basically said this architecture was what they used

I ran into a CUDA memory issue because I didn’t realize there was an entire NN shoved into each call (that’s on me for not knowing the theory ahead of time), but now I’m not really sure how to see what NN you’re using in there.

Thanks again :slight_smile:

I doubt there’s any person on planet earth who’s memorized all the architectural choices made in pyro.distributions.transforms. If you want to know the details, I suggest you take a look at the relevant source code.

Ha okay got it, thanks :slight_smile: