How do the bijective transformations for normalizing flows work in Pyro?

I’m looking for a high-level explanation of the structure of three of the transformations (1D is fine):

  1. Spline (linear and/or quadratic)
  2. Householder
  3. Discrete Cosine

It’s not clear to me:

  1. How the transformations are structured
  2. How many parameters each of them has
  3. How the backpropagation is done

I’m assuming that because they’re bijective, the input and output sizes must be the same, but beyond that I’m kind of lost…
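To show the one part I’m fairly sure of, here’s the toy 1D picture I have in my head: an affine bijection, with the log-abs-det term that (I think) is what makes backprop work. This is my own sketch, not Pyro code, and the names `a`/`b` are arbitrary:

```python
import math

# Toy 1D affine bijection y = a * x + b (with a != 0): one input in, one
# output out, so input and output sizes necessarily match.

def forward(x, a, b):
    return a * x + b

def inverse(y, a, b):
    return (y - b) / a

def log_abs_det_jacobian(a):
    # In 1D the Jacobian is just dy/dx = a. The change-of-variables relation
    # log p_y(y) = log p_x(x) - log|a| is differentiable in a and b, which is
    # what lets gradients flow back through the transform.
    return math.log(abs(a))

y = forward(0.5, 2.0, 1.0)   # 2.0
x = inverse(y, 2.0, 1.0)     # 0.5 — round-trips exactly, since it's bijective
```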

If any of the Devs could explain this to me I’d greatly appreciate it :slight_smile:



I suggest looking at some of the linked references:

Jakub M. Tomczak, Max Welling. Improving Variational Auto-Encoders using Householder Flow.

Conor Durkan, Artur Bekasov, Iain Murray, George Papamakarios. Neural Spline Flows. NeurIPS 2019.

Hadi M. Dolatabadi, Sarah Erfani, Christopher Leckie. Invertible Generative Modeling using Linear Rational Splines. AISTATS 2020.


I read the splines paper which was quite helpful. Thanks!

As a follow up, do you know of any other references or resources that show how the cost function is calculated for unsupervised learning models like this one?

Specifically to this spline transformation:

I understand the forward pass through the spline: they build g_{\mathbf{\theta}}(x) as a set of connected piecewise segments (i.e., a spline), and I understand what the parameters are.
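For concreteness, this is roughly how I picture the forward pass through a monotonic piecewise-linear version. The knot values below are numbers I made up; the real transform learns the bin widths/heights (and, for the rational-quadratic variant, knot derivatives) as its parameters:

```python
import bisect

# Toy monotonically increasing piecewise-linear map on [0, 1].
xs = [0.0, 0.25, 0.6, 1.0]   # bin edges in x (made-up values)
ys = [0.0, 0.1, 0.7, 1.0]    # bin edges in y; increasing => bijective

def spline_forward(x):
    # Locate the bin containing x, then apply that bin's linear segment.
    i = min(bisect.bisect_right(xs, x) - 1, len(xs) - 2)
    slope = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
    # Return g(x) and dg/dx; log(slope) is this bin's log-det contribution.
    return ys[i] + slope * (x - xs[i]), slope
```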

What I don’t understand is how they determine how well the forward pass did, i.e. how the cost function is calculated after the forward pass completes.

Thanks again for the help :slight_smile:

there are lots of ways to use normalizing flows, see e.g. this review

probably the most vanilla usage is density estimation, in which case maximum likelihood estimation (MLE) is typically used: basically, maximize the mean log p(x) across your set of data points {x_i}
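here’s a minimal sketch of that objective with the simplest possible flow: an affine bijection z = (x - mu) / sigma mapping data to a standard-normal base. gradient descent on the mean negative log-likelihood (the change of variables contributes the -log sigma term) recovers the sample mean and standard deviation. this is a toy illustration, not Pyro’s API:

```python
import math, random

random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(2000)]

mu, log_sigma = 0.0, 0.0
lr = 0.05
for _ in range(3000):
    sigma = math.exp(log_sigma)
    zs = [(x - mu) / sigma for x in data]
    # Hand-derived gradients of mean NLL = mean(z^2) / 2 + log sigma + const:
    g_mu = -sum(zs) / len(zs) / sigma
    g_log_sigma = 1.0 - sum(z * z for z in zs) / len(zs)
    mu -= lr * g_mu
    log_sigma -= lr * g_log_sigma
# At the optimum, mu ~= sample mean and exp(log_sigma) ~= sample std.
```

in a real flow the affine map is replaced by something expressive (e.g. a spline), but the recipe is identical: push data through the inverse transform, score it under the base distribution, add the log-abs-det term, and maximize the mean.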

Okay, so I went on a crazy deep dive and all of this is much clearer now. The general structure resembling an affine transformation was incredibly important background. (For anyone else who’s struggling here, this video provides a clear understanding of the architecture behind a generative NF model: Generative Modeling - Normalizing Flows - YouTube.)
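For anyone following along, here’s how I now picture that general structure: a flow is a composition of simple bijections, and their log|det J| terms simply add, which is what keeps the overall log-likelihood tractable. Toy sketch with two made-up affine layers (my own code, not Pyro’s):

```python
import math

layers = [(2.0, 1.0), (0.5, -3.0)]   # (scale, shift) pairs, arbitrary values

def flow_forward(x):
    log_det = 0.0
    for a, b in layers:
        x = a * x + b                   # each layer is itself a bijection
        log_det += math.log(abs(a))     # log|det J| accumulates additively
    return x, log_det
```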

Last question! What is the NN used inside the spline call? The paper you referenced basically said this was the architecture they used.

I ran into a CUDA memory issue because I didn’t realize there was an entire NN shoved into each call (that’s on me for not knowing the theory ahead of time), but now I’m not really sure how to see which NN is being used in there.

Thanks again :slight_smile:

I doubt there’s any person on planet earth who’s memorized all the architectural choices made in pyro.distributions.transforms. If you want to know the details, I suggest you take a look at the relevant source code.


Ha okay got it, thanks :slight_smile: