I read the splines paper, which was quite helpful. Thanks!

As a follow-up, do you know of any other references or resources that show how the cost function is calculated for unsupervised learning models like this one?

Specifically to this spline transformation:

I understand the forward pass through the spline, how they build g_{\mathbf{\theta}}(x) as a set of connected piecewise functions (i.e., a spline), and what the parameters are.

What I don’t understand is how they are determining how well the forward pass did, i.e. how the cost function is being calculated after the forward pass is completed.

there are lots of ways to use normalizing flows, see e.g. this review

probably the most vanilla usage is in density estimation, in which case maximum likelihood estimation (MLE) is usually used: basically maximize the mean of log p(x_i) across your set of data points {x_i}
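
To make the MLE objective concrete, here is a minimal sketch in plain Python. It is not the spline flow from the paper and it does not use Pyro: it assumes a toy 1D affine flow z = (x - mu) / sigma with a standard normal base distribution, and the names `log_prob`, `mean_nll`, and `fit` are made up for illustration. The point is only to show where the cost comes from: log p(x) is the base log density at z plus the log absolute Jacobian of the transform, and the loss is the negative mean of that over the data.

```python
import math
import random


def log_prob(x, mu, log_sigma):
    """log p(x) for the toy flow z = (x - mu) / sigma with a N(0, 1) base."""
    sigma = math.exp(log_sigma)
    z = (x - mu) / sigma
    log_base = -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)  # log N(z; 0, 1)
    log_abs_det = -log_sigma  # log |dz/dx| = -log(sigma)
    return log_base + log_abs_det


def mean_nll(data, mu, log_sigma):
    """Cost function: negative mean log-likelihood over the data set."""
    return -sum(log_prob(x, mu, log_sigma) for x in data) / len(data)


def fit(data, steps=1500, lr=0.05):
    """Minimize mean_nll by gradient descent with hand-derived gradients."""
    mu, log_sigma = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        sigma = math.exp(log_sigma)
        zs = [(x - mu) / sigma for x in data]
        grad_mu = -sum(zs) / (sigma * n)                   # d(mean NLL)/d(mu)
        grad_log_sigma = 1.0 - sum(z * z for z in zs) / n  # d(mean NLL)/d(log sigma)
        mu -= lr * grad_mu
        log_sigma -= lr * grad_log_sigma
    return mu, log_sigma


# Synthetic data, chosen arbitrarily for the demo.
random.seed(0)
data = [random.gauss(3.0, 2.0) for _ in range(200)]

nll_before = mean_nll(data, 0.0, 0.0)
mu_hat, log_sigma_hat = fit(data)
nll_after = mean_nll(data, mu_hat, log_sigma_hat)
print(nll_before, nll_after, mu_hat, math.exp(log_sigma_hat))
```

A spline flow follows the same recipe with a more flexible transform: the forward pass gives z and the log Jacobian term, and an optimizer (autograd in practice, rather than the hand-written gradients above) pushes the mean log-likelihood up.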

Okay so I went on a crazy deep dive and all of this is muuuuch clearer now. The general structure resembling an affine transformation was incredibly important background (for anyone else who’s struggling here, this guy made a great video that provides a clear understanding of the architecture behind a generative NF model: Generative Modeling - Normalizing Flows - YouTube).

Last question! What is the NN that is used inside of the spline call? The paper you referenced basically said this architecture was what they used

I ran into a CUDA memory issue because I didn’t realize there was an entire NN shoved into each call (that’s on me for not knowing the theory ahead of time), but now I’m not really sure how to see what NN you’re using in there.

i doubt there’s any person on planet earth who’s memorized all the architectural choices made in pyro.distributions.transforms. if you want to know about details i suggest you take a look at the relevant source code.
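
One low-effort way to do that without leaving your Python session is the standard library's `inspect` module. The sketch below uses a hypothetical helper, `show_source`, and demonstrates it on a standard-library function so it runs without Pyro installed. Which Pyro transform you actually called is an assumption on my part; `spline_coupling` in the comment is only an example.

```python
import inspect
import textwrap


def show_source(obj, max_lines=40):
    """Return the first `max_lines` lines of the Python source of `obj`."""
    lines = inspect.getsource(obj).splitlines()
    return "\n".join(lines[:max_lines])


# Demonstrated on a standard-library function so the snippet runs anywhere.
print(show_source(textwrap.dedent, max_lines=5))

# With Pyro installed, the same helper should work on a transform helper, e.g.
#   from pyro.distributions import transforms as T
#   print(show_source(T.spline_coupling))
# For a transform that is a torch.nn.Module, the number of trainable values is
#   sum(p.numel() for p in transform.parameters())
```

The source of the helper shows which network gets built and with what default hidden sizes, which is usually enough to explain a memory blow-up.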