Find MAP of Normalizing Flow distribution with SVI


I have trained a composition of conditional normalizing flows (conditional affine couplings) in order to sample from a high-dimensional, multimodal distribution. Specifically, I use a ConditionalDenseNN (with input_dims = 1800, context_dim = 102, split_dim = 900, hidden_dims = [512, 512], param_dims = [900, 900]) for a ConditionalAffineCoupling, along with permutations and batch_norm. The total network is composed of 32 of these flows.

Training with the NLL (e.g. -flows.log_prob(x, y).mean()) works fine.

I then have the following distribution instance:

cond_posterior = pyro.distributions.ConditionalTransformedDistribution(flows.base_dist, flows.generative_flows).condition(y)

where the flows.base_dist distribution is a dist.Normal instance and flows.generative_flows is a list of the composable flows (affine, permutation and batch_norm x32). If I sample from this distribution everything works perfectly.

Given the trained flows, I want to estimate the MAP of the transformed distribution, and I do this following the Pyro tutorials:

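import pyro

def model(samples=1):
    # One latent "coeffs" draw per element of the plate
    with pyro.plate("condition", samples):
        coeffs = pyro.sample("coeffs", cond_posterior)


# AutoDelta puts a point mass on each latent site, so SVI performs MAP estimation
autoguide_w = pyro.infer.autoguide.AutoDelta(model)
num_iters = 4000
optim = pyro.optim.ClippedAdam({"lr": 1e-6, "eps": 1e-6})
# The original call is cut off after `model,`; the guide, optimizer and loss are assumed
svi = pyro.infer.SVI(model, autoguide_w, optim, loss=pyro.infer.Trace_ELBO())

for i in range(num_iters):
    loss = svi.step()  # step() returns the loss estimate for this iteration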

However, after the first iteration the flow distribution (i.e. cond_posterior) outputs NaNs when sampled, which puts the values outside the support expected by the autoguide. Moreover, once I have run the last part of the code at least once, sampling from the same distribution keeps returning NaNs, even though it originally gave me valid values from the transformed distribution.

So the question is: am I doing something wrong?

cc @stefanwebb

Hi @martinjankowiak,

As far as I understand, @stefanwebb is no longer part of the dev team? In any case, the last time he responded on the forum was over a year ago. Do you have any idea how to do this?

i have no idea. when you string together 32 neural networks, experiencing numerical problems isn’t exactly unusual. i’d suggest you try one of the following:

  • use fewer flows
  • look at the numerics of the flow code and see if you can improve it
  • use init_loc_fn to initialize your AutoDelta to some value of your choosing (see the example in the docs)

Thanks for getting back to me. These are very shallow networks; it’s not as if I’m trying to implement the next DeepMind project. The fact of the matter is that this has nothing to do with numerical instability. Even if I decrease the number of flows to 5 or 10 (which is not that many for this type of method), the API rejects this optimization and raises an error:

ValueError: Error while computing log_prob at site 'coeffs':
Expected value argument (Tensor of shape (1, 2048)) to be within the support (Real()) of the distribution Normal(loc: torch.Size([1, 2048]), scale: torch.Size([1, 2048])), but found invalid values:
tensor([[nan, nan, nan,  ..., nan, nan, nan]], device='cuda:0',

I suppose this would be the issue with one flow as well. I am trying to understand if this is a bug or something I’m doing wrong.
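
One mechanism that would match these symptoms, though it is not confirmed anywhere in this thread, involves the batch_norm layers. If they are still in training mode and update their running statistics whenever log_prob is evaluated, a plate of size 1 gives them a batch of one, and the unbiased variance of a single row is NaN in PyTorch. The sketch below uses a hypothetical toy layer, `RunningVarNorm`, which is not Pyro's implementation, to show how one such step can permanently poison a running-variance buffer.

```python
import torch


class RunningVarNorm(torch.nn.Module):
    """Toy normalizer (hypothetical, not Pyro's code) tracking a running variance."""

    def __init__(self, dim, momentum=0.1, eps=1e-5):
        super().__init__()
        self.momentum, self.eps = momentum, eps
        self.register_buffer("running_var", torch.ones(dim))

    def forward(self, y):
        if self.training:
            var = y.var(0)  # unbiased variance: NaN when the batch has one row
            with torch.no_grad():
                # Exponential moving average; a NaN here never washes out.
                self.running_var.mul_(1 - self.momentum).add_(var * self.momentum)
        else:
            var = self.running_var
        return y / torch.sqrt(var + self.eps)


norm = RunningVarNorm(3)
one_row = torch.randn(1, 3)  # batch of one, like a plate of size 1

norm.eval()
out_before = norm(one_row)   # finite: uses the stored statistics
finite_before = bool(torch.isfinite(norm.running_var).all())

norm.train()
norm(one_row)                # a single training-mode call with one row

norm.eval()
out_after = norm(one_row)    # NaN from now on, even in eval mode
poisoned = bool(torch.isnan(norm.running_var).all())
```

If this is what happens here, putting the flow modules in eval mode before running SVI, or using more than one sample in the plate, would be worth trying. Inspecting the batch_norm buffers for NaNs after one `svi.step()` would confirm or rule it out.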