Hi,

I have trained a composition of conditional normalizing flows (conditional affine couplings) in order to sample from a high-dimensional, multimodal distribution. Specifically, I use a `ConditionalDenseNN` (with `input_dims = 1800`, `context_dim = 102`, `split_dim = 900`, `hidden_dims = [512, 512]`, `param_dims = [900, 900]`) as the hypernetwork for a `ConditionalAffineCoupling`, along with permutations and batch norm. The total network is composed of 32 of these flows.

This works fine when training with the NLL (e.g. `-flows.log_prob(x, y).mean()`).

I then have the following distribution instance:

`cond_posterior = pyro.distributions.ConditionalTransformedDistribution(flows.base_dist, flows.generative_flows).condition(y)`

where `flows.base_dist` is a `dist.Normal` instance and `flows.generative_flows` is a list of the composable flows (affine, permutation and batch norm, ×32). If I sample from this distribution, everything works perfectly.

Given the trained flows, I want to estimate the MAP of the transformed distribution, which I do following the Pyro tutorials:

```
def model(samples=1):
    with pyro.plate("condition", samples):
        coeffs = pyro.sample("coeffs", cond_posterior)
```

and

```
autoguide_w = pyro.infer.autoguide.AutoDelta(model)
num_iters = 4000
optim = pyro.optim.ClippedAdam({"lr": 1e-6, "eps": 1e-6})
svi = pyro.infer.SVI(model, autoguide_w, optim, loss=pyro.infer.Trace_ELBO())

for i in range(num_iters):
    elbo = svi.step()
```

However, this results in the flow distribution (i.e. `cond_posterior`) outputting NaNs when sampled after the first iteration (and hence falling outside the support of the autoguide). Moreover, once I have run this last part of the code even once, I get NaNs whenever I sample from the same distribution, which previously gave me valid values of the transformed distribution.

So my question is: am I doing something wrong?