Hi,
I have trained a composition of conditional normalizing flows (conditional affine couplings) in order to sample from a high-dimensional, multimodal distribution. Specifically, I use a `ConditionalDenseNN` (with `input_dims=1800`, `context_dim=102`, `split_dim=900`, `hidden_dims=[512, 512]`, `param_dims=[900, 900]`) for a `ConditionalAffineCoupling`, along with permutations and batch norm. The total network is composed of 32 of these flows.
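For reference, the overall pattern (a base distribution pushed through a stack of transforms, then sampled and scored with `log_prob`) can be sketched with plain `torch.distributions` as a stand-in for the Pyro conditional versions; the code below is purely illustrative, not the network I actually ran:

```python
import torch
import torch.distributions as dist
from torch.distributions.transforms import AffineTransform

# Stand-in for the flow stack: a base Normal pushed through a
# composition of (unconditional) affine transforms. The real model
# uses Pyro's ConditionalAffineCoupling with learned hypernets.
base = dist.Normal(torch.zeros(4), torch.ones(4))
transforms = [
    AffineTransform(loc=torch.randn(4), scale=torch.rand(4) + 0.5)
    for _ in range(3)
]
flow_dist = dist.TransformedDistribution(base, transforms)

x = flow_dist.sample((8,))     # draw samples through the stack
logp = flow_dist.log_prob(x)   # NLL training would use -logp.mean()
```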
This works totally fine when training with the NLL (i.e. `-flows.log_prob(x, y).mean()`).
I then have the following distribution instance:

```python
cond_posterior = pyro.distributions.ConditionalTransformedDistribution(
    flows.base_dist, flows.generative_flows
).condition(y)
```

where `flows.base_dist` is a `dist.Normal` instance and `flows.generative_flows` is a list of the composable flows (affine, permutation and batch norm, x32). If I sample from this distribution, everything works perfectly.
Given the trained flows I want to estimate the MAP of the transformed distribution, and I do this following the Pyro tutorials:

```python
def model(samples=1):
    with pyro.plate("condition", samples):
        coeffs = pyro.sample("coeffs", cond_posterior)
```
and
```python
autoguide_w = pyro.infer.autoguide.AutoDelta(model)
num_iters = 4000
optim = pyro.optim.ClippedAdam({"lr": 1e-6, "eps": 1e-6})
svi = pyro.infer.SVI(model, autoguide_w, optim, loss=pyro.infer.Trace_ELBO())

for i in range(num_iters):
    elbo = svi.step()
```
However, this results in the flow distribution (i.e. `cond_posterior`) outputting NaNs when sampled after the first iteration (and hence falling outside the support of the autoguide). Moreover, once I have run this last block of code even once, I also get NaNs when sampling from the very same distribution that originally gave me valid values of the transformed distribution.
So the question is: am I doing something wrong?
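In case it clarifies what I am ultimately after: the MAP of the transformed distribution is just the maximizer of its `log_prob`. For a simple stand-in distribution this can be found by direct gradient ascent (again illustrative `torch.distributions` code, not my conditional flow stack):

```python
import torch
import torch.distributions as dist
from torch.distributions.transforms import AffineTransform

# Stand-in target: an affinely transformed Normal, i.e. N(3, 2^2).
base = dist.Normal(torch.tensor(0.0), torch.tensor(1.0))
target = dist.TransformedDistribution(
    base, [AffineTransform(loc=3.0, scale=2.0)]
)

# MAP estimate by maximizing log_prob directly over x.
x = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([x], lr=0.1)
for _ in range(500):
    opt.zero_grad()
    loss = -target.log_prob(x).sum()
    loss.backward()
    opt.step()
# x should end up near the mode of the target, 3.0
```

This is what I expected the `AutoDelta` + `Trace_ELBO` setup above to do for `cond_posterior`.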