Hello! I’m playing around with a toy example using normalizing flows in Pyro, and I have two questions. First, how do I properly observe a batch of samples that should all come from the same distribution? Second, do my current code snippets using normalizing flows make any sense?

Basically, I create a target distribution that I want my flow model to reproduce (I think it’s called the Moon dataset), but I’m unsure how exactly to observe the samples so that I am optimizing my flow’s parameters correctly. Here are the code snippets:

```
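# Imports (assumed from how the names are used below;
# InverseAutoregressiveFlow is the name used in older Pyro releases)
import matplotlib.pyplot as plt
import pyro
import torch
import torch.distributions as distr
import torch.nn as nn
from pyro.distributions import (InverseAutoregressiveFlow, MultivariateNormal,
                                TransformedDistribution)
from pyro.nn import AutoRegressiveNN

# Create the dataset, stolen from a tutorial on normalizing flows
batch_size = 512
x2_dist = distr.Normal(loc=0.0, scale=4.)
x2_samples = x2_dist.sample((batch_size,))
x1 = distr.Normal(loc=.25 * x2_samples.pow(2),
                  scale=torch.ones(batch_size))
x1_samples = x1.sample()
x_samples = torch.stack([x1_samples, x2_samples], dim=1)
print(x_samples.size())  # batch_size x 2
plt.scatter(x_samples[:, 0].numpy(), x_samples[:, 1].numpy())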

# Defining my flow: a stack of 6 IAF layers, each with its own autoregressive net
nf = [InverseAutoregressiveFlow(AutoRegressiveNN(2, [10])) for _ in range(6)]
nf_module = nn.ModuleList(nf)

def guide(samples, train=True):
    # guide is empty because I'm not trying to approximate the posterior...
    return 0.0

mu = torch.zeros(2)
sigma = torch.eye(2)

def model(samples, train=True):
    # let pyro know about my flow parameters because that's all I want to have updated
    pyro.module('nf', nf_module)
    # shape I assume is... event_shape = 2 because this is a multivariate Gaussian
    dist = TransformedDistribution(MultivariateNormal(mu, sigma), nf)
    # how do I properly observe all of the samples from the same distribution?
    # every tutorial seems to be like "throw a plate in there" to declare independence... but what does that mean really?
    with pyro.plate('batch', samples.size(0)):
        if train:
            z = pyro.sample('z', dist, obs=samples)  # Are these being observed as being from the same distribution?
        else:
            z = pyro.sample('z', dist)  # a less elegant way to sample later to plot
    return z
```
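
To make the shape question concrete, here is a small check I put together in plain `torch.distributions` (no Pyro, no flow). It is only a sketch of how I understand batch versus event dimensions: `batch` is a random stand-in with the same shape as `x_samples`, and the base distribution matches the one in my model.

```python
import torch
from torch.distributions import MultivariateNormal

# Base distribution as in the model: a 2-D standard normal.
base = MultivariateNormal(torch.zeros(2), torch.eye(2))

# Stand-in batch with the same shape as x_samples (512 x 2); not the real data.
batch = torch.randn(512, 2)

# The last dimension is treated as the event, so each row gets one log-density.
log_probs = base.log_prob(batch)

print(base.batch_shape, base.event_shape)
print(log_probs.shape)

# Treating rows as independent draws means the joint log-likelihood is the sum.
total_log_likelihood = log_probs.sum()
```

If that reading is right, the plate over the 512 rows would just be declaring that these per-row log-densities can be summed as independent draws from one distribution.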

Not sure if this is clear enough, but basically I have a 512 x 2 matrix (batch_size x event_shape) and am unsure how to have all of it observed by the same transformed distribution. I only have a model function because I want to maximize the log-probability of the data under the model with the normalizing flows (whose parameters are the only things being learned), and I’m confused about whether my code currently does either of those things.
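
For reference, this is roughly the objective I have in mind, written as a sketch in plain PyTorch rather than with Pyro's SVI. Everything beyond the dataset is an assumption for illustration: a single learnable `AffineTransform` stands in for my IAF stack (so it cannot capture the curved shape), `flow_dist` is a helper name I made up, and the learning rate and step count are arbitrary.

```python
import torch
from torch.distributions import (AffineTransform, MultivariateNormal, Normal,
                                 TransformedDistribution)

torch.manual_seed(0)

# Same toy data as in my snippet: x2 ~ N(0, 4), x1 | x2 ~ N(0.25 * x2^2, 1).
batch_size = 512
x2 = Normal(0.0, 4.0).sample((batch_size,))
x1 = Normal(0.25 * x2.pow(2), torch.ones(batch_size)).sample()
x_samples = torch.stack([x1, x2], dim=1)

# Hypothetical stand-in for the IAF stack: one learnable elementwise affine map.
loc = torch.zeros(2, requires_grad=True)
log_scale = torch.zeros(2, requires_grad=True)
optimizer = torch.optim.Adam([loc, log_scale], lr=0.05)

def flow_dist():
    # Rebuilt on each call so the transform uses the current parameter values.
    base = MultivariateNormal(torch.zeros(2), torch.eye(2))
    return TransformedDistribution(base, [AffineTransform(loc, log_scale.exp())])

losses = []
for step in range(300):
    optimizer.zero_grad()
    # Maximum likelihood: minimize the mean negative log-density over the batch.
    loss = -flow_dist().log_prob(x_samples).mean()
    loss.backward()
    optimizer.step()
    losses.append(loss.item())

print(losses[0], losses[-1])
```

My understanding is that a model with only observed sites and an empty guide should reduce to this same negative log-likelihood, but that is exactly the part I would like confirmed.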