Observing multiple samples from the same distribution? Also, does my flows code make a lick of sense?

Hello! I’m playing around with a toy example with the Normalizing Flows package and I have two questions. First, how do I properly observe samples that I want to be from the same distribution? Second, do my current code snippets using normalizing flows make any sense?

Basically, I create a posterior distribution that I want my flow model to reproduce (I think it’s called the Moon dataset), but I am unsure how exactly to observe the samples such that I am optimizing my flow’s parameters correctly. Here are the code snippets:

# Create the dataset, stolen from a tutorial on Normalizing flows
x2_dist = distr.Normal(loc=0.0, scale=4.)
x2_samples = x2_dist.sample((batch_size,))
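# NOTE: the original line was truncated after `loc=...,`; the unit scale below is an assumption
x1 = distr.Normal(loc=.25 * x2_samples.pow(2), scale=torch.ones(batch_size))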
x1_samples = x1.sample()
x_samples = torch.stack([x1_samples, x2_samples], dim=1)
plt.scatter(x_samples[:,0].numpy(), x_samples[:,1].numpy())

# Defining my flow
nf = [InverseAutoregressiveFlow(AutoRegressiveNN(2, [10])) for _ in range(6)]
nf_module = nn.ModuleList(nf)
def guide(samples, train=True):
    #guide is empty because I'm not trying to approximate the posterior...
    return 0.0

mu = torch.zeros(2)
sigma = torch.eye(2)
def model(samples, train=True):
    # let pyro know about my flow parameters because that's all I want to have updated
    pyro.module('nf', nf_module)
    # I assume event_shape = 2 because this is a multivariate Gaussian
    dist = TransformedDistribution(MultivariateNormal(mu, sigma), nf)
    # how do I properly observe all of the samples from the same distribution?
    # every tutorial seems to say "throw a plate in there" to declare independence... but what does that really mean?
    with pyro.plate('batch'):
        if train:
            z = pyro.sample('z', dist, obs=samples)  # are these being observed as coming from the same distribution?
        else:
            z = pyro.sample('z', dist)  # a less elegant way to sample later for plotting
    return z

Not sure if this is clear enough… but basically I have a 512 x 2 matrix (batch_size x event_shape), and I am unsure how to have all of its rows observed under the same transformed distribution. I only have a model function because I want to maximize the log-probability of the data under the normalizing flow (whose parameters are the only things being learned), and I am confused about whether my code currently does either of those things.
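
To pin down what "observed by the same distribution" and "maximize the log-probability" mean here, below is a minimal sketch in plain Python. It is not Pyro and not the IAF flow above: it assumes a hypothetical one-dimensional affine flow x = shift + exp(log_scale) * z with z ~ N(0, 1), and all function names (`flow_log_prob`, `batch_log_likelihood`, `fit`) are made up for illustration. The point it tries to show is that treating the rows of a batch as independent draws from one distribution amounts to summing their per-sample log-densities, and that maximum likelihood then adjusts the flow parameters to increase that sum.

```python
import math
import random


def standard_normal_logpdf(z):
    """Log-density of the base distribution N(0, 1)."""
    return -0.5 * (z * z + math.log(2.0 * math.pi))


def flow_log_prob(x, shift, log_scale):
    """log p(x) for x = shift + exp(log_scale) * z (change of variables)."""
    z = (x - shift) * math.exp(-log_scale)        # invert the flow
    return standard_normal_logpdf(z) - log_scale  # base log-density + log|dz/dx|


def batch_log_likelihood(xs, shift, log_scale):
    """Independent samples from one distribution: the joint log-density is a sum."""
    return sum(flow_log_prob(x, shift, log_scale) for x in xs)


def fit(xs, steps=2000, lr=0.05):
    """Gradient ascent on the mean log-likelihood (gradients derived by hand)."""
    shift, log_scale = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        inv_scale = math.exp(-log_scale)
        zs = [(x - shift) * inv_scale for x in xs]
        grad_shift = sum(z * inv_scale for z in zs) / n   # d/d shift
        grad_log_scale = sum(z * z - 1.0 for z in zs) / n  # d/d log_scale
        shift += lr * grad_shift
        log_scale += lr * grad_log_scale
    return shift, log_scale


random.seed(0)
xs = [random.gauss(3.0, 2.0) for _ in range(512)]  # toy 1-D "dataset"
shift, log_scale = fit(xs)
print("fitted shift:", shift, "fitted scale:", math.exp(log_scale))
```

For this affine flow the maximum-likelihood solution is just the sample mean and (biased) sample standard deviation, so the fitted values can be checked against those. In the two-dimensional setup above, the analogous quantity would be the sum over the 512 rows of the flow's log-density, with each row counted as one event.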