Normalizing Flow log likelihood training seems to have issues

A few days ago I posted about a very specific problem with normalizing flows. I somehow managed to simplify the problem and I have the following issue:

The aim is to train a probabilistic model p(x) such that I can sample from it and the samples maximise a function E(x). This does somehow not work.
To figure out what the issue is I reverse engineered the problem and I optimised E(x) with normal SGD and I find the parameters X that maximise E.

X = tensor([[0.8828, 0.5280, 0.5363, 0.8985],
        [0.6880, 1.0427, 1.0345, 0.6723]], requires_grad=True)

In my case E(X) = 1 which is the maximum of the function E. (This does not really matter)
Now I would like to train my probabilistic model such that p(X) has maximum likelihood.

flow_dist, transform = init_flow(input_dim = 4*2)
optimizer = torch.optim.SGD(transform.parameters(), lr=learning_rate)

for i in range(1000):
    lp = -flow_dist.log_prob(X.reshape(2*4))
    print(lp.mean().item())
    optimizer.zero_grad()
    lp.mean().backward()
    optimizer.step()

Init flow is the function that defines the pyro normalizing spline flow

def init_flow(input_dim = 2, count_bins = 8, hidden_dims = [10, 10]):
    base_dist = dist.Normal(torch.zeros(input_dim), torch.ones(input_dim), validate_args=False)
    hidden_dims = input_dim*np.array(hidden_dims)
    param_dims = [count_bins, count_bins, count_bins - 1, count_bins]
    hypernet = AutoRegressiveNN(input_dim, hidden_dims, param_dims=param_dims)
    transform = T.SplineAutoregressive(input_dim, hypernet, count_bins=count_bins)
    pyro.module("my_transform", transform)  # doctest: +SKIP
    flow_dist = dist.TransformedDistribution(base_dist, [transform], validate_args=True)
    return flow_dist, transform

This optimisation spectacularly fails. Dependent on the learning rate, if it is quite big (lr = 0.1) I end up with log likelihoods of around -300. If I choose it smaller, the training starts converging, but then at some point the training becomes very unstable and the value of lp jumps up. Does anyone know why this happens? Is there any issue with maximising likelihoods? If I change the sign of lp the optimisations converges to very low likelihoods. Therefore, it is somehow possible to make the parameters X unlikely, but not more likely.

To check if the model has problems to learn this specific parameters, I even trained it the following to double check:

for i in range(1000):
    lp = (flow_dist.rsample(torch.tensor([100])) - X.reshape(n_qubits*n_layers))**2
    print(lp.mean().item())
    optimizer.zero_grad()
    lp.mean().backward()
    optimizer.step()

I train the samples of the flow model to become X with a simple mean square error and amazingly this works very well.
Does anyone know what the problem is?

cc @stefanwebb