Still getting 50% divergences after affine/linear reparametrization

I’m trying to infer the parameters of a nonlinear ODE system on mock data where I know the true parameter values. I first found the MAP with gradient descent (Adam); it lands near the truth, so I know gradient-based optimization can work here. The Fisher contours at the MAP show strong degeneracies and very different scales/variances across parameters, so NUTS takes forever to warm up in the original parameter coordinates if I start it from the same faraway initial guesses I gave Adam.

So I computed the inverse Hessian of the loss at the Adam MAP and derived a simple linear whitening transformation so that NUTS samples standard-normal parameters. In my NumPyro model, I apply the inverse of this linear transformation to the standard-normal parameters immediately after they’re sampled to get back to my original coordinates, and the rest of the model uses those as if there were no reparametrization.
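
A minimal sketch of what the reparametrized model looks like (here `theta_map`, `inv_hessian`, `solve_ode`, `sigma_obs`, `t_obs`, and `y_obs` are placeholders for my actual MAP estimate, Hessian, ODE solver, noise scale, and data):

```python
import jax
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

# theta_map (4,) and inv_hessian (4, 4) come from the Adam fit
L_chol = jnp.linalg.cholesky(inv_hessian)  # whitening: theta = theta_map + L_chol @ z

def model(t_obs, y_obs=None):
    # Sample in whitened coordinates: z ~ N(0, I)
    z = numpyro.sample("z", dist.Normal(jnp.zeros(4), 1.0).to_event(1))
    # Undo the whitening immediately; everything downstream sees the original coordinates
    theta = numpyro.deterministic("theta", theta_map + L_chol @ z)
    mu = solve_ode(theta, t_obs)                                # placeholder ODE solve
    numpyro.sample("y", dist.Normal(mu, sigma_obs), obs=y_obs)  # placeholder noise model

kernel = NUTS(model, target_accept_prob=0.7, max_tree_depth=10, dense_mass=True)
mcmc = MCMC(kernel, num_warmup=1000, num_samples=1000)
mcmc.run(jax.random.PRNGKey(0), t_obs, y_obs, extra_fields=("diverging", "num_steps"))
```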

This speeds up NUTS warmup and sampling, but:

  1. My NUTS posterior samples are all concentrated within the 95% Fisher contour centered on the Adam MAP, and they follow the overall Fisher contour shapes/degeneracies.

  2. About 500/1000 (~50%) of my samples are divergences (target_accept=0.7). I suspect this, combined with trajectories that get truncated early without ever hitting a U-turn (max_tree_depth=10), makes #1 hard to interpret, since I don’t know the true posterior shape (see the sketch after this list for how I’m counting both).

  3. Warmup takes 3.5 hours with target_accept=0.7 and sampling takes 20 min. If I try target_accept=0.8 and max_tree_depth=12, the warmup ETA jumps to 65 hours, which is too costly. This suggests huge curvature is causing the divergences.
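
This is roughly how I’m counting the divergences and the depth-truncated trajectories, assuming the `mcmc` object from the sketch above was run with extra_fields=("diverging", "num_steps"):

```python
extra = mcmc.get_extra_fields()

n_div = int(extra["diverging"].sum())
print(f"divergent transitions: {n_div} / {extra['diverging'].shape[0]}")

# A full tree of depth D uses 2**D - 1 leapfrog steps, so trajectories at that
# count were cut off by max_tree_depth rather than stopping at a U-turn.
num_steps = extra["num_steps"]
n_truncated = int((num_steps >= 2**10 - 1).sum())  # max_tree_depth = 10
print(f"tree-depth-saturated trajectories: {n_truncated}")
```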

Do these three issues suggest that my true posterior is not just a multivariate (4-parameter) Gaussian, so that a simple linear/affine transformation is not enough, not even locally near the MAP? The dense mass matrix NUTS learns, even in the whitened/affine-transformed coordinates, looks blobby with some mild elongation, and when transformed back to the raw/original parameter space it tracks the Fisher contours around the Adam MAP.

Would it be useful to learn a more complicated nonlinear transform/reparametrization instead, e.g. with SVI and normalizing flows (roughly the setup sketched below)? Maybe that would even let me start NUTS from a faraway guess and still find the global posterior shape and the truth?
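
What I have in mind is NumPyro’s NeuTra-style workflow: fit a flow guide with SVI, then reparametrize the model through it before running NUTS. A rough sketch (reusing `model`, `t_obs`, `y_obs` from the first sketch; the flow sizes and step counts are arbitrary):

```python
import jax
from numpyro import optim
from numpyro.infer import MCMC, NUTS, SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoBNAFNormal
from numpyro.infer.reparam import NeuTraReparam

# Fit a block-neural-autoregressive-flow guide with SVI
guide = AutoBNAFNormal(model, hidden_factors=[8, 8])
svi = SVI(model, guide, optim.Adam(1e-3), Trace_ELBO())
svi_result = svi.run(jax.random.PRNGKey(1), 20_000, t_obs, y_obs)

# Reparametrize the model so NUTS samples in the flow's latent space
neutra = NeuTraReparam(guide, svi_result.params)
neutra_model = neutra.reparam(model)

mcmc_neutra = MCMC(NUTS(neutra_model, target_accept_prob=0.8), num_warmup=1000, num_samples=1000)
mcmc_neutra.run(jax.random.PRNGKey(2), t_obs, y_obs, extra_fields=("diverging",))
```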

But would normalizing flows even help, or am I running into residual local nonlinearities / strong curvature so close to the MAP that NUTS can’t leave that region? (Maybe the potential well near the MAP is just so deep, and I’m already initializing NUTS inside it, that it can’t escape to explore the global posterior geometry?)

Would something like Riemannian HMC be better?

How else can I diagnose the true posterior shape/curvature, both globally and near this MAP, and the cause of the divergences? And what is the point of using NUTS only to explore near the MAP rather than the rest of the global posterior geometry?
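
One concrete check I can do is look at where the divergent transitions sit in parameter space with ArviZ, roughly like this (using the `mcmc` object and the `theta` deterministic site from the first sketch), but is there something better?

```python
import arviz as az

idata = az.from_numpyro(mcmc)  # picks up the "diverging" sample stats from the run
print(az.summary(idata, var_names=["theta"]))  # r_hat, ess_bulk, ess_tail per parameter

# Pair plot in the original coordinates with divergent transitions marked;
# divergences clustering in one region usually point at where the curvature blows up.
az.plot_pair(idata, var_names=["theta"], divergences=True, kind="scatter")
```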