Very Large NUTS Treesize During Warmup Only

mrinank_sharma · February 9, 2021, 3:26pm

Hi everybody,

I’ve been running a large model using NUTS. I find that the maximum treedepth is hit very regularly. To increase statistical efficiency (in terms of effective sample size), it’s usually recommended to increase the maximum treedepth, and therefore the number of steps that NUTS can take. I decided to give this a go and, as expected, the statistical efficiency is much better, but the runtime is much worse.

However, what I find surprising is that the model uses very large trees at the start of the warmup phase, with a treedepth of up to 17, but not when sampling. For example, the image shows the treedepth over time, where purple is during sampling and red is during warmup.

This makes me think that the geometry of the posterior is actually okay, but that NUTS is starting very far from the typical set, and is generally having a hard time adapting. The initial point is the prior median at the moment.

However, I’m not sure if my diagnosis is correct. Any suggestions would be much appreciated!

martinjankowiak · February 9, 2021, 7:48pm

are you using mass adaptation?

mrinank_sharma · February 9, 2021, 7:58pm

Yeah, I am. But not a dense mass matrix.

martinjankowiak · February 9, 2021, 8:34pm

i’m not sure about your specific hypothesis (typical set) but you can of course test this explicitly by trying a different initialization. for example, find the MAP estimate first and initialize with that.

but what you’re seeing may also just be a symptom of starting out with a mass matrix that’s very inappropriate to the problem. once the mass matrix is adjusted to the problem’s curvature, going deep into the tree may not be necessary.
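To make the mass-matrix point concrete, here is a rough back-of-the-envelope sketch in plain Python. It assumes an axis-aligned Gaussian posterior, uses the leapfrog stability limit for a Gaussian (step size below twice the smallest scale), and assumes a trajectory of about a quarter oscillation period along the widest direction. The function name, the `step_fraction` parameter, and the example scales are all hypothetical, and real NUTS behaviour will differ, so treat it as a heuristic only.

```python
import math

def approx_tree_depth(posterior_scales, inv_mass_diag=None, step_fraction=0.5):
    """Heuristic NUTS tree depth for an axis-aligned Gaussian posterior."""
    if inv_mass_diag is None:
        inv_mass_diag = [1.0] * len(posterior_scales)  # identity mass matrix
    # The mass matrix rescales each coordinate:
    # effective scale = sigma / sqrt(inverse mass).
    effective = [s / math.sqrt(m) for s, m in zip(posterior_scales, inv_mass_diag)]
    # Leapfrog on a Gaussian is only stable for step sizes below 2 * (smallest scale).
    step_size = step_fraction * 2.0 * min(effective)
    # Roughly a quarter oscillation period is needed to cross the widest direction.
    trajectory_length = 0.5 * math.pi * max(effective)
    num_steps = math.ceil(trajectory_length / step_size)
    # Assumption: a tree of depth d holds up to 2**d - 1 leapfrog steps.
    return math.ceil(math.log2(num_steps + 1))

scales = [1e-3, 1.0, 100.0]  # hypothetical posterior standard deviations
depth_identity = approx_tree_depth(scales)
depth_adapted = approx_tree_depth(scales, inv_mass_diag=[s**2 for s in scales])
print("identity mass matrix:", depth_identity)
print("adapted mass matrix: ", depth_adapted)
```

Under these assumptions the required depth grows like the log of the ratio between the largest and smallest scale, which is why very deep trees early in warmup can disappear once a diagonal mass matrix has been adapted.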

it’s possible your model could benefit from some reparameterizations (e.g. if some of your latent parameters cover wildly different scales).

in any case there’s no particular need for you to worry too much about what’s happening during warm-up as long as it doesn’t take too long (and you emerge with a decent mass matrix).

although it obviously depends a lot on the problem, i usually find that a max tree depth of ~5-7 strikes a good balance between speed and sample quality.

mrinank_sharma · February 9, 2021, 9:23pm

thanks. Do you know if I can / how I can check the adapted mass matrix after warmup to see if different parameters have different scales?

mrinank_sharma · February 10, 2021, 11:35am

I managed to look at the adapted mass matrix post warmup.

It looks like some of the variables do have very different posterior scales. Do you know how I can identify the variables / what order they would be in?

Also, the model takes a very long time to run during warmup at the moment (several hours for a few hundred samples), which is the motivation behind wanting to reduce the runtime. I’m also planning to run this model several times, so maybe reusing the mass matrix or setting the initial mass matrix could be useful, but I couldn’t see a way to do this.
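On the ordering question, one plausible way to map entries of a flat diagonal mass matrix back to variables is sketched below. It assumes the sampler flattens a dict of latent sites in sorted-key order (which is how JAX flattens dicts) and that the adapted matrix lives in unconstrained space. Neither is confirmed in this thread, and newer library versions may expose the matrix in a structured form instead. The site names, shapes, and matrix values are made up for illustration.

```python
import math

def flat_index_ranges(site_shapes):
    """Map each latent site to its (start, stop) range in the flattened vector."""
    ranges, start = {}, 0
    for name in sorted(site_shapes):         # assumed: sorted-key flattening order
        size = math.prod(site_shapes[name])  # empty shape () gives 1 for scalars
        ranges[name] = (start, start + size)
        start += size
    return ranges

# Hypothetical site names and shapes, for illustration only.
site_shapes = {"sigma": (), "beta": (3,), "alpha": (2, 2)}
ranges = flat_index_ranges(site_shapes)

# Hypothetical adapted diagonal inverse mass matrix, one entry per flat coordinate.
inv_mass_diag = [0.9, 1.1, 1.0, 1.2, 250.0, 240.0, 260.0, 1e-4]
for name, (lo, hi) in ranges.items():
    print(name, inv_mass_diag[lo:hi])
```

Since the diagonal of the adapted inverse mass matrix is an estimate of the posterior variances, sites whose entries differ by many orders of magnitude are natural candidates for rescaling or reparameterization.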

fehiepsi · March 17, 2021, 4:29am

Hi @mrinank_sharma, we are going to redesign the mass matrix API. With the new API, you will be able to specify the initial inverse mass matrix (in a structured way) and (probably) specify a separate max tree depth for the warmup phase. If you are interested, please subscribe to this issue - we will have some updates soon.

(I was just worried that I would forget this thread, so I’m replying early.)

mrinank_sharma · March 17, 2021, 11:25am

Awesome! I’m looking forward to this, cheers.

mathlad · October 16, 2023, 2:22pm

Hello @mrinank_sharma, could you please tell me which utility function you used for plotting the tree depth of the NUTS sampler? Thanks!
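The thread does not say how the original plot was made. As one possible approach, tree depth can be recovered from the number of leapfrog steps per iteration, assuming a tree of depth d uses between 2**(d-1) and 2**d - 1 steps. In NumPyro, for instance, the step counts can reportedly be collected by passing `extra_fields=("num_steps",)` to `MCMC.run`, but check that against the docs for your version. The step counts and warmup length below are invented for illustration.

```python
def tree_depth(num_steps):
    """Depth d such that 2**(d-1) <= num_steps <= 2**d - 1 (assumed convention)."""
    return int(num_steps).bit_length()

# Hypothetical per-iteration leapfrog step counts; the first `num_warmup`
# entries are taken to be warmup iterations.
num_steps_per_iter = [1023, 511, 700, 31, 15, 7, 7]
num_warmup = 3

depths = [tree_depth(n) for n in num_steps_per_iter]
warmup_depths = depths[:num_warmup]
sampling_depths = depths[num_warmup:]
print("warmup:  ", warmup_depths)
print("sampling:", sampling_depths)
```

The two lists can then be drawn against iteration number in different colours with any plotting library to get a warmup-versus-sampling picture like the one described in the first post.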