How to persist a model+posterior and use it as prior for training on new data?

bishdata · July 3, 2022, 3:30pm

Hi I am new to numpyro.

I use numpyro and the NUTS sampler to train several thousand models, each on about a year's worth of historical data. As this takes many hours, I'd like to know the best practice for persisting both the model and its posterior, say to a pickle file. Then, at a later date when new data has accumulated, I'd like to reload the model and posterior and use the posterior as a prior for training the model on the new data.

In this and later steps, can I (or should I) skip the warm-up step?

Is there a recommended metric or diagnostic to estimate how far the new data has drifted from the prior distribution?

rkauf · August 2, 2022, 2:58pm

+1, I have a similar use case but with potentially even daily updates.

martinjankowiak · August 2, 2022, 4:29pm

> Then at a later date when new data has accumulated to reload the model and posterior and use it as a prior for training the model using the new data.

NUTS works on a (differentiable) model density. posterior samples are just samples, basically a set of delta functions, so they do not define a density that NUTS can differentiate. creating a prior from previously collected posterior samples would therefore require fitting a density to those samples. needless to say, learning an appropriate density is hard in general, and wouldn't necessarily save you any compute time.

a more straightforward approach is to reuse the mass matrix and step size from previous NUTS runs. this will make subsequent runs a bit faster since you can reduce the burn-in/adaptation period accordingly. however this will not result in gigantic speed-ups.