Saving intermediate steps?

campagne · September 27, 2021, 6:55am

Hi,

I have an HMC inference use case that takes ~24h on a single GPU, and the job may be halted by the resource manager. Do you know if there is a way

to save the JIT compilation state and resume later by proceeding to run()?
and, if I ask for 5,000 samples (after warm-up), to save the samples in batches of 1,000?

Thanks.

fehiepsi · September 27, 2021, 12:30pm

I think you can use post_warmup_state for this: just call mcmc.run(...) 5 times to get the samples in batches, then concatenate them.

campagne · September 27, 2021, 12:35pm

Ha, fine @fehiepsi, but is there something like numpyro.save_state(<file name with an extension>, mcmc.post_warmup_state) and, symmetrically, state = numpyro.load_state(<file name with an extension>)?
Thanks

fehiepsi · September 27, 2021, 12:42pm

Currently, we don’t have support for it. I think using post_warmup_state is convenient enough. You can build up your own pipeline from it.
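
One possible shape for such a pipeline, as a sketch only: the `save_state` and `load_state` helpers below are hypothetical (they are not NumPyro functions) and simply wrap the standard `pickle` module. This assumes the state object pickles cleanly, which is worth checking for your setup. The write goes through a temporary file so that a job killed mid-write does not leave a truncated checkpoint.

```python
import os
import pickle
import tempfile


def save_state(path, state):
    """Pickle `state` to `path` atomically (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(path))
    # Write to a temp file in the same directory, then rename over the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the partial temp file before re-raising.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_state(path):
    """Load a state previously written by save_state (hypothetical helper)."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

A cautious way to use it would be to save `jax.device_get(mcmc.last_state)` after each batch and, on restart, assign the loaded object to `mcmc.post_warmup_state` before calling `mcmc.run(...)`. Note that this only restores the sampler state; the JIT-compiled functions are not saved, so compilation should be expected to happen again after a restart.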