This topic might have been better to add on to Forecasting with a stochastic volatility model in numpyro.
I have been wondering for some time whether the scan function in a time series model with time-varying weights (DLM, etc.) is actually providing posterior samples from the perspective of smoothing rather than forward filtering. I have attached a very simple example in a Google Colab notebook (please let me know if anything is not working, if anything is unclear, or if I’ve made an error; it was somewhat hastily done). Rather than repeating myself with less clarity, I think the notebook will give a better idea of my concern. Of course, from the perspective of smoothing this is all OK, but I was hoping to know for certain what is going on. The backwards-smoothing perspective on the scan function seems to coincide with the paper cited in the examples (Simo Särkkä, Ángel F. García-Fernández)… at least, according to my brief skim.
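To make the filtering-vs-smoothing distinction concrete, here is a minimal pure-NumPy sketch (not NumPyro code; the local level model, noise variances, and prior below are my own toy choices) of a Kalman filter against an RTS smoother. The smoothed mean at an interior time point uses future observations, so it generally differs from the filtered mean there:

```python
import numpy as np

rng = np.random.default_rng(0)

# Local level model: x_t = x_{t-1} + w_t,  y_t = x_t + v_t
T, q, r = 50, 0.1, 1.0
x = np.cumsum(rng.normal(0, np.sqrt(q), T))
y = x + rng.normal(0, np.sqrt(r), T)

# Forward (Kalman) filter: p(x_t | y_{1:t}) -- uses data only up to t
m_f, P_f = np.zeros(T), np.zeros(T)
m, P = 0.0, 10.0                      # vague prior on the initial level
for t in range(T):
    P_pred = P + q                    # predict step (mean is unchanged)
    K = P_pred / (P_pred + r)         # Kalman gain
    m = m + K * (y[t] - m)            # update with y_t only
    P = (1 - K) * P_pred
    m_f[t], P_f[t] = m, P

# Backward (RTS) smoother: p(x_t | y_{1:T}) -- conditions on ALL data
m_s = m_f.copy()
for t in range(T - 2, -1, -1):
    G = P_f[t] / (P_f[t] + q)         # smoother gain
    m_s[t] = m_f[t] + G * (m_s[t + 1] - m_f[t])

# Smoothed and filtered means agree at t = T-1 but differ earlier,
# because the smoothed estimate has "looked ahead" at future y's.
print(np.abs(m_s - m_f).max())
```

The final time point is the one place where filtering and smoothing coincide, which is exactly why a rolling refit that keeps only the last posterior recovers the filtering distribution.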
I did rerun the HMM example (Example: Hidden Markov Model — NumPyro documentation) and noticed commentary about the forward filter with scan; however, I did not analogize it in detail to the time series setting, nor experiment to see if there was any “lookahead” bias in the parameters. I also briefly tried to get the hmm_enum examples running, but after a bit of tinkering in my local PyCharm I gave up (for now).
Here is the link:
link
In the event that scan is conducting backwards smoothing (using information all the way up to the end point time T), does anyone know if there is a way to manipulate scan to get forward filtering, using only information up to time t (for all t)? I suppose I could run scan up to t_2, then t_3, and so on, take out the posteriors, evolve them, and make a forecast… but this would be quite a bit more computation than I’d hope to do…
I very much hope I am using scan in the wrong way and that forward filtering posteriors are attainable! Any help at all would be greatly appreciated!
Thank you
Just a further point of clarification / question:

I noticed that within the scan source code, the paper cited (Särkkä and García-Fernández) mentions extending the linear/Gaussian framework to nonlinear/non-Gaussian models. Given the use of NUTS within scan (documentation forecasting example), I presume this extension has been found and integrated into numpyro/scan?

Is there further documentation, or are there references, to understand how HMC integrates with scan? Or are nonlinearities (for example, in the state equation) actually not modelled correctly when the model is run? I ask since the function documentation says “Not all transition functions f are supported”, and given that the paper mentioned had not yet solved the nonlinear/non-Gaussian cases, I wondered if this comment might include such nonlinear/non-Gaussian cases, or if the comment regarding f was more about other cases such as a nested scan (which is unsupported).

I went ahead and added a section to the Google Colab notebook mentioned above (link here again) that explores a nonlinear state equation. It seems to be modelling the nonlinear state equation correctly, endorsing the idea that HMC has been correctly integrated with the scan functionality… but I’d love to see more detail/references on how this is accomplished. I didn’t go all the way and compare the nonlinear results with those of an unscented Kalman filter, for example, but I presume the results would be comparable…

Last, if anyone has any updates on using scan for forward-filtered coefficients (i.e. without lookahead bias), I would be super appreciative!
Ok, I have further checked the model versus a simple for loop in the time series.

It behaves just like scan, which seems like a problem! Both are displaying lookahead bias in the posteriors… I would love to be wrong, but this doesn’t look good…

…or, as has been done elsewhere with scan, the model is simply trained only up to a point 0:T and then forecasted from T:end, so that the posterior parameters have no opportunity to learn from data past T. This is fine too, but then the only way to get posteriors (within scan or a for loop, it seems) is to rerun the model from 0:t, again from 0:t+1, and so on, taking the last posterior at each time point… is there any way to avoid this heavy rolling-window-esque computation in order to achieve posteriors without lookahead bias (and hence forecasts without lookahead bias)?
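As a hypothetical stand-in for rerunning HMC at every step, here is a toy conjugate normal-mean model (my own example; `posterior` is a made-up helper, not a NumPyro API) in which “refit from scratch on y[0:t]” can be done exactly. It shows that the rolling refits reproduce the sequential online posteriors, i.e. each refit is lookahead-free; the cost is just the repeated fitting:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.normal(2.0, 1.0, size=20)     # data; observation variance known = 1
mu0, tau0 = 0.0, 10.0                 # prior mean / variance for the unknown mean

def posterior(data, mu0, tau0, sigma2=1.0):
    """Exact posterior of the mean, conditioned on `data` only."""
    n = len(data)
    prec = 1.0 / tau0 + n / sigma2
    mean = (mu0 / tau0 + data.sum() / sigma2) / prec
    return mean, 1.0 / prec

# "Rolling" refits: at each t, refit from scratch on y[0:t] -> no lookahead
roll = [posterior(y[:t], mu0, tau0) for t in range(1, len(y) + 1)]

# Sequential (online) updates give exactly the same distributions
m, v = mu0, tau0
seq = []
for obs in y:
    prec = 1.0 / v + 1.0              # add one observation's precision
    m = (m / v + obs) / prec
    v = 1.0 / prec
    seq.append((m, v))

print(np.allclose(roll, seq))
```

For an HMC-fitted model there is no such exact recursion, so the refit-per-t loop (or an SMC method) really does seem to be the price of lookahead-free posteriors.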
The for-loop section has also been added to this Google Colab notebook:
Hi @djchester, in those examples the posterior is conditioned on the full data. To get the forward filter distribution, you might want to use TFP LinearGaussianSSM.
Hi @fehiepsi! Thanks for your response.
I may have simplified my model a bit for demonstrative purposes; however, the models I actually want (/have) built are nonlinear Gaussian state space models (pursuing full HMC as opposed to a sequential Monte Carlo particle filter). Presumably, forward-filtered posteriors (and hence forecasts) would be impossible within an HMC-based for-loop or scan model within numpyro? [I believe this is what you are saying]
Then, I can confirm that from a smoothing perspective numpyro’s scan + HMC works well for state learning in a nonlinear (in the state equation) framework, on par with or better than an unscented Kalman filter, for example. However, the paper mentions that nonlinear/non-Gaussian extensions to the scan functionality were left for future work… so I am very curious as to how numpyro extended the smoother, as scan + HMC (and hence nonlinear frameworks) clearly works! [there is an example in the colab notebook as well, and one can see a clear difference in the linear vs. nonlinear model]
Finally, at one point I implemented a non-Gaussian state space model with scan and without scan, but found the time taken was the same. Does scan recognize the non-Gaussian distributions in the sampler and default to a loop? When using Gaussian state space models, scan is of course significantly faster.
We do forward-sampling forecasts using Predictive, as in the Time Series Forecasting — NumPyro documentation example. To get the posterior, we simply run HMC on the potential function p(obs, latent). We do not use any fancy algorithms in scan.
Thank you @fehiepsi! I see the train-test format as opposed to a one-step-ahead ‘online’ style of forecasting. I suppose the only way to do this would be a rolling-window style retraining of the model at each time step.
Then, it seems (and looks) like scan works very broadly with any model specification using HMC, such as nonlinear formulations (but perhaps not all non-Gaussian distributions).

I think the origin of my confusion is the source documentation of the scan primitive (numpyro.contrib.control_flow.scan — NumPyro documentation): in the definition of scan(), the reference paper is Temporal Parallelization of Bayesian Smoothers by Simo Särkkä and Ángel F. García-Fernández, which, at the risk of repeating myself, is about smoothing linear Gaussian state space models, not nonlinear formulations…

But this then implies there has been more work on making scan() usable in general settings beyond what the reference paper discusses. I also checked jax._src.lax.control_flow.loops — JAX documentation, but could not find any reference papers.

Is there more material (papers?) to really understand how scan() works with general HMC-styled models? Currently it seems a bit mysterious…
Thanks again!
scan is just a fancy for loop. it does not itself do inference.
in numpyro:
- scan is used in model definitions to define autoregressive priors
- inference is done with inference algorithms like HMC or SVI
- Predictive is used to take samples from an inference algorithm (it does not itself do inference) and pipe them back through a model to get various predictive quantities
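A minimal pure-Python sketch of what scan amounts to semantically (my own simplification, not the actual jax.lax.scan implementation, which additionally traces and compiles the body): a for loop that threads a carry and stacks the per-step outputs:

```python
from typing import Any, Callable, List, Tuple

def scan(f: Callable, init: Any, xs: List) -> Tuple[Any, List]:
    """Sketch of the scan control-flow primitive: thread a carry
    through f over xs, collecting the per-step outputs."""
    carry, ys = init, []
    for x in xs:
        carry, y = f(carry, x)
        ys.append(y)
    return carry, ys

# Example: an AR(1)-style recursion x_t = 0.9 * x_{t-1} + u_t,
# where each step also emits the new state as its output.
def step(c, u):
    new = 0.9 * c + u
    return new, new

final, path = scan(step, 0.0, [1.0, 0.0, 0.0])
print(path)
```

The carry is what lets the body at step t see the state from step t-1; nothing about this structure involves posterior inference.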
I see; it just strikes me as interesting to understand the scan aspect in a little more detail with respect to the parallelization that seems to be occurring, and how this parallelization interplays with an HMC sampler (or, at least, this is how scan generally seems to be presented). The scan models certainly mix at speeds that imply parallelization…

from the scan() definition: “…The joint density is evaluated using parallel-scan (reference [Särkkä, García-Fernández]) over the time dimension, which reduces parallel complexity to O(log(length))”

…of course, this was for the specific linear Gaussian case in the paper… so I would be curious how scan() uses parallel-scan over the time dimension for nonlinear models (that require HMC). References/papers for what’s under the hood of scan() in numpyro would be super appreciated!
Then, (posterior) predictives are great, but of course, as you know, this is not quite suitable for forecasts… no matter, as I think I see now that rolling windows are probably the only way forward (and I’ll just have to eat the computational cost). EDIT: Question: from above, it seems confirmed that all parameters in a scan (or loop) model are conditioned on the full data up to the last time point. Then, I presume that using Predictive within the scan-styled model (or loop) would sample posteriors at, say, t = 5, but unfortunately those posteriors would have conditioned on data up to t = end, rendering the posterior predictive at t = 5 a distribution that has actually conditioned on all data up to t = end. The reason I bring this up is that if the posterior predictive were somehow only conditioning on data up to t = 5, then one could evolve it as a true forecast density for t = 6 without lookahead bias. This would alleviate the need for rerunning the HMC model at every time point (rolling window), but alas, this seems out of reach at the moment…
parallel-scan is only used for hmm-like discrete latent variable models.
generic inference algorithms like HMC do not produce any intermediate filtering-like results. for that you need SMC or similar
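For reference, here is a minimal bootstrap particle filter in plain NumPy (a toy model of my own, not a NumPyro API), illustrating the kind of SMC procedure that does produce filtering results online: each step conditions only on observations up to time t, even with a nonlinear state equation:

```python
import numpy as np

rng = np.random.default_rng(2)

# Nonlinear-state SSM: x_t = sin(x_{t-1}) + w_t,  y_t = x_t + v_t
T, q, r, N = 30, 0.1, 0.5, 2000
x = np.zeros(T)
for t in range(1, T):
    x[t] = np.sin(x[t - 1]) + rng.normal(0, np.sqrt(q))
y = x + rng.normal(0, np.sqrt(r), T)

# Bootstrap particle filter: weights use only y_t, so the output at
# time t approximates p(x_t | y_{1:t}) -- filtering, no lookahead.
particles = rng.normal(0, 1, N)
filt_mean = np.zeros(T)
for t in range(T):
    if t > 0:  # propagate through the (nonlinear) state equation
        particles = np.sin(particles) + rng.normal(0, np.sqrt(q), N)
    logw = -0.5 * (y[t] - particles) ** 2 / r    # log p(y_t | x_t), up to const
    w = np.exp(logw - logw.max())
    w /= w.sum()
    filt_mean[t] = w @ particles                 # E[x_t | y_{1:t}]
    idx = rng.choice(N, size=N, p=w)             # multinomial resampling
    particles = particles[idx]

print(filt_mean[:5])
```

Unlike HMC over the whole path, this produces the per-t filtering distributions in a single forward pass, which is what the rolling-refit scheme was trying to recover.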
Perfect, thank you!
But, just to completely confirm (apologies for repetition): scan is really not using parallelization for the HMC models (that do not involve discrete latent variables)? Scan is really just an improved for-loop? I find that fairly impressive given the speedups!
in e.g. numpyro/examples/hmm_enum.py at master · pyro-ppl/numpyro · GitHub, if priors are included, log(T) parallel scan is used to marginalize out discrete latent variables, and HMC does inference on the (remaining) continuous latents
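The reason marginalizing an HMM parallelizes can be seen in a small NumPy example (my own toy HMM, not the hmm_enum code): the forward pass is a product of per-step matrices, and since matrix multiplication is associative, that product can be evaluated as a balanced tree in O(log T) depth:

```python
import numpy as np
from functools import reduce

rng = np.random.default_rng(3)

# Toy HMM: K hidden states, random transition matrix and per-step
# observation likelihoods (any nonnegative values work for the identity).
K, T = 3, 16
A = rng.dirichlet(np.ones(K), size=K)        # A[i, j] = p(z_t = j | z_{t-1} = i)
pi = np.full(K, 1.0 / K)                     # initial state distribution
like = rng.uniform(0.2, 1.0, size=(T, K))    # like[t, j] ~ p(y_t | z_t = j)

# Sequential forward pass (a plain scan over time).
alpha = pi * like[0]
for t in range(1, T):
    alpha = (alpha @ A) * like[t]
seq_evidence = alpha.sum()                   # p(y_1, ..., y_T)

# Identical marginalization as a product of matrices: each step is
# "transition then weight by the likelihood", i.e. A with its columns
# scaled. Associativity lets a parallel scan evaluate this product
# as a balanced tree instead of left to right.
mats = [A * like[t][None, :] for t in range(1, T)]
M = reduce(np.matmul, mats)
par_evidence = (pi * like[0]) @ M @ np.ones(K)

print(np.allclose(seq_evidence, par_evidence))
```

HMC never needs this trick for continuous latents because it evaluates the joint density directly rather than summing anything out.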
I think this makes sense. Regarding scan for the continuous variables, it says:

“under the hood, scan stacks all the priors’ parameters and values into an additional time dimension. This allows us computing the joint density in parallel.”

Rather, if we have \theta_{1,t} = \theta_{1,t-1} + \varepsilon_{1,t}, \varepsilon_{1,t} \sim N(\cdot,\cdot), then scan would lump (\theta_{1,t=1},\ldots, \theta_{1,t=T}) into a single multivariate prior; similarly, if \theta_{2,t} = \theta_{2,t-1} + \varepsilon_{2,t}, \varepsilon_{2,t} \sim N(\cdot,\cdot), then we would lump (\theta_{2,t=1},\ldots, \theta_{2,t=T}) into another multivariate prior, and then perform HMC with (y_{t=1},\ldots,y_{t=T}), where y_{t} depends on (\theta_{1,t},\theta_{2,t}). This is in opposition to running HMC at each time point, i.e. “vectorizing > looping”. Reasonable, since the joint is compositional: p(\theta_{1,t=1},\ldots,\theta_{1,t=T}) = p(\theta_{1,t=1})\prod_{t=2}^{T}p(\theta_{1,t}\mid\theta_{1,t-1})… (and, not a profound statement, but this works for any nonlinear formulation)… this would definitely bear faster mixing…
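A small pure-NumPy illustration of this point (my own toy example): the joint log density of a random-walk path computed with a sequential loop equals the one computed by stacking the transitions along a time axis and evaluating all the conditionals at once:

```python
import numpy as np

rng = np.random.default_rng(4)

def norm_logpdf(x, mu, var):
    """Log density of N(mu, var), elementwise."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)

# Random-walk latent path: theta_t = theta_{t-1} + eps_t, eps_t ~ N(0, q)
T, q = 100, 0.3
theta = np.cumsum(rng.normal(0, np.sqrt(q), T))

# Loop version of log p(theta_1, ..., theta_T)
lp_loop = norm_logpdf(theta[0], 0.0, q)
for t in range(1, T):
    lp_loop += norm_logpdf(theta[t], theta[t - 1], q)

# Vectorized version: stack the T-1 transitions along the time axis
# and evaluate every conditional density in one call.
lp_vec = norm_logpdf(theta[0], 0.0, q) + norm_logpdf(theta[1:], theta[:-1], q).sum()

print(np.allclose(lp_loop, lp_vec))
```

Nothing in the vectorized form requires the transition to be linear or Gaussian; any elementwise transition density would factorize the same way, which is consistent with HMC handling nonlinear state equations.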
actually i think i may have accidentally misled you.
scan also handles linear Gaussian models using funsor; however, it’s still the case that the computation being done during inference is marginalization (so, in effect, smoothing).
@ordabayev can provide a more detailed answer than i