Forecasting with a stochastic volatility model in numpyro

Hello,

I’m trying to forecast with the following model. First I tried:

def scan_fn(carry, error):
    sigma, coeff_prev = carry
    coeff = coeff_prev + sigma*error
    coeff_prev = coeff
    
    return (sigma, coeff_prev), coeff

def weights_regression_sv(X=None, y=None):
    
    T, K = X.shape
    
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    __, log_lambda_t   = scan(scan_fn, (phi, log_lambda_0), lambda_errors, T)
    log_lambdas        = numpyro.deterministic("log_lambdas",  log_lambda_t)
    lambdas            = numpyro.deterministic("lambdas",  jnp.exp(log_lambda_t))
    
    return numpyro.sample("y",dist.Normal(jnp.matmul(X,beta),jnp.sqrt(lambdas)),obs=y)

and used

predictive = Predictive(weights_regression_sv, posterior_samples=m_sv.get_samples())
y_pred = predictive(random.PRNGKey(0), X=X_[[-1],:])["y"]
az.plot_kde(100*y_pred)
plt.show();

but got the error:

Then I tried:

def scan_fn(carry, t):
    phi, log_lambda_prev, xbeta, lambda_errors = carry
    
    log_lambda = log_lambda_prev + phi*lambda_errors[t]
    lambda_ = jnp.exp(log_lambda)
    
    y_ = numpyro.sample("y", dist.Normal(xbeta[t],jnp.sqrt(lambda_)))
    
    log_lambda_prev = log_lambda
    return (phi, log_lambda_prev, xbeta, lambda_errors), y_

def weights_regression_sv_fcast(X=None, y=None, forecast=None):
    
    T, K = X.shape
    
    if forecast:
        T_ = T+1
    else:
        T_ = T
    
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T_)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    
    with numpyro.handlers.condition(data={"y": y}):
        xbeta = jnp.matmul(X,beta)
        __, ys   = scan(scan_fn, (phi, log_lambda_0, xbeta, lambda_errors), jnp.arange(0,T_))
    if forecast:
        numpyro.deterministic("y_forecast", ys[-1])

But got this error during model estimation:

I was working off Time Series Forecasting — NumPyro documentation, but this doesn’t really seem applicable given the use of X variables? Also I looked at Forecasting with Dynamic Linear Model (DLM) — Pyro Tutorials 1.7.0 documentation but this also doesn’t seem applicable given the absence of the Forecast module in numpyro?

Update:

This worked:

def scan_fn(carry, error):
    sigma, coeff_prev = carry
    coeff = coeff_prev + sigma*error
    coeff_prev = coeff
    
    return (sigma, coeff_prev), coeff

def weights_regression_sv_fcast(X=None, y=None, forecast=0):
    
    T, K = X.shape
    
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    __, log_lambda_t   = scan(scan_fn, (phi, log_lambda_0), lambda_errors, T)
    log_lambdas        = numpyro.deterministic("log_lambdas",  log_lambda_t)
    lambdas            = numpyro.deterministic("lambdas",  jnp.exp(log_lambda_t))
    
    T_END = T-forecast
    numpyro.sample("y_obs",dist.Normal(jnp.matmul(X[:T_END],beta),jnp.sqrt(lambdas[:T_END])),obs=y)
    numpyro.sample("y_pred",dist.Normal(jnp.matmul(X[T_END:],beta),jnp.sqrt(lambdas[T_END:])),obs=None)

Is getting forecasts this way meaningfully different from getting them using

predictive = Predictive(weights_regression_sv, posterior_samples=m_sv.get_samples())
y_pred = predictive(random.PRNGKey(0), X=X_[[-1],:])["y"]

?

P.S.: Thank you to Using lax.scan for time series in NumPyro - numpyro - Pyro Discussion Forum

I think you lambda_errors is a time-local variable (i.e. a latent variable that depends on time). The posterior will give you the samples of lambda_errors for all of your time steps. You need to resolve that logic in your prediction, where you seem to only need samples from the last step.

Thanks! Could you elaborate on what you mean when you say I need to resolve that logic?

In the model, you defined T, K = X.shape. For prediction, T=1, so your latent variable

lambda_errors = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T)))

needs to have shape (1,). But your posterior gives lambda_errors with shape T != 1 because you were running MCMC on the data X with X.shape[0] != 1.

For time-local variables, it is better to move them to scan body_fn, like in gaussian_hmm example. But I’m not sure what you want to predict, so the first step might be to reconsider how you construct your forecasting model.

Mmmm, sorry, I’m still confused. Let me reset and try this:

My model is:

y_t = \beta x_t + \lambda_t^{0.5} \epsilon_t where \epsilon_t \sim N(0,1)
\log(\lambda_t) = \log(\lambda_{t-1}) + \phi v_t where v_t \sim N(0,1)

So it follows (I think) that to forecast y_{T+1} given x_{T+1}, I would need (\lambda_1,...,\lambda_T, \lambda_{T+1}). I implement the entire model as follows (note that lambda_errors=v_{1:T}):

def scan_fn(carry, error):
    sigma, coeff_prev = carry
    coeff = coeff_prev + sigma*error
    coeff_prev = coeff
    
    return (sigma, coeff_prev), coeff

def weights_regression_sv_fcast(X=None, y=None, X_forecast=None):
    
    T, K = X.shape
    
    if X_forecast is not None:
        T_ = T+X_forecast.shape[0]
    else:
        T_ = T
    
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T_)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    __, log_lambda_t   = scan(scan_fn, (phi, log_lambda_0), lambda_errors, T_)
    log_lambdas        = numpyro.deterministic("log_lambdas",  log_lambda_t)
    lambdas            = numpyro.deterministic("lambdas",  jnp.exp(log_lambda_t))
    
    numpyro.sample("y_obs",dist.Normal(jnp.matmul(X,beta),jnp.sqrt(lambdas[:T])),obs=y)
    if X_forecast is not None:
        numpyro.sample("y_pred",dist.Normal(jnp.matmul(X_forecast,beta),jnp.sqrt(lambdas[T:])),obs=None)

My goal is to do 1-step-ahead forecasts but this code should work for h-step-ahead for any positive integer h. But, let’s assume I just want a 1-step-ahead. So, I need X (containing x_1,...,x_T), y containing y_1,...,y_T, and then an x_{T+1}. Given that, for a run of the code above, I should get (\lambda_1,...,\lambda_{T+1}) from:

    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T_)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    __, log_lambda_t   = scan(scan_fn, (phi, log_lambda_0), lambda_errors, T_)
    log_lambdas        = numpyro.deterministic("log_lambdas",  log_lambda_t)
    lambdas            = numpyro.deterministic("lambdas",  jnp.exp(log_lambda_t))

And then I plug in (\lambda_1,...,\lambda_{T}) into the likelihood and save \lambda_{T+1} to generate my forecast in:

numpyro.sample("y_pred",dist.Normal(jnp.matmul(X_forecast,beta),jnp.sqrt(lambdas[T:])),obs=None)

For example:

X_train = X_[:(len(X_)-1)]
y_train = y_[:(len(X_)-1)]
X_test = X_[[-1]]
m_sv_fcast.run(random.PRNGKey(0),X=X_train,y=y_train,X_forecast=X_test)
m_sv_fcast.print_summary(0.9,exclude_deterministic=False)

So my questions would be:

  • Does this look right?
  • If it does look right, what’s the difference between this and excluding the “y_pred” line, calculating only T lambdas, and plugging in the X_forecast into a call to predictive
  • If it looks wrong, then (how) can I fix it?
  • And, to your earlier comment, is it better to just sample v_t \sim N(0,1) in the scan body_fn?

The point is your lambda error has length T but when it is used for prediction, it has length T_.

To fix the issue, you should move lambda error sample statement to scan function, like in the gaussian hmm example. Otherwise, you need to sample something like lambda_error_of_forecast, then concatenate with lambda_error_posterior_samples. Both are equivalent in my opinion.

I think I understand? How’s this:

def scan_fn(carry, error):
    sigma, coeff_prev = carry
    coeff = coeff_prev + sigma*error
    coeff_prev = coeff
    
    return (sigma, coeff_prev), coeff

def weights_regression_sv_fcast_3(X=None, y=None, X_forecast=None):
    
    T, K = X.shape
    
    # Beta
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    lambda_errors      = numpyro.sample('lambda_errors', dist.Normal(0.,jnp.ones(T)))
    log_lambda_0       = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                = numpyro.sample('phi', dist.InverseGamma(5,.15))
    __, log_lambda_t   = scan(scan_fn, (phi, log_lambda_0), lambda_errors, T)
    log_lambdas        = numpyro.deterministic("log_lambdas",  log_lambda_t)
    lambdas            = numpyro.deterministic("lambdas",  jnp.exp(log_lambda_t))
    
    # Likelihood
    numpyro.sample("y_obs",dist.Normal(jnp.matmul(X,beta),jnp.sqrt(lambdas)),obs=y)
    
    # Forecasts
    if X_forecast is not None:
        T_f               = X_forecast.shape[0]
        lambda_errors_f   = numpyro.sample('lambda_errors_f', dist.Normal(0.,jnp.ones(T_f)))
        __, log_lambdas_f = scan(scan_fn, (phi, log_lambdas[-1]), lambda_errors_f, T)
        lambdas_f         = numpyro.deterministic("lambdas_f",  jnp.exp(log_lambdas_f))
        numpyro.sample("y_pred",dist.Normal(jnp.matmul(X_forecast,beta),jnp.sqrt(lambdas_f)),obs=None)

I think part of my confusion was that for numpyro there aren’t any examples like this; the forecasting examples rely on a single data series y from which you can generate iterative forecasts, like you can always do \hat{y}_{T+1} = \beta y_{T} + \epsilon_T using the posterior and then keep iterating forward to get \hat{y}_{T+2} =\beta \hat{y}_{T+1}+ \epsilon_T and so on.

(Assuming this is right, would it be useful example to contribute? I’ll connect it to a paper.)

Yeah, having an example would be great. But I would advocate moving all time-local latent sample statement to scan body function. That’s what scan primitive used for. Otherwise, you can just use jax.lax.scan, which will require more engineering hack (like drawing a different latent site, then concatenating at correct dimensions) when dealing with more complicated models. Why not just (using the same pattern as in gaussian hmm example):

def scan_fn(carry, x):
    del x
    error = numpyro.sample('lambda_error', dist.Normal(0, 1))
    ...
    return (sigma, coeff_prev), coeff

I understand what you’re saying…I was trying to avoid it because I didn’t know what argument to use for xs in that case. But yeah, I guess I can just delete it. Or, would the GaussianRandomWalk dist work in this case maybe?

I think so (see stochastic volatility example). Its computation is vectorized, so the sampling might be faster. For forecasting, you will need to draw an additional GaussianRandomWalk like your current approach.

Hmmm, actually I don’t think it’ll work because GaussianRandomWalk doesn’t except an initial value. Sounds like an Github issue…

You can just add the init value to the gaussian random walk outputs. By definition, “random walk” pattern should not depend on the init value.

You can just add the init value to the gaussian random walk outputs.

You’re right. Whoops!

By definition, “random walk” pattern should not depend on the init value.

True, but papers I’ve seen usually make a big deal about putting a prior on the initial value. I’m sure if it matters or not, but others seem to think so.

Voila:

def weights_regression_sv_fcast_2(X=None, y=None, X_forecast=None):
    
    T, K = X.shape
    
    # Beta
    beta_unweighted   = numpyro.sample('beta_unweighted', dist.HalfNormal(jnp.ones(K)))
    beta              = numpyro.deterministic("beta",  beta_unweighted/jnp.sum(beta_unweighted))
    
    # SV
    log_lambda_0           = numpyro.sample('log_lambda_0', dist.Normal(0.,4))
    phi                    = numpyro.sample('phi', dist.InverseGamma(5,.15))
    log_lambdas_ex_init    = numpyro.sample("log_lambdas_ex_init",  dist.GaussianRandomWalk(scale=phi,num_steps=T))
    log_lambdas            = numpyro.deterministic("log_lambdas",  log_lambdas_ex_init+log_lambda_0)
    lambdas                = numpyro.deterministic("lambdas",  jnp.exp(log_lambdas))
    
    # Likelihood
    numpyro.sample("y_obs",dist.Normal(jnp.matmul(X,beta),jnp.sqrt(lambdas)),obs=y)
    
    # Forecasts
    if X_forecast is not None:
        T_f                    = X_forecast.shape[0]
        log_lambdas_ex_init_f  = numpyro.sample("log_lambdas_ex_init_f",  dist.GaussianRandomWalk(scale=phi,num_steps=T_f))
        log_lambdas_f          = numpyro.deterministic("log_lambdas_f",  jnp.exp(log_lambdas_ex_init_f+log_lambdas[-1]))
        lambdas_f              = numpyro.deterministic("lambdas_f",  jnp.exp(log_lambdas_f))

        numpyro.sample("y_pred",dist.Normal(jnp.matmul(X_forecast,beta),jnp.sqrt(lambdas_f)),obs=None)

Thanks for bearing with me :slight_smile:

1 Like

Glad it works! The prior for init value is used because there is no guarantee that the random walk starts at zero. I should put “pattern” in a quote. :wink:

Hello! I just wanted to add a link here for a new, but very-much related, topic if anyone in this thread might be interested. I could be wrong, but I linked a google colab notebook (in my post) that displays scan as providing backward smoothing behavior in the parameters through time. This is of course OK, if this of interest, but raises issues of look-ahead bias for forecasts… any and all feedback appreciated!