Shape issues with a hierarchical timeseries model

Hi. I am trying to build a hierarchical stochastic volatility model, similar to a flat one, described in the docs, but for a 2-D stock returns dataset, shaped n_dates x n_assets.

I tried following the tutorials and using the plate context, like so (or variations thereof):

def hierarchical_SV_model(data):

    # hyper priors
    sigma_mu = numpyro.sample('sigma_mu', dist.Exponential(10.))
    sigma_sd = numpyro.sample('sigma_sd', dist.Normal(0, .05))
    nu_mu = numpyro.sample('nu_mu', dist.Exponential(.1))
    nu_sd = numpyro.sample('nu_sd', dist.Normal(0, 5))
    
    # priors
    n_dates, n_assets = data.shape
    with numpyro.plate('assets', n_assets, dim=-1):
        sigma = numpyro.sample('sigma', dist.Normal(sigma_mu, sigma_sd))
        h = numpyro.sample(
            'h', dist.GaussianRandomWalk(scale=sigma, num_steps=n_dates))
        nu = numpyro.sample('nu', dist.Normal(nu_mu, nu_sd))

        # likelihood
        return numpyro.sample(
            'y', dist.StudentT(df=nu, loc=0.0, scale=jnp.exp(-2 * h)), obs=data,
        )

But I get broadcasting or "Model and guide shapes disagree at site: " errors. I also tried setting the shapes manually with sample_shape=, but that also results in errors.

Any clarification and help would be greatly appreciated. Thank you!

I’m not sure which site has the shape mismatch from the info you provided. You might want to try

return sample('y', dist.Student(nu[..., None], ...).to_event(1), obs=data.T)

to make sure that dates is the event dimension of the likelihood and assets is the batch dimension of the likelihood (see tensor shapes tutorial for those notions).

Hi @fehiepsi. Thanks for the fast reply. Yes, it turns out that I needed to transpose the data and add a few other minor tweaks to make it work.

On a side note, as a new user, I am a little confused by the difference between event_shape, batch_shape, and sample shape. I am also not sure what .to_event(1) does in the snipper you provided. Would you please elaborate or point me to some resources to learn more?

I am sure I will get these things as I go along, but I find the documentation is too terse sometimes for me to grasp it right away. Thanks!

I think the tensor shape tutorial in my last comment explains well those notions (including to_event(1),…). Did you walk through that tutorial?

Yep, I think I typed my reply before I fully read your message or something lol. Will go through the tutorial! Thanks a million!

Hey @fehiepsi. I kinda gave it my best effort with the tutorial. However, I am still fuzzy on some of the concepts, with the code behaving in unexpected ways.

For example, right now I am working with a flat model, modeling only one asset, and trying to force it to sample as a 2D array of n_dates x 1 asset. Here’s the code:

def model(ts_length, data=None):
    with numpyro.plate('asset', 1, dim=-1):
        sigma = numpyro.sample('sigma', dist.Exponential(10.))
        h = numpyro.sample(
            'h',
            dist.GaussianRandomWalk(scale=sigma, num_steps=ts_length)
        )
        nu = numpyro.sample('nu', dist.Exponential(.1))
        y = numpyro.sample(
            'y_pred',
            dist.StudentT(df=nu, loc=0., scale=jnp.exp(-2 * h)),
            obs=data.T
            )
        return y

The above returns samples in the shape of n_draws x n_assets x n_dates, which is unexpected IMO, as the plate dim is set to -1, so I would expect the asset to be the rightmost dimension, but it comes out as dates on the rightmost dim, necessitating a transposition of the observed data.

The only way I found to avoid transposing the observed data is to call .reshape(-1, 1) on the h prior and the likelihood, however, even with that it still returns dates as the rightmost dimension. Furthermore, it generates an extra dim, with the returned samples being of the shape (1 (n_assets?), n_draws, 1 (n_assets?), n_dates).

Really, confused by this.

Also, in regard to the to_event(1) call, I think I sort of understand what it does now. But to me, it seems optional in most cases. For example, I believe the above code would work the same regardless of whether to_event(1) is called on the likelihood, and/or the h prior. Is that correct?

Also, it looks like I can move the likelihood and return statements outside of the plate context without changing the results. That is also a bit confusing.

Again, really appreciate any explanation and further guidance. Thanks!

I think dates is the event dimension of h, and StudentT uses h as one of its parameter, so dates should be the event dimension of that StudentT distribution. I’m not sure how to interpret the likelihood if it does not have event dimension. You can write down the shape of StudentT, see which dimension corresponds to assests, which should be the rightmost “batch” dimension. You can also draw the plate diagram of your model. Any dimensions are not plate dimensions should be event dimensions.

Note that you need to reshape nu as in my above comment if you use to_event.

Interesting, so when I call .to_event(1) on h (inside the plate context) and print its shape, it still comes out as (..., n_dates). Is that expected? The way I interpret this phrase in the tutorial, it should do the opposite: “… by calling the .to_event(n) property where n is the number of batch dimensions (from the right) to declare as dependent .” Shouldn’t the to_event call be setting the rightmost dimension to 1?

Here is a reproducible example:

from numpyro.examples.datasets import SP500, load_dataset
import numpyro

_, fetch = load_dataset(SP500, shuffle=False)
dates, data = fetch()

def model(ts_length, data=None):
    with numpyro.plate('asset', 1, dim=-1):
        sigma = numpyro.sample('sigma', dist.Exponential(10.))
        h = numpyro.sample(
            'h',
            dist.GaussianRandomWalk(scale=sigma, num_steps=ts_length).to_event(1)
        )
        print(h.shape)
        raise

mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=1000,
                num_chains=1, chain_method='sequential', progress_bar=True
                )

mcmc.run(random.PRNGKey(42), len(dates), data)

which outputs (1, 1, 2427) with the to_event(1) call, or (1, 2427), without it, instead of (2427, 1).

Ok, reshaping nu and calling to_event(1) on the likelihood (but not the h parameter), produces the desired shape (n_dates, 1).

One more question, should the likelihood and return statements be outside the plate context? Inside? Or it does not matter?

Also, sampling in what I think is the correct configuration, i.e. n_dates x 1 with the .to_event(1) on the likelihood becomes arduously slow (estimated 20+ hours vs. what should be taking just a few minutes). I noticed that the original example used the fori_collect instead of the high-level MCMC interface. Could that be the reason?

Thank you!

Ok, it looks like I needed to reshape all parameters in the likelihood, and then it samples as expected! Now, I just need to verify that the posterior samples look OK and I should be good to go. Thanks again for all your help!

UPDATE: Actually spoke too soon. It works as expected without the plate context, but still flips the dimensions with it, so dates become rightmost. It actually looks like something happening downstream of the model function, as the correct shape is printed inside the model function call, but a wrong configuration is returned with .get_samples(), with dates dim moving to the right.

In fact, going into the source code of .get_samples(), it accesses the ._states_flat attribute, which seems to have the shapes transposed inside of it, e.g., calling mcmc._states_flat['z']['h'].shape returns (n_draws x n_assets x n_dates), even though the shape of h inside the model fn prints (n_dates x n_assets).

Hi @sharsenij14 the shape of h in your last model is (1, 1, ts_length) IIUC. I don’t think that’s what you want but I will explain what’s going on:

  • sigma has batch dimension asset
  • so dist.GaussianRandomWalk(scale=sigma,...) will have batch dimension asset and event dimension ts_length
  • so dist.GaussianRandomWalk(scale=sigma,...).to_event(1) will have empty batch dimension and event dimensions (assets, ts_length).
  • so h will have batch dimension (asset) and event dimension (asset, ts_length) because h belongs to asset plate.

Hope it helps.

Thanks @fehiepsi. It looks like, with your help, I figured out how to specify the model function, but the dimensions still get switched somewhere downstream, e.g., the data appears to be transposed internally in mcmc._states_flat and transposed data is returned when .get_samples is called. For now, I will just reorder dimensions in post-processing and see if the plots look reasonable. Assuming everything looks good and it is actually modeling the data correctly, this unexpected re-ordering of dims might be a minor bug of some sort.

Could you summarize the bug? I thought things work as expected… I can make a github issue for it if you let me know what is your actual result , expected result for which model? We used fori_collect just to illustrate the low level api of MCMC.

The bug/issue (if any) is that get_samples and predictive return different shapes from the same model (different ordering of dims, dates/assets transposed in this case). Predictive gives the expected shape, same if shape is printed in the model fn, but get_samples seems to be transposing data here. It is not that big of a deal, as I need to do a lot of post-processing anyway to get it into xarray and harmonize with my existing code base. Still, I will open a github issue, when I get around to it, as at a minimum I am curious what is causing it. Thanks again for all your help!

open a github issue

Thanks! Please ping me then. It is still not clear to me what’s the model you are talking about, which site has incorrect shapes from get_samples method, what is your expected shape for that site,…