Hello! First of all I would like to thank all the efforts the dev have put into this library and I really appreciate your work!
I am new to numpyro and recently I am testing out the partial pooling on hierarchical time series data (most insights from the prey and predator tutorial). Let’s say we have dataset of two parallel 2D time series (prey & predator) from the same region, but with different time span (Ts=15 & Ts=25).
While setting up the hierarchical levels is manageable, I am stuck on how to fit the long time series data into the odeint according to its grouping.
Currently, I have a model like this:
def hierarchical_model(y=None, level=None):
z_init = numpyro.sample("z_init", dist.LogNormal(1, 1).expand([2]))
N = y.shape[0]
ts = jnp.arange(float(N))
r_μ = numpyro.sample("r_μ", dist.HalfNormal(1, 3))
c_μ = numpyro.sample("c_μ", dist.HalfNormal(5, 5))
m_μ = numpyro.sample("m_μ", dist.HalfNormal(1, 1))
e_μ = numpyro.sample("e_μ", dist.HalfNormal(1, 1))
r_σ = numpyro.sample("r_σ", dist.HalfNormal(1, 1))
c_σ = numpyro.sample("c_σ", dist.HalfNormal(1, 1))
m_σ = numpyro.sample("m_σ", dist.HalfNormal(1, 1))
e_σ = numpyro.sample("e_σ", dist.HalfNormal(1, 1))
nr_levels = len(np.unique(level))
with numpyro.plate("regions", nr_levels):
theta_r = numpyro.sample("theta_r", dist.Normal(r_μ, r_σ))
theta_c = numpyro.sample("theta_c", dist.Normal(c_μ, c_σ))
theta_m = numpyro.sample("theta_m", dist.Normal(m_μ, m_σ))
theta_e = numpyro.sample("theta_e", dist.Normal(e_μ, e_σ))
z = numpyro.deterministic('z', odeint(dz_dt_h, z_init, ts, theta_r[level], theta_c[level], theta_m[level], theta_e[level], mxstep=1000) )
sigma = numpyro.sample("sigma", dist.LogNormal(1, 1).expand([2]))
numpyro.sample("y", dist.Normal(z, sigma), obs=y)
mcmc = MCMC(
NUTS(hierarchical_model, dense_mass=True),
num_warmup=1000,
num_samples=1000,
num_chains=1,
progress_bar=False if "NUMPYRO_SPHINXBUILD" in os.environ else True,
)
mcmc.run(PRNGKey(1262),y=values, level=grouping)
mcmc.print_summary()
with values (data) in the long format of shape = (40, 2):
DeviceArray([[ 0, 0],
[ 0, 0],
[ 0, 7],
[ 1, 3],
.....
[ 0, 43],
[ 0, 60],
[ 0, 46],
[ 0, 43]], dtype=int32)
and grouping in the shape of (40,) to describe the grouping like this:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1])
ATM I am feeding all the data to the model without considering the grouping into odeint (as you can see, the ts takes in the whole dataset). However, it is incorrect since the 2 groups of time series are not sequential. Subsampling seems like an option but the time span of each group of data is not fixed (one 15, one 25). So I wonder if anyone has any idea on how to feed the data correctly?
I ran though the Q&A and this post seems to be the most equivalent. I hope I didn’t miss anything.