Multioutput Hierarchial Regression

I am trying to replicate the behavior of least squares to predict multiple features for multiple series at once. Below is a toy example of the problem.

The pyro code here runs successfully, but generates a nonsense answer. It’s also rather slow. It doesn’t have any partial-pooling, which I plan to add after I get past this data shaping hurdle.

import numpy as np

import pyro
import pyro.distributions as dist
import torch
from pyro.infer.mcmc import NUTS, MCMC

# simulate 2 time series made from 3 common features with 20 observations, no noise for the sake of simplicity
n_obs = 20
x = np.vstack([np.arange(0, n_obs), np.arange(0, 2 * n_obs, 2), np.random.normal(size=n_obs)]).T
# each series has different coefficients, unpooled or partially-pooled
coefs = np.array([[0.5, 1, 1], [1, 2, 0.5]])
y =, coefs.T)


est_params = np.linalg.lstsq(x, y, rcond=None)[0]

############## PYRO Approach

def model(x, y):
    # hyperprior = pyro.sample("u", dist.Normal(0., 100))
    # sigma_hyper = pyro.sample("s", dist.HalfNormal(100))
    dim = y.shape[1]
    b_r = pyro.sample("bR", dist.HalfNormal(torch.ones(dim)).expand([x.shape[1], dim]).to_event(1))  # is shape (x_features, num_series)
    # sigma = pyro.sample("sigma", dist.HalfCauchy(0.5).expand([1]).to_event(1))
    mean = torch.matmul(x, b_r)  # [series]
    with pyro.plate("data"):
        pyro.sample("obs", dist.Normal(mean, 1.0), obs=y)

kernel = NUTS(model)
mcmc = MCMC(kernel, num_samples=2000)  # warmup_steps, num_chains
    torch.from_numpy(x.astype(np.float32)), torch.from_numpy(y.astype(np.float32))
posterior_samples = mcmc.get_samples()
bayes_est_params = posterior_samples['bR'].mean(axis=0).numpy().T

Any help is appreciated, thanks!

  • post Weekend attention bump

Hi @winedarksea, for mcmc we recommend the much faster NumPyro over Pyro. Syntax is similar but you’ll use JAX for tensors. Pyro really focuses on variational inference. Not sure why mcmc is giving you nonsense, you might try more warmup iterations?

Thanks for the response. I believe the Numpyro is faster, for now not concerned about speed.
I was thinking the nonsense was do to me having the above code completely wrong somehow, likely in how I shape my distribution as I noticed that all of the output parameters converge to the same value, ~0.8, which is around the average of all 3 coefficients, making me thing I am not expanding the distribution correctly.