Hello,
I noticed that there’s a new Horseshoe example on master; coincidentally, I had been working on one myself, but using plates. Below is my plated version of the new Horseshoe example:
import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist


def model_normal_likelihood2(X, Y):
    N, D_X = X.shape
    # sample from horseshoe prior
    tau = numpyro.sample("tau", dist.HalfCauchy(jnp.ones(1)))
    with numpyro.plate("plate_beta", D_X):
        lambda_ = numpyro.sample("lambdas", dist.HalfCauchy(jnp.ones(1)))
        unscaled_betas = numpyro.sample("unscaled_betas", dist.Normal(0.0, jnp.ones(1)))
    scaled_betas = numpyro.deterministic("betas", tau * lambda_ * unscaled_betas)
    assert scaled_betas.shape == (D_X,)
    # compute mean function using linear coefficients
    mean_function = jnp.dot(X, scaled_betas)
    prec_obs = numpyro.sample("prec_obs", dist.Gamma(3.0, 1.0))
    sigma_obs = 1.0 / jnp.sqrt(prec_obs)
    # observe data
    with numpyro.plate("plate_data", N):
        numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)
For reference, this is the model from numpyro/examples/horseshoe_regression.py:
def model_normal_likelihood(X, Y):
    D_X = X.shape[1]
    # sample from horseshoe prior
    lambdas = numpyro.sample("lambdas", dist.HalfCauchy(jnp.ones(D_X)))
    tau = numpyro.sample("tau", dist.HalfCauchy(jnp.ones(1)))
    # note that in practice for a normal likelihood we would probably want to
    # integrate out the coefficients (as is done for example in sparse_regression.py).
    # however, this trick wouldn't be applicable to other likelihoods
    # (e.g. bernoulli, see below) so we don't make use of it here.
    unscaled_betas = numpyro.sample("unscaled_betas", dist.Normal(0.0, jnp.ones(D_X)))
    scaled_betas = numpyro.deterministic("betas", tau * lambdas * unscaled_betas)
    # compute mean function using linear coefficients
    mean_function = jnp.dot(X, scaled_betas)
    prec_obs = numpyro.sample("prec_obs", dist.Gamma(3.0, 1.0))
    sigma_obs = 1.0 / jnp.sqrt(prec_obs)
    # observe data
    numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)
I tried out the new Horseshoe example and my plated version above on some data generated using sklearn as follows:
# Simulate Linear Regression
from sklearn import datasets

X, y, coef = datasets.make_regression(
    n_samples=400,
    n_features=10,
    n_informative=3,
    n_targets=1,
    bias=5.0,
    noise=2.0,
    coef=True,
    random_state=0,
)
Both implementations took about the same time (56s original vs. 55s plated) and had a similar number of divergences (187 original vs. 210 plated), although the plated version did seem to have better n_eff across parameters.
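In case the setup matters, this is roughly how I ran each model and counted divergences (a sketch; the warmup/sample counts are my own choices, not necessarily what the example script uses):

from jax import random
from numpyro.infer import MCMC, NUTS

mcmc = MCMC(NUTS(model_normal_likelihood2), num_warmup=1000, num_samples=1000)
mcmc.run(random.PRNGKey(0), X, y, extra_fields=("diverging",))
mcmc.print_summary()  # per-parameter n_eff and r_hat
print("divergences:", int(mcmc.get_extra_fields()["diverging"].sum()))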
My question is: are these two implementations equivalent? In particular, in the plated version, is it okay in general to use mean_function, which has shape (N,), as the location when sampling normal variables, as in
# observe data
with numpyro.plate("plate_data", N):
    numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)
mean_function has the same shape as Y, so does numpyro “know” that each entry of mean_function corresponds to the mean of the matching entry of Y?
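For what it’s worth, the small check below is how I’ve been convincing myself that the matching is elementwise, though I’d appreciate confirmation that this is the intended semantics:

# a vector loc gives a batch of independent Normals, matched elementwise
d = dist.Normal(jnp.array([0.0, 10.0, 20.0]), 1.0)
print(d.batch_shape)                              # (3,) -- one Normal per entry
print(d.log_prob(jnp.array([0.0, 10.0, 20.0])))   # log-density evaluated entrywise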
Thank you.