Plating and the numpyro Horseshoe Example

Hello,

I noticed that there’s a new Horseshoe example on master. Coincidentally, I had been working on one myself, but using plates. Below is my version of the new Horseshoe example, rewritten with plates:

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist


def model_normal_likelihood2(X, Y):
    N, D_X = X.shape

    # sample from horseshoe prior
    tau = numpyro.sample("tau", dist.HalfCauchy(jnp.ones(1)))
    
    with numpyro.plate("plate_beta", D_X):
        lambda_ = numpyro.sample("lambdas", dist.HalfCauchy(jnp.ones(1)))
        unscaled_betas = numpyro.sample("unscaled_betas", dist.Normal(0.0, jnp.ones(1)))
        scaled_betas = numpyro.deterministic("betas", tau * lambda_ * unscaled_betas)
        
    assert scaled_betas.shape == (D_X,)

    # compute mean function using linear coefficients
    mean_function = jnp.dot(X, scaled_betas)

    prec_obs = numpyro.sample("prec_obs", dist.Gamma(3.0, 1.0))
    sigma_obs = 1.0 / jnp.sqrt(prec_obs)

    # observe data
    with numpyro.plate("plate_data", N):
        numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)

For reference, this is the model from numpyro/examples/horseshoe_regression.py:

def model_normal_likelihood(X, Y):
    D_X = X.shape[1]

    # sample from horseshoe prior
    lambdas = numpyro.sample("lambdas", dist.HalfCauchy(jnp.ones(D_X)))
    tau = numpyro.sample("tau", dist.HalfCauchy(jnp.ones(1)))

    # note that in practice for a normal likelihood we would probably want to
    # integrate out the coefficients (as is done for example in sparse_regression.py).
    # however, this trick wouldn't be applicable to other likelihoods
    # (e.g. bernoulli, see below) so we don't make use of it here.
    unscaled_betas = numpyro.sample("unscaled_betas", dist.Normal(0.0, jnp.ones(D_X)))
    scaled_betas = numpyro.deterministic("betas", tau * lambdas * unscaled_betas)

    # compute mean function using linear coefficients
    mean_function = jnp.dot(X, scaled_betas)

    prec_obs = numpyro.sample("prec_obs", dist.Gamma(3.0, 1.0))
    sigma_obs = 1.0 / jnp.sqrt(prec_obs)

    # observe data
    numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)

I tried out the new Horseshoe example and my plated version above on some data generated using sklearn as follows:

# simulate linear regression data
from sklearn import datasets

X, y, coef = datasets.make_regression(
    n_samples=400,
    n_features=10,
    n_informative=3,
    n_targets=1,
    bias=5.0,
    noise=2.0,
    coef=True,
    random_state=0,
)
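
For completeness, here is a minimal sketch of the kind of inference loop I used for the comparison; the exact NUTS settings (warmup and sample counts) below are illustrative assumptions, not my exact settings:

from jax import random
from numpyro.infer import MCMC, NUTS

# run NUTS on the plated model; swap in model_normal_likelihood for the original
mcmc = MCMC(NUTS(model_normal_likelihood2), num_warmup=1000, num_samples=1000)
mcmc.run(random.PRNGKey(0), X, y)
mcmc.print_summary()  # per-parameter n_eff, plus a divergence count at the end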

Both implementations took about the same time (56s original vs. 55s plated) and had a similar number of divergences (187 original vs. 210 plated), although the plated version did seem to have better n_eff across parameters.

My question is: are these two implementations equivalent? In particular, in the plated version, is it okay in general to use mean_function, which has shape (N,), as the location when sampling normal variables, as in

    # observe data
    with numpyro.plate("plate_data", N):
        numpyro.sample("Y", dist.Normal(mean_function, sigma_obs), obs=Y)

mean_function matches the length of Y, so does numpyro “know” that each entry of mean_function corresponds to the mean of the matching entry of Y?
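
As a partial check (my own sketch, not from the example), tracing a toy model suggests that inside the plate an (N,)-shaped location does line up elementwise with the plate dimension, so there is one Normal per observation:

import jax.numpy as jnp
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro import handlers

def toy(Y):
    N = Y.shape[0]
    mean = jnp.arange(N, dtype=jnp.float32)  # stand-in for mean_function
    with numpyro.plate("plate_data", N):
        numpyro.sample("Y", dist.Normal(mean, 1.0), obs=Y)

tr = handlers.trace(handlers.seed(toy, random.PRNGKey(0))).get_trace(jnp.zeros(3))
print(tr["Y"]["fn"].batch_shape)  # (3,) -- one Normal per observation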

Thank you.

Yes, these models should be equivalent, and yes, your use of plate looks fine. Differences between the two should be small and entirely “random”: e.g. your tau sample statement is at the top of the model, while in the example it comes second, so even with the same random number seed the initialization will differ and you’ll get different results. If you re-ordered the sample statements you’d probably get the same results (although who knows, there may be other small numerical differences).
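
If you want to check the equivalence numerically, one option (a quick sketch of mine; the parameter values are arbitrary) is to evaluate the log joint density of both models at identical parameter values with numpyro.infer.util.log_density and confirm they agree:

from numpyro.infer.util import log_density

# evaluate both log joints at identical (arbitrary) parameter values;
# D_X = 10 for the sklearn data generated above
params = {
    "tau": jnp.ones(1),
    "lambdas": jnp.ones(10),
    "unscaled_betas": jnp.zeros(10),
    "prec_obs": jnp.array(3.0),
}
ld1, _ = log_density(model_normal_likelihood, (X, y), {}, params)
ld2, _ = log_density(model_normal_likelihood2, (X, y), {}, params)
print(ld1, ld2)  # should agree up to floating point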

Thanks so much!