Hello - I’m quite new to Pyro/NumPyro. I have read through most of the NumPyro port of Statistical Rethinking (https://github.com/fehiepsi/rethinking-numpyro), and read through all the introductory tutorials for Pyro (more than once for the SVI tutorials) and NumPyro.
I’m trying to redo a simple linear regression model implemented by a colleague of mine, using NumPyro, to see if I can recover coefficients similar to what they got using statsmodels.
I define a linear model as:
```python
import jax.numpy as jnp
import numpyro

def linear_regression(X, y=None):
    a = numpyro.sample("a", numpyro.distributions.Normal(0.0, 100.0))
    n_scaled_features = X.shape[1] - 1
    with numpyro.plate("bs", size=n_scaled_features):
        b = numpyro.sample("b", numpyro.distributions.Normal(0.0, 800.0))
    b2 = numpyro.sample("b2", numpyro.distributions.Normal(0.0, 1.0))
    mu = a + b2 * X[:, -1] + jnp.sum(X[:, :-1] * b, axis=1)
    sigma = numpyro.sample("sigma", numpyro.distributions.HalfNormal(5.0))
    with numpyro.plate("data", size=X.shape[0]):
        numpyro.sample("obs", numpyro.distributions.Normal(mu, sigma), obs=y)
```
When I run this model using:
```python
from jax import random
from numpyro.infer import MCMC, NUTS

num_warmup, num_samples = 1000, 1000
mcmc = MCMC(NUTS(model=linear_regression), num_warmup=num_warmup, num_samples=num_samples)
mcmc.run(random.PRNGKey(2), X, y)
```
I get essentially the same coefficients as those returned by the statsmodels linear regression. However, when I try to fit the model using SVI, first following the format shown in this tutorial: https://pyro.ai/examples/bayesian_regression_ii.html,
```python
def manual_guide(X, y=None):
    n_scaled_features = X.shape[1] - 1
    a_loc = numpyro.param("a_loc", jnp.zeros(1))
    a_scale = numpyro.param("a_scale", 100.0 * jnp.ones(1),
                            constraint=numpyro.distributions.constraints.positive)
    a = numpyro.sample("a", numpyro.distributions.Normal(loc=a_loc, scale=a_scale))
    b_loc = numpyro.param("b_loc", jnp.zeros(n_scaled_features))
    b_scale = numpyro.param("b_scale", 1000.0 * jnp.ones(n_scaled_features))
    with numpyro.plate("bs", size=n_scaled_features):
        b = numpyro.sample("b", numpyro.distributions.Normal(loc=b_loc, scale=b_scale))
    b2_loc = numpyro.param("b2_loc", jnp.zeros(1))
    b2_scale = numpyro.param("b2_scale", 1.0 * jnp.ones(1))
    b2 = numpyro.sample("b2", numpyro.distributions.Normal(loc=b2_loc, scale=b2_scale))
    sigma_scale = numpyro.param("sigma_scale", 5.0 * jnp.ones(1))
    sigma = numpyro.sample("sigma", numpyro.distributions.HalfNormal(sigma_scale))
```
I get totally different results, and when I plot the loss it doesn’t seem to go down. I also find that when I try SVI with the AutoNormal autoguide, the loss does go down, but the coefficients are totally different.
I am wondering whether this has to do with the data itself, which has features on very different scales, or with how I am initializing the guide params?
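In case the scale issue is the culprit: one thing I could try is standardizing each feature column before inference, so the priors and the optimizer see features on comparable scales. A minimal sketch (the feature matrix here is made up):

```python
import jax.numpy as jnp

# Hypothetical feature matrix with columns on very different scales.
X = jnp.array([[1.0, 1000.0],
               [2.0, 2000.0],
               [3.0, 1500.0],
               [4.0,  500.0]])

# Standardize each column to zero mean and unit variance.
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)
print(X_scaled.mean(axis=0), X_scaled.std(axis=0))
```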
A full worked example of this can be found here:
Thanks for the cool framework! Any hints about what I’m doing wrong would be greatly appreciated.