Confusion about bias term

Hi all!

I am new to NumPyro and am trying to define my model as shown below; I am using NUTS (HMC) sampling for inference. I have mostly been following this resource: Example: Bayesian Neural Network — NumPyro documentation.

Could someone help me understand how to set a unit normal prior over my bias terms (b1 and b2)? Is the code below correct? I want to make sure my weights and biases actually update once HMC starts.

```python
def model(X, Y, D_H, mask):
    D_X, D_Y = X.shape[1], 1

    # Sample first layer (put unit normal priors on all weights)
    w1 = numpyro.sample("w1", dist.Normal(jnp.zeros((D_X, D_H)), jnp.ones((D_X, D_H))))  # dim. D_X x D_H
    b1 = numpyro.sample("b1", dist.Normal(0, 1), sample_shape=(D_H,))
    z1 = nonlin(jnp.matmul(X, w1) + b1)  # dim. N x D_H  <= first layer of activations

    # Sample final layer of weights and neural network output
    w2 = numpyro.sample("w2", dist.Normal(jnp.zeros((D_H, D_Y)), jnp.ones((D_H, D_Y))))  # dim. D_H x D_Y
    b2 = numpyro.sample("b2", dist.Normal(0, 1), sample_shape=(D_Y,))
    z2 = jnp.matmul(z1, w2) + b2  # dim. N x D_Y  <= output of the neural network

    # Put prior on observation noise. Note if X ~ Gamma(a, b) then 1/X ~ InvGamma(a, b).
    prec_obs = numpyro.sample("prec_obs", dist.Gamma(0.1, 0.1))
    sigma_obs = 1.0 / jnp.sqrt(prec_obs)

    # Likelihood for the observed data; passing obs is critical.
    pr_y_given_everything = dist.Normal(z2, sigma_obs)
    numpyro.sample("Y", pr_y_given_everything, obs=Y)

    return pr_y_given_everything
```

Thank you!

looks ok to me. it’s unfortunate you removed the many assert statements found in examples/ because those could have helped you be confident in your model specification:

```python
b2 = numpyro.sample("b2", dist.Normal(0, 1), sample_shape=(D_Y,))
assert b2.shape == (D_Y,)
```

i’m not sure that using sample_shape is generally a good idea in a model. instead i’d do something like

```python
b2 = numpyro.sample("b2", dist.Normal(0, jnp.ones(D_Y)))
assert b2.shape == (D_Y,)
```

Hello Martin! Wonderful, I will add those back in. Why are bias terms not included in the BNN example? From a neural network perspective, bias terms are present in the general case (though not required). I realize this could have been a design choice.

> Why are bias terms not included in the BNN example?

just for simplicity, and it wasn’t required to get a decent fit on that fake dataset. also because that example isn’t intended to motivate people to run hmc on neural networks with millions of parameters, which isn’t going to work out of the box, to say the least.

Yes, this makes sense. Is there a way to speak with someone one-on-one to discuss my use case for NumPyro? I am building a BNN with HMC sampling where I have (in my current dataset) about 40K parameters. I get good performance on several of my simulated data examples, whereas on others my method does poorly.

what is the application area?


I am definitely prepared to explain further! :slightly_smiling_face:

@acon sure please feel free to email me at and we can probably find a time for a quick chat