# How to define a likelihood function in numpyro?

I’m very new to numpyro and I’ve been trying to implement a simple linear regression using numpyro, just to get an understanding of how it works. Here’s a snippet of the code I’ve written based on the documentation provided.

``````import jax
from jax import random
# jax numpy is imported as np NOT numpy
import jax.numpy as np
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, HMC
import scipy.sparse
import sys
import numpy

# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 1.0
​
# initializing parameters
N = 50; J = 1
X = random.normal(random.PRNGKey(seed=123), (N, J))
weight = np.array([m_true])
error = 0.1 * random.normal(random.PRNGKey(234), (N, ))
y_obs = f_true * (X @ weight + b_true) + error
y = y_obs.reshape((N, 1))

def model(X, y=None):
ndims = np.shape(X)[-1]
ws = numpyro.sample('betas', dist.Normal(0.0, 10.0*np.ones(ndims)))
b = numpyro.sample('b', dist.Normal(0.0, 10.0))
sigma = numpyro.sample('sigma', dist.Uniform(0.0, 10.0))
f = numpyro.sample('f', dist.Normal(0.0, 10.0))
mu = f * (X @ ws + b)
​   return numpyro.sample("y", dist.Normal(mu, sigma), obs=y)

nuts_kernel = NUTS(model)
num_warmup, num_samples = 500, 1500
mcmc = MCMC(nuts_kernel, num_warmup, num_samples, num_chains=1)
mcmc.run(random.PRNGKey(4), X, y = y_obs)
mcmc.print_summary()
``````

The model is of the form y = f*(m*x + b). My question is that how is the likelihood function defined? The function “model”, I presume, defines probabilistic variables for all the model parameters. Does the return statement in the model correspond to the posterior probability distribution? How does numpyro know what the standard errors in my data are? How do I specify that?

There are people on this forum that can explain things better but let me make a couple points.

(1) In the model it should be `mu = f * (X @ ws + b) `, I believe. Otherwise `f` is not used.
(2) If each independent observation is a scalar, then `y` should simply be a vector of dimensions (N,). There is an excellent tutorial on pyro tensor shapes. Numpyro follows the logic.
(3) The likelihood of the model is not identified. Notice you can multiply `f` by a constant and divide `ws` and `b` by the same constant and `mu` would be unaffected. If likelihood is not identified, the priors in bayesian models become extremely important and their importance does not diminish when sample sample size increases.
(4) All the the return statement does in this model is that it returns a draw from `Normal(mu, sigma)`. That is, if you run `numpyro.handlers.seed(model,random.PRNGKey(234))(X)`, the model will first sample for all latent sites (`betas, b, sigma, f`) and finally for the site `y` and return it to you.
(5) If you want to obtain log-likelihood you can do this:

``````ll = numpyro.infer.util.log_likelihood(
model = model,
posterior_samples = {"betas": m_true * np.ones((1,1)), "b" : np.array([b_true]), "f" : np.array([f_true]), "sigma" : np.array([0.1])},
X = np.array(X),
y = y_obs)
ll["y"].mean()
``````

(6) Not sure what you mean by “standard errors”. It’s a bayesian model. You specify the variance of observations to be `sigma` and infer it. If you would like to allow for correlations between observations you would have to model it explicitly.

1 Like

Thanks! Some clarifications

(1) the model is `mu = f * (X @ ws + b)`. I was trying to see if I obtain a solution with the ``f``` factor, so I must’ve missed adding that in the code snippet I sent here. I have edited the question appropriately.

(2) Each observation is a scalar for now. There is a variable `J` which represents the size of the vector. It’s set to be 2 for now, in order to get the simplest case to work. The array shape is specified to be `(N, J)`. Thanks for the tutorial link. I’ll go through it in greater detail shortly.

(3) I’m still not clear on this. Could you please elaborate on this? Since the stepping in HMC depends on the likelihood function, how does numpyro arrive at the posterior? My crude understanding is that, the user specifies the priors for all the parameters and the likelihood function. The NUTS sampler uses the priors to obtain markov chains that sample the parameter space. If the likelihood function isn’t provided by the user, how is the NUTS sampler able to navigate through the parameter space?

(6) Yes, when I said “standard error” I meant data variance. For now we assume that the covariance matrix is diagonal.

Thanks for the info regarding tensor shapes. Here’s a modified code that seems to work. But the question on the likelihood function still remains.

``````# Choose the "true" parameters.
m_true = -0.9594
b_true = 4.294
f_true = 0.534
​
# initializing parameters
N = 50; J = 2
X = random.normal(random.PRNGKey(seed = 123), (N, J))
weight = np.array([m_true, 10*m_true])
error = 1e-3 * random.normal(random.PRNGKey(234), (N, )) # standard Normal

y_obs = f_true * (X @ weight + b_true) + error
​​
# setting up model
def model(X, y=None):
ndims = np.shape(X)[-1]
ws = numpyro.sample('betas', dist.Normal(0.0,10.0*np.ones(ndims)))
b = numpyro.sample('b', dist.Normal(0.0, 10.0))
sigma = numpyro.sample('sigma', dist.Uniform(0.0, 10.0))
f = numpyro.sample('f', dist.Normal(0.0, 2.5))
mu = f * (X @ ws + b)
return numpyro.sample("y", dist.Normal(mu, sigma), obs=y)
​
# setting up the sampler
nuts_kernel = NUTS(model)
num_warmup, num_samples = 500, 1500
mcmc = MCMC(nuts_kernel, num_warmup, num_samples, num_chains=1)
​
# sampling
mcmc.run(random.PRNGKey(240), X, y = y_obs)
​
# printing the NUTS summary
print(mcmc.print_summary())``````

I should say, I am not a Numpyro dev, so treat my answers with a pinch of salt. Maybe @fehiepsi will want to clarify my mistakes or have something to add.

(2) Right. Keep X as is (N,1) (so that matrix multiplication works). I just meant that Y is (N,) (i.e) a scalar vector. Your code looks good now.

(3) Just to clarify, when I said the likelihood is not identified, I meant it in a mathematical sense (there exist multiple points in parameter space where likelihood is identical). This has nothing to do with numpyro or MCMC.

You are right, a user provides priors and in each iteration the algorithm will take draws from these priors. But in your code you also provided likelihood. The key is this statement `sample("y", dist.Normal(mu, sigma), obs=y) ` It tells the algorithm to evaluate the log-likelihood of observing `obs = y` if the latent variables were set at the current iteration’s values.

An alternative (and perhaps more explicit) way to add to log_likelihood is to use `factor`. For example, for the purpose of inference, `sample("y", dist.Normal(mu, sigma), obs=y) ` could be replaced with `factor("y", dist.Normal(mu, sigma).log_prob(y))`. This again means to create an instance of `dist.Normal` with paramaters `mu, sigma` and use it’s method `log_prob` to compute log likelihood of seeing `y`.

If you wanted you could code the log-likelihood of Normal distribution by hand and use it inside `factor`. But it is already implemented as a method of ` dist.Normal` so there is no need to do it (but in some models, it may be necessary if distributions are not yet implemented).

(6) The way you model the dependence is up to you. Suppose you have `L` measurements for each of `N` units (so `Y.shape = (N,L)`) you may then want to replace `Normal` with `MultivariateNormal` and scalar `sigma` with a covariance matrix `Sigma`. Although, make sure to review the tutorial on shapes. In this example you would want to ensure that `batch_shape` is `N` and `event_shape` is `L`.

I hope this clarifies it.

2 Likes

Thanks, @Elchorro, for your detailed answer! I just have a few points to elaborate:

• The function `model` defines a generative model for your observed data. It corresponds 1-1 to a graphical model. The likelihood function is encoded in your observed sites: the `numpyro.sample` sites with `obs`. It corresponds to the grey circle node in a graphical model.
• You can return whatever you want because `model` is just a Python function. E.g. you can `return ws, b, sigma`.
• Given a generative model p(z, x) (where z is a latent variable, x is an observed variable), HMC will generate samples of p(z|x) = p(z, x) / p(x) ~ p(z, x) = p(z)p(x|z). Here the prior p(z) is calculated from `sample` statement without `obs`. And the likelihood `p(x|z)` is calculated from `sample` statement with `obs`. In your example, the likelihood is `jnp.exp(dist.Normal(mu, sigma).log_prob(y))`.
1 Like

Thank you @Elchorro and @fehiepsi ! I am beginning to get some clarity.

@fehiepsi, while I understand that `model` is just a python function and can return anything, the reason I asked that is because `model` is used to build the NUTS kernel. Hence, I was wondering if something specific needs to be returned. Does the kernel look for all `sample` statements and decide whether the given statement is a prior or not based on the `obs` keyword?

If I want to define the likelihood function to be \$e^{-(model - data)^2/variance}\$, do I simply change the `sample` call to `numpyro.sample("y", dist.Normal(mu, 1e-6*np.ones_like(y)), obs=y)`? (In the synthetic data used for the problem, the data is taken to have a standard deviation of 1e-3 for all data points)

You are right. With `dist.Normal(mu, 1e-6*np.ones_like(y))`, the likelihood function is `exp(-(mu - y)^2 / 1e-12)`.

Great. Thanks a ton @fehiepsi and @Elchorro!

It’s very encouraging to see both developers and users being so responsive and helpful. Definitely nudges me to continue using pyro and hopefully contribute to the codebase one day.