What's the difference (to an MCMC sampler) between `numpyro.sample()` and `numpyro.factor()`?

Hi folks – loving NumPyro so far and the flexibility it offers with the various samplers. I came over from Stan because I needed an alternative way of sampling from a combination of discrete and continuous probability spaces.

I’ve been working with a probability function that involves a copula factor, which (as far as I know) isn’t a built-in distribution in NumPyro. Following a post on this forum, I was able to add this component to the model function using numpyro.factor.

My general question is: what, formally, is the output of a model function you provide to a kernel/MCMC sampler? Is it simply a prior/posterior (depending on whether you pass in data or not) that is sampled from by the method of your choice?

The reason I ask is some confusion about how parameters enter the joint. The numpyro.sample() docs simply state that it "Returns a random sample from the stochastic function fn". To me, that is not necessarily the same thing as adding a component to the log-likelihood: it claims to return a random Monte Carlo sample from the distribution, not to add its log probability to a running total over the model function. The numpyro.factor() statement, on the other hand, appears to address this directly: it adds an arbitrary log probability factor to a probabilistic model. That seems to be what I'm after, since I don't think you can instantiate parameters in NumPyro without a sample() or deterministic() statement (unlike in Stan, where you do so in the parameters block).

So, to summarise: what does the sample function do within a model function? Does it simply instantiate a parameter and add its prior to the posterior you'll sample from, or does it take a direct Monte Carlo draw rather than an MCMC one (in Stan terminology, the difference between normal() and normal_rng())? If the former, what is the difference between sample() and factor(), apart from the fact that sample() lets you use NumPyro's pre-built distributions? Is there a way to, for example, add a custom term to the posterior without initialising parameters via sample or deterministic? Or would I need to initialise the parameters with unbounded uniform priors using sample() before adding them to the factor() statement?

Hi @dman!

I made the Stan → NumPyro transition a long time ago, so I don't remember all the mechanics, but let me explain (to the best of my understanding) how NumPyro works.

The numpyro.sample statement does three things. First, it defines a parameter to be sampled (like in Stan's parameters block). Second, it draws a sample from the chosen prior distribution. And third, it evaluates the log probability of the sampled value and adds it to the model's joint log density.
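For concreteness, here's a minimal sketch of what that looks like in a model function (the site names and priors are just illustrative):

import numpyro
import numpyro.distributions as dist

def model(y=None):
    # Defines the site "mu", draws a value from the prior, and adds
    # its log probability under Normal(0, 10) to the joint log density.
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    # With obs=y, this site contributes the likelihood of y instead of sampling.
    numpyro.sample("y", dist.Normal(mu, sigma), obs=y)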

Meanwhile, numpyro.factor only performs the last function: it adds an arbitrary value to the joint log density. A typical use case (for me) is when I need to evaluate (and add) the log-likelihood of observed data conditional on the sampled parameters, in cases where the data's distribution is not implemented in NumPyro.
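A sketch of that pattern, with a hand-written Normal log-likelihood standing in for whatever custom density (e.g., your copula term) lacks a built-in Distribution:

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def model(data):
    mu = numpyro.sample("mu", dist.Normal(0.0, 10.0))
    # Hand-computed Normal(mu, 1) log-likelihood of the data;
    # factor adds this scalar to the joint log density.
    log_lik = jnp.sum(-0.5 * jnp.log(2 * jnp.pi) - 0.5 * (data - mu) ** 2)
    numpyro.factor("custom_ll", log_lik)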

If the distribution the data is drawn from is implemented, then you can use numpyro.sample and pass obs=. In that case the behavior of numpyro.sample is very similar to numpyro.factor, except that you don't have to compute the log-likelihood by hand (as you do with factor); the program does it for you.
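For illustration (assuming a Normal likelihood; the names are arbitrary), these two model bodies contribute the same term to the joint:

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist

def model_obs(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))
    # log-likelihood computed and added for you
    numpyro.sample("y", dist.Normal(mu, 1.0), obs=y)

def model_factor(y):
    mu = numpyro.sample("mu", dist.Normal(0.0, 1.0))
    # same contribution to the joint, computed by hand
    numpyro.factor("y_ll", jnp.sum(dist.Normal(mu, 1.0).log_prob(y)))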

Regarding initialization, you can initialize the parameters defined by numpyro.sample statements via the init_strategy argument of your MCMC kernel.
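For example, a sketch using the NUTS kernel with the init_to_value strategy from numpyro.infer:

from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS, init_to_value

def model():
    numpyro.sample("mu", dist.Normal(0.0, 10.0))

# Start the chain at mu = 0.5 instead of the default initialization.
kernel = NUTS(model, init_strategy=init_to_value(values={"mu": 0.5}))
mcmc = MCMC(kernel, num_warmup=500, num_samples=500)
mcmc.run(random.PRNGKey(0))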

Hope this helps!

Thanks @Elchorro! Let me add some other pointers for @dman.

Regarding log joint computation, having

x = sample("x", dist.LogNormal(0, 1))

would be the same as

x = sample("x", dist.ImproperUniform(dist.constraint.positive, (), ()))
factor("x_log_prob", dist.Normal(0, 1).log_prob(x))

The latter is useful when you want to customize how the log probability is computed. Sometimes it's tricky to define a Distribution to be used in the first pattern.

You are right that the sample method will just return the sample for you. Essentially, it creates a special message that returns a sample when processed. But in inference algorithms we use effect handlers to record the values and the corresponding distributions of those messages, so we can compute the log densities of those variables.
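A small sketch of that machinery, using the seed and trace effect handlers to run a model and recover each site's value and log density:

from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.handlers import seed, trace

def model():
    numpyro.sample("x", dist.LogNormal(0.0, 1.0))

# seed supplies the PRNG key so each sample site can draw a value;
# trace records every site's message (value, distribution, ...).
tr = trace(seed(model, random.PRNGKey(0))).get_trace()
site = tr["x"]
print(site["value"], site["fn"].log_prob(site["value"]))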

Thanks so much for the detailed answers @Elchorro and @fehiepsi! This is exactly what I was looking for. And dist.ImproperUniform() is a hidden gem (in the Stan framework, this would simply correspond to declaring a parameter without any prior, which, as you say, is ideal when working with more complex models).