 # Defining a likelihood that is a mixture of distributions

I have a noisy physical system that I’m trying to characterize, and I want to explore whether Bayesian inference with MCMC can help.

The system takes inputs `x` and does a dot product with `w`.
`x` and `w` are known, so I am not worried about inferring epistemic uncertainty.
`x` has additive noise that is random normal, and `w` has multiplicative noise that is random normal (i.e. `w * N(0, var)`), so I need to model the heteroscedastic aleatoric uncertainty.
From a correctness point of view, I realize some values will be degenerate. But I have strong priors for the real application, so I hope to regularize the model that way.

I would imagine the code would look something like this:

```python
def model(x, w, y):
    x_noise_lvl = numpyro.sample('x_noise_lvl', dist.Normal(0, 5))
    w_noise_lvl = numpyro.sample('w_noise_lvl', dist.Normal(0, 2))
    # pseudocode: this "math with distributions" is the part I don't know how to express
    return numpyro.sample('obs', dist.Normal(x, x_noise_lvl).T @ dist.Normal(w, w * w_noise_lvl), obs=y)
```

However, I’m not sure how to do math with distributions in Pyro. I’m also happy to use the PyTorch-based Pyro; it doesn’t need to be NumPyro.

Is this doable?

@bdhammel Could you write your model in some form such as:

```
x_noise_lvl ~ Normal(...)
w_noise_lvl ~ Normal(...)
y ~ ...
```

It would be easier to write a Pyro model from that. In your model, I’m not sure what `dist.Normal(x, x_noise_lvl)` does. Does `noise` here mean the scale/standard deviation, or an additive noise to be added to `x`? If the latter, then I guess your model will be something like

```python
def model(x, w, y):
    x_noise = sample('x_noise', dist.Normal(0, 5))
    w_noise = sample('w_noise', dist.Normal(0, 2))
    x_true = x + x_noise
    w_true = w * w_noise
    sigma = sample('sigma', dist.Exponential(1))
    sample('obs', dist.Normal(x_true.T @ w_true, sigma), obs=y)
```

I would write it like so:

```
x_noise_lvl ~ Normal(0, 5)
w_noise_lvl ~ Normal(0, 2)
y ~ Normal(x, x_noise_lvl).T @ Normal(w, w * w_noise_lvl)
```

Where each value in x and each value in w is perturbed by noise.

If `w` had no noise, I would build the model like:

```python
def model(x, w, y):
    # note, my `x_noise_lvl` is the `sigma` term in your example
    x_noise_lvl = sigma = sample('sigma', dist.Normal(0, 5))
    sample('obs', dist.Normal(x.T @ w, sigma), obs=y)
```

In your example, the observational noise is random normal with std `sigma`. In my case, the observation is the product of two normal distributions, which I don’t think has a closed-form density. I imagine Pyro would have to compute the log-likelihood of `y` given `Normal(x, x_noise_lvl) * Normal(w, w * w_noise_lvl)`.
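To make this concrete, a quick simulation (with arbitrary parameters) shows the product of two independent normal variables; the mean still factorizes, but the density has no simple closed form:

```python
# Simulate the product of two independent normal random variables.
# The parameters are arbitrary; the point is that the product is not normal
# and has no simple closed-form density.
import numpy as np

rng = np.random.default_rng(0)
a = rng.normal(1.0, 0.5, size=100_000)
b = rng.normal(2.0, 1.0, size=100_000)
prod = a * b

# for independent variables the means multiply: E[ab] = 1.0 * 2.0
print(round(prod.mean(), 1))  # 2.0
```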

The above is my understanding of what’s written; it’s very possible I’m confused.

Could you clarify what `x_noise`, `w_noise` are and how they relate to `x_noise_lvl` and `w_noise_lvl`?

Oh, I’m sorry - typo. I meant to write `x_noise_lvl`. Post edited.

So `x_noise_lvl` is a scale/std parameter. Thanks! I was confused because `Normal(0, 5)` is not a proper prior for a scale/std parameter. IIUC, you want to model something like

```
x_noise_lvl ~ LogNormal(0, 5)
w_noise_lvl ~ LogNormal(0, 2)
y ~ Normal(x, x_noise_lvl).T @ Normal(w, w * w_noise_lvl)
```

It looks like you also need to guarantee that `w * w_noise_lvl` is positive (this will be satisfied if your data `w` is positive).

I will think about the above model and will let you know if I have some ideas.

> I was confused because `Normal(0, 5)` is not a proper prior for a scale/std parameter

Oh yes, you’re absolutely correct, LogNormal is more appropriate

This is also a good point

@bdhammel I think we can reformulate the model as follows

```
# pls correct me if those shapes are wrong:
# x.shape = (M, N), w.shape = (M,), y.shape = (N,)
x_noise_lvl ~ LogNormal(0, 5)
w_noise_lvl ~ LogNormal(0, 2)
w_noise ~ Normal(0, 1)
w_ = w * (1 + w_noise * w_noise_lvl)
y ~ Normal(x.T @ w_, x_noise_lvl * |w_.sum(-1)|)
```

You wanted to marginalize out `w_noise` but AFAIK, currently, we don’t have a machinery to do that. If `M` is small, then you can run MCMC over `x_noise_lvl`, `w_noise_lvl`, and `w_noise`.

> I’m not sure how to do math with distributions in pyro

If you know how to compute this likelihood, then you can use `pyro.factor` to add your custom log probability to the model.


Okay, thank you!
Can you point me to any examples using `pyro.factor`?

You can find some in these places: the epidemiological tutorial, the SIR example, the RSA tutorial, or the HMM example.