Model inputs as parameters of a distribution

I have a problem where the inputs to the Bayesian model are the parameters of distributions, so they need to be sampled first. I know this is confusing, so let me explain with a synthetic regression example.

For a typical regression, where X and y are the predictors and the response, respectively, we estimate the coefficient w and the bias b with the following model (only the essence of the model is shown):

numpyro.sample('y', dist.Normal(X*w+b,sigma), obs=y)

Now let’s assume that the response variable y is not directly observed, but we know that it follows a distribution with specified parameters. As a simple case, assume that we know that each y_i follows a uniform distribution with two parameters a and b.
So, what is required here (as far as I can grasp) is to first sample from the uniform distribution (at each MCMC step) and then apply the sampling statement above, which means the model could look like:

y_hat = numpyro.sample('y_hat', dist.Uniform(a,b))
numpyro.sample('y', dist.Normal(X*w+b,sigma), obs=y_hat) 

Does anyone know how this can be done in NumPyro (i.e., passing a stochastic variable as an observation)? I implemented it as above but did not get the expected result!

Any hint is highly appreciated!

Not sure if I understand correctly, but it sounds like you’ll want to synthesize a dataset before inference, then use that fixed random dataset throughout inference. If that doesn’t sound right, then it would help to know a little more about your problem.

@fritzo Thanks for the response. I guess we can see it as synthesizing a dataset as well.

My real problem involves users: domain experts specify some inputs for the model/inference. When their inputs are, for example, real numbers, they can be modelled straightforwardly. However, sometimes their inputs are less direct; for example, they may give them in the form of intervals or even a triangular distribution. This is the challenge I am facing now. And I guess it is realistic to see it as synthesizing a dataset.

Does the explanation clarify a bit?

Ah, thanks for explaining. I think this is the classic Bayesian problem of specifying priors: your users have prior information, and you can include it in your model via pyro.sample() statements. For example, suppose you have a model with dependency structure x --> y:

import pyro
import pyro.distributions as dist

def model():
    x = pyro.sample("x", dist.Normal(0, 1))
    y = pyro.sample("y", dist.Normal(x, 1))

If a user knows the value of “y”, you could extend this to:

def model_2(observed_y):
    x = pyro.sample("x", dist.Normal(0, 1))
    y = pyro.sample("y", dist.Normal(x, 1), obs=observed_y)

but if a user has only weaker information, say in the form of an interval (lb, ub), you could instead add a sample statement:

def model_3(lb, ub):
    x = pyro.sample("x", dist.Normal(0, 1))
    y = pyro.sample("y", dist.Normal(x, 1))
    pyro.sample("y_bounds", dist.Uniform(lb, ub), obs=y)

But inference in that model will be hard because of infeasible points: whenever the sampled y falls outside [lb, ub], the Uniform(lb, ub) likelihood assigns it zero probability (negative-infinite log-density). To make the model more tractable, I’d rewrite it in a form that works with both SVI and MCMC:

def model_3_tractable(lb, ub):
    x = pyro.sample("x", dist.Normal(0, 1))
    y = pyro.sample("y_bounds", dist.Uniform(lb, ub))
    pyro.sample("y", dist.Normal(x, 1), obs=y)

Similarly, if a user’s prior knowledge is in the form of a triangular distribution, you could replace Uniform with Triangular (but Triangular is not yet implemented; PRs welcome, here’s a start).

On the topic of user-specified distributions, have you heard the fascinating story of J.P. Craven’s search for the sunken submarine USS Scorpion?

@fritzo Thanks a lot for the solution. I also suggested more or less the same approach in my original question.

However, this does not work as expected (i.e., the outcome of the model is not what I expect in a controlled experiment).

Still, according to what you said, a stochastic variable like y can act as the observation in another sample statement, as you mentioned above.

Does that make sense?

For the triangular distribution, one workaround is to write it as the difference of two uniform distributions, though I am not sure if it is applicable here!
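A minimal sketch of that workaround in NumPyro, assuming a symmetric triangular prior on [lb, ub] is sufficient (the helper name triangular_prior and the bounds are only illustrative): the sum of two independent uniforms (equivalently, a difference, up to a shift) has a symmetric triangular density, so the prior can be built from two Uniform sample sites plus a deterministic transform.

import numpyro
import numpyro.distributions as dist

def triangular_prior(name, lb, ub):
    # u1 + u2 follows a symmetric triangular distribution on [0, 2] with mode 1
    u1 = numpyro.sample(f"{name}_u1", dist.Uniform(0.0, 1.0))
    u2 = numpyro.sample(f"{name}_u2", dist.Uniform(0.0, 1.0))
    # rescale to [lb, ub] and record the value as a deterministic site
    return numpyro.deterministic(name, lb + (ub - lb) * (u1 + u2) / 2.0)

Note this only covers the symmetric case; a triangular prior whose mode is away from the midpoint would need a different construction.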

Yes, you can add the result of one pyro.sample statement as the obs= in another pyro.sample statement. Such models are not strictly generative (you can’t sample from the prior). But this is a common trick to add a custom likelihood to a model.
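For completeness, here is a minimal NumPyro sketch of this trick applied to the regression example from the original post (the bounds a and b_u, the priors, and the data are illustrative; the bias is renamed b0 to avoid clashing with the Uniform upper bound):

import jax.numpy as jnp
import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model(X, a, b_u):
    w = numpyro.sample("w", dist.Normal(0.0, 1.0))
    b0 = numpyro.sample("b0", dist.Normal(0.0, 1.0))
    sigma = numpyro.sample("sigma", dist.HalfNormal(1.0))
    with numpyro.plate("data", X.shape[0]):
        # y is not observed directly; only its bounds [a, b_u] are known,
        # so sample one y_hat per data point and use it as the observation
        y_hat = numpyro.sample("y_hat", dist.Uniform(a, b_u))
        numpyro.sample("y", dist.Normal(X * w + b0, sigma), obs=y_hat)

X = jnp.linspace(-1.0, 1.0, 20)
mcmc = MCMC(NUTS(model), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), X, a=0.0, b_u=1.0)
print(mcmc.get_samples()["w"].mean())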


@fritzo Thanks a lot, that works pretty well as expected!

Another related question about the prior uniform distributions: do the posterior samples concentrate on the most probable values within the initial intervals, or do they span the whole intervals uniformly (as specified)?
To clarify: we specify a bound for each number using a uniform distribution, and then the posterior samples are drawn. Now that we have the posterior samples, I do not know whether we should expect them to still follow the prior uniform distributions for those parameters, or whether they instead return the most probable values under the posterior.

Posterior distributions generally differ from, and are narrower than, prior distributions.
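A quick way to see this is to inspect the posterior samples of the interval-constrained site directly. A minimal, self-contained sketch using the model_3_tractable pattern above, rewritten in NumPyro with made-up bounds:

import jax.random as random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS

def model_3_tractable(lb, ub):
    x = numpyro.sample("x", dist.Normal(0.0, 1.0))
    y = numpyro.sample("y_bounds", dist.Uniform(lb, ub))
    numpyro.sample("y", dist.Normal(x, 1.0), obs=y)

mcmc = MCMC(NUTS(model_3_tractable), num_warmup=500, num_samples=1000)
mcmc.run(random.PRNGKey(0), lb=2.0, ub=6.0)
y_post = mcmc.get_samples()["y_bounds"]
# The prior on y_bounds is flat on [2, 6]; the posterior is reweighted by the
# likelihood term, so its histogram is generally no longer uniform and its
# standard deviation is typically smaller than the prior's (about 1.15 here).
print(y_post.mean(), y_post.std())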


@fritzo If I may, I’d like to ask a follow-up question.

I would like to use a categorical distribution as the prior (instead of uniform), for which I get a 3-d array as input. The 3-d variable, denoted x, has size (d1, d2, d3), and I would like to draw a categorical sample along the third dimension to obtain a 2-d array of size (d1, d2). Here is a simplified version of the full model I developed:

def BayesianModel(x):
    d1, d2, d3 = x.shape

    w_star = numpyro.sample("w_star", dist.Dirichlet(0.001 * np.ones(d2)))
    gamma_star = numpyro.sample("gamma_star", dist.Gamma(0.01, 0.01))

    a_w = numpyro.sample('aws', dist.Categorical(x))  # a_w should be of size (d1, d2)

    with numpyro.plate("d1", d1, dim=-1):
        w = numpyro.sample("w", dist.Dirichlet(gamma_star * w_star))
        numpyro.sample('a_w', dist.Multinomial(probs=w), obs=a_w)

But I get the following error:

AssertionError: Missing plate statement for batch dimensions at site aws

Then I put the sampling statement inside the plate:

....
with numpyro.plate("d1", d1, dim=-1):
    a_w = numpyro.sample('aws', dist.Categorical(x))
...

which raised:

ValueError: Incompatible shapes for broadcasting: ((1, d2), (d2, d1))

I also tried to put a_w in another plate statement as follows:

with numpyro.plate("ds", d1, dim=-2):
        a_w = numpyro.sample('aws', dist.Categorical(x))

with numpyro.plate("d1", d1, dim=-1):
...

But I got the following error (apparently it is not possible to have two plates on the same variable):

KeyError: 'd1'

Do you have any idea how it can be solved? Thanks in advance!

Can you indicate at which line the broadcasting error is triggered (ideally which expression within that line) and print tensor shapes? Also you should see a shapes printout on error.

@fehiepsi Does NumPyro use format_shapes() to help users diagnose shape errors?

The broadcasting error arises when a_w is sampled inside the plate (the second snippet above).

I placed a print statement right after sampling a_w, but it did not appear in the terminal, so I guess that line itself is the cause of the error!

@fritzo I edited the previous post and added some more lines about the errors! Thanks for taking the time :slight_smile:

@majidmohammadi There are two plates at the site aws; you should declare both of them. It is best to use numpyro.render_model to visualize your model as a graph and check that it is the model you intend. You can also use the format_shapes utility to print out the shapes at each site.
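For concreteness, a minimal sketch of declaring both plates at the aws site, assuming x holds probability vectors along its last dimension so that dist.Categorical(probs=x) has batch shape (d1, d2) (the wrapper name sample_aw is only for illustration):

import numpyro
import numpyro.distributions as dist

def sample_aw(x):
    d1, d2, d3 = x.shape
    # Categorical(probs=x) has batch shape (d1, d2) and empty event shape,
    # so both batch dimensions need a plate: dim=-2 for d1 and dim=-1 for d2.
    with numpyro.plate("d1", d1, dim=-2):
        with numpyro.plate("d2", d2, dim=-1):
            a_w = numpyro.sample("aws", dist.Categorical(probs=x))  # shape (d1, d2)
    return a_w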

@fritzo Unfortunately for now users need to use that utility manually.
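For reference, a sketch of how that manual inspection might look; the trace/seed pattern below reflects my understanding of the utilities mentioned above, so please check the NumPyro docs for the exact signatures:

import jax.numpy as jnp
import numpyro
from numpyro.handlers import seed, trace
from numpyro.util import format_shapes

x = jnp.ones((4, 3, 5)) / 5.0  # made-up (d1, d2, d3) probability array

# run the model once under a seed handler and record its execution trace
exec_trace = trace(seed(BayesianModel, rng_seed=0)).get_trace(x)  # BayesianModel from above, with the plates fixed
print(format_shapes(exec_trace))  # prints batch/event shapes at each sample site

# numpyro.render_model(BayesianModel, model_args=(x,))  # graphical view of the model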