I am trying to implement a simple Bayesian variable selection algorithm using SVI. My dataset is a 1000×2 matrix X, and the output y is generated from just one of the columns of X, as y = X @ [0, 1].
My model looks like this:
```python
def model():
    N, d = X.shape
    pi = pyro.sample("pi", dist.Beta(0.5, 0.5))
    xi = []
    w = []
    for i in pyro.plate("xplate", d):
        xi.append(pyro.sample(f"xi_{i}", dist.Bernoulli(probs=pi),
                              infer={"enumerate": "sequential"}))
    for i in pyro.plate("wplate", d):
        w.append(pyro.sample(f"wi_{i}", dist.Normal(0., 1.)))
    xi_stacked = torch.stack(xi).view(2, -1)
    w_stacked = torch.stack(w).view(2, -1)
    beta_model = xi_stacked * w_stacked
    y_model_mu = torch.matmul(X, beta_model)
    with pyro.plate("data_loop"):
        pyro.sample("obs", dist.Normal(y_model_mu, torch.tensor([1.])), obs=y)

guide = autoguide.AutoDelta(model, poutine.block(model, hide=['xi']))
```
So first I draw pi from a Beta prior, then use it to parameterise a single Bernoulli distribution from which I make d independent draws. These draws are indicator variables for my regression coefficients, which I sample by making d independent draws from N(0, 1). I then multiply the two elementwise to zero out some of the regression coefficients, and use the result to compute the mean of the observations, which I assume are normally distributed. I declare my loss as
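The elementwise masking step described above can be illustrated outside of Pyro. The following sketch (NumPy used purely for illustration; the fixed values of `xi` and `w` are made up to mimic the data-generating process) shows how indicator draws zero out coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 1000, 2
X = rng.normal(size=(N, d))

# Indicator vector xi and raw weights w, as in the model above;
# here xi is fixed to [0, 1] to mimic the data-generating process.
xi = np.array([0.0, 1.0])
w = np.array([2.5, -1.3])

# The elementwise product zeroes out the de-selected coefficient.
beta = xi * w            # only the second coefficient survives
mu = X @ beta            # mean of the observations, shape (N,)
```

With `xi = [0, 1]`, the first column of X contributes nothing to `mu`, which is exactly the sparsity mechanism the model relies on.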
But this gives me a really weird error, `TypeError: model() takes 0 positional arguments but 1 was given`, which is very confusing as I am not passing any arguments to the function at all.
I must be missing something basic, but I have googled around a lot and found nothing so far. It would be great if I could get some pointers. Thanks a lot!
hi @silverfish it looks like you may have a couple of typos there. First, you need only pass the model to AutoDelta once. Second, you’ll need to mask all the “xi” variables, say using a hide_fn.
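A minimal sketch of these two fixes, assuming the discrete sites keep the names `xi_0`, `xi_1` from the model above: a `hide_fn` receives each sample site's message dict and returns True for sites to hide, so a name-prefix test suffices. The `AutoDelta` line is shown as a comment because it needs the model and Pyro in scope:

```python
# hide_fn receives a site "message" dict; returning True hides that site.
# Here we hide every enumerated indicator site by its name prefix.
hide_fn = lambda msg: msg["name"].startswith("xi_")

# Sketch of the corrected guide (note the model is passed only once):
# guide = autoguide.AutoDelta(poutine.block(model, hide_fn=hide_fn))
```

This way the discrete `xi_*` sites are excluded from the guide and left to enumeration, while `pi` and the `wi_*` sites get Delta point estimates.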
Also I’d be a bit wary of trying to do variable selection this way, as the AutoDelta guide will use the same posterior for every subset of variables. @martinjankowiak has done a lot of work in variable selection and may be able to suggest other approaches. Good luck!
Hey @fritzo, thanks for your reply. I have been looking at Pyro and it seems really interesting; I am still trying to wrap my head around it. Your papers on plated factor graph variable elimination look especially interesting, although I haven’t understood them completely yet, and I hope to learn more about the funsor library as well. I would really appreciate some links or references on the variable selection problem from @martinjankowiak if he finds some time. Thanks for the lambda-function formulation for hiding sites; I couldn’t have figured it out on my own.
```python
def mm3(X, y):
    N, dim = X.shape
    probs_ = torch.tensor([.5 for i in range(dim)])
    p = pyro.param("probs", probs_, constraint=constraints.unit_interval)
    theta = pyro.sample("theta",
                        dist.Normal(torch.zeros(dim), torch.ones(dim)).to_event(1))
    burn = pyro.sample("burn", dist.Bernoulli(probs=p).to_event(1),
                       infer={"enumerate": "sequential"})
    theta = (theta * burn).cuda()
    mu = torch.matmul(X, theta).cuda()
    with pyro.plate("data", N):
        pyro.sample("obs", dist.Bernoulli(logits=mu).to_event(1), obs=y)
```
It seems to pick out the relevant variables, but it also picks up other irrelevant ones. I guess this is a consequence of using a Bernoulli prior for the variable selection, so roughly half of the variables end up getting selected. I think I can use something like a HalfCauchy instead.
For the guides, I definitely need to read up more on selecting suitable ones. Writing a custom guide is probably faster than using an autoguide.
Beware: sequential enumeration won’t work after `.to_event(1)`; your original model, with its per-coordinate sample sites, was closer to a working enumerated solution.
If you want a sparser solution, replace your `probs` param with a sample from a Beta distribution that favors small probs. Treating `probs` as a `pyro.param` is equivalent to sampling it from a uniform Beta(1, 1) prior. Instead you might try
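A sparsity-favouring Beta prior can be sketched as follows (the shape parameters 0.1 and 1.0 are arbitrary illustrative choices): a Beta(0.1, 1.0) has mean 0.1 / (0.1 + 1.0) ≈ 0.09, so most inclusion probabilities are pushed toward zero. The NumPy simulation below just checks that the prior concentrates near zero; the commented line shows how it might replace the `pyro.param` call in the model above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(0.1, 1.0) has mean 0.1 / 1.1 ≈ 0.0909 and most of its mass near 0,
# so it favours excluding variables a priori (hypothetical shape parameters).
draws = rng.beta(0.1, 1.0, size=100_000)

# In the Pyro model this would replace the pyro.param line, e.g.:
# p = pyro.sample("probs",
#                 dist.Beta(0.1 * torch.ones(dim), torch.ones(dim)).to_event(1))
```

Because the CDF of Beta(0.1, 1) is x^0.1, roughly 79% of the prior mass sits below 0.1, which is what makes the selection sparse.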
@silverfish can you explain what it is you want to do? bayesian variable selection is pretty tricky in general and doesn’t necessarily work well with black-box inference algorithms like variational inference.
@martinjankowiak I am looking at it from a biological-data point of view, mostly for multi-omic data. I don’t have a fixed goal in mind, as it’s just something I decided to try in order to get to know Pyro better and become more comfortable with probabilistic machine learning, but I would like to do BVS for both regression and multi-class classification problems. I mostly want to develop my skills, be able to reason about different kinds of models, and be able to debug them if something goes wrong. I found your papers and a couple of others and am mostly trying to implement them. If you have some recommendations, that would be great. I am a bit disappointed that it doesn’t work well in general.
@silverfish inference in bayesian variable selection problems tends to be tricky, especially if N (the number of data points) or P (the number of covariates) is large. if they are moderate (e.g. N no more than a few thousand and P no more than a few hundred), then using HMC on a regression model governed by a horseshoe prior (example here) is one viable strategy. however, as you start to enter more complex/difficult regimes (e.g. multi-class classification with many classes), more likely than not you will need a custom inference strategy to get good performance. depending on the precise situation this can be pretty complex, and it will depend on the particular details of the model (e.g. classification versus regression), so it’s hard to give general tips about what’s best to do. if you have a specific situation in mind (a particular dataset with a particular N and P), then it would be easier to make some suggestions.
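To make the horseshoe suggestion concrete, here is a hedged sketch (prior simulation only, not inference, and in NumPy rather than Pyro): each coefficient gets its own half-Cauchy local scale on top of a global scale τ, which yields many near-zero coefficients plus a few large ones that escape the shrinkage. The value of τ is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

P = 1000
tau = 0.01                                    # global scale (assumed value)
lam = np.abs(rng.standard_cauchy(size=P))     # half-Cauchy local scales
beta = rng.normal(0.0, tau * lam)             # horseshoe prior draws

# Most local scales are O(1), so most beta are tightly shrunk around zero,
# while the Cauchy's heavy tail lets a few lambdas (and hence betas) be huge.
```

This "many small, a few large" shape is what makes the horseshoe a continuous alternative to the discrete Bernoulli indicators in the models above.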
I was also wondering about Bayesian variable selection for -omics data. Oftentimes we deal with p of 50K+ and N in the thousands.
I have found that in this situation most Bayesian methods are extremely slow. Even a hierarchical univariate model, where y = b0 + B1 * X (with * elementwise and B1 of shape 1 × p) is fit to all the features with B1 pooled, is slow.
Are there any pyro examples on omics data like this?