I am trying to implement a simple Bayesian variable selection algorithm using SVI. My dataset is a 1000×2 matrix X, and the output y is generated from just one of the columns of X, as y = X @ [0, 1].
My model looks like this:
```python
def model():
    N, d = X.shape
    pi = pyro.sample("pi", dist.Beta(0.5, 0.5))
    xi = []
    w = []
    for i in pyro.plate("xplate", d):
        xi.append(pyro.sample(f"xi_{i}", dist.Bernoulli(probs=pi),
                              infer={"enumerate": "sequential"}))
    for i in pyro.plate("wplate", d):
        w.append(pyro.sample(f"wi_{i}", dist.Normal(0., 1.)))
    xi_stacked = torch.stack(xi).view(2, -1)
    w_stacked = torch.stack(w).view(2, -1)
    beta_model = xi_stacked * w_stacked
    y_model_mu = torch.matmul(X, beta_model)
    with pyro.plate("data_loop"):
        pyro.sample("obs", dist.Normal(y_model_mu, torch.tensor([1.])), obs=y)

guide = autoguide.AutoDelta(model, poutine.block(model, hide=['xi']))
```
So first I draw pi from a Beta prior, then use it to parameterise a single Bernoulli distribution from which I make d independent draws. These draws are indicator variables for my regression coefficients, which I sample by making d independent draws from N(0, 1). I then multiply the two elementwise to zero out some of the regression coefficients, and use the result to compute the mean of the observations, which I assume are normally distributed. I declare my loss as
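The elementwise masking step described above can be illustrated outside of Pyro. The following sketch (NumPy used purely for illustration; the fixed values of `xi` and `w` are made up to mimic the data-generating process) shows how indicator draws zero out coefficients:

```python
import numpy as np

rng = np.random.default_rng(0)

N, d = 1000, 2
X = rng.normal(size=(N, d))

# Indicator vector xi and raw weights w, as in the model above;
# here xi is fixed to [0, 1] to mimic the data-generating process.
xi = np.array([0.0, 1.0])
w = np.array([2.5, -1.3])

# The elementwise product zeroes out the de-selected coefficient.
beta = xi * w            # only the second coefficient survives
mu = X @ beta            # mean of the observations, shape (N,)
```

With `xi = [0, 1]`, the first column of X contributes nothing to `mu`, which is exactly the sparsity mechanism the model relies on.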
But this gives me a really weird error, `TypeError: model() takes 0 positional arguments but 1 was given`, which is very confusing as I am not passing any arguments to the function at all.
I must be missing something basic, but I have googled around a lot and found nothing so far. It would be great if I could get some pointers. Thanks a lot!
hi @silverfish it looks like you may have a couple of typos there. First, you need only pass the model to AutoDelta once. Second, you’ll need to mask all the “xi” variables, say using a hide_fn.
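A minimal sketch of these two fixes, assuming the discrete sites keep the names `xi_0`, `xi_1` from the model above: a `hide_fn` receives each sample site's message dict and returns True for sites to hide, so a name-prefix test suffices. The `AutoDelta` line is shown as a comment because it needs the model and Pyro in scope:

```python
# hide_fn receives a site "message" dict; returning True hides that site.
# Here we hide every enumerated indicator site by its name prefix.
hide_fn = lambda msg: msg["name"].startswith("xi_")

# Sketch of the corrected guide (note the model is passed only once):
# guide = autoguide.AutoDelta(poutine.block(model, hide_fn=hide_fn))
```

This way the discrete `xi_*` sites are excluded from the guide and left to enumeration, while `pi` and the `wi_*` sites get Delta point estimates.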
Also I’d be a bit wary of trying to do variable selection this way, as the AutoDelta guide will use the same posterior for every subset of variables. @martinjankowiak has done a lot of work in variable selection and may be able to suggest other approaches. Good luck!
Hey @fritzo, thanks for your reply. I have been looking at Pyro and it seems really interesting; I am still trying to wrap my head around it. Your papers on plated factor graph variable elimination look especially interesting, although I haven’t understood them completely yet, and I hope to learn more about the funsor library as well. I would really appreciate some links or references on the variable selection problem from @martinjankowiak if he finds some time. Thanks for the lambda-function formulation for hiding sites; I couldn’t have figured it out on my own.
```python
def mm3(X, y):
    N, dim = X.shape
    probs_ = torch.tensor([.5 for i in range(dim)])
    p = pyro.param("probs", probs_, constraint=constraints.unit_interval)
    theta = pyro.sample("theta",
                        dist.Normal(torch.zeros(dim), torch.ones(dim)).to_event(1))
    burn = pyro.sample("burn", dist.Bernoulli(probs=p).to_event(1),
                       infer={"enumerate": "sequential"})
    theta = (theta * burn).cuda()
    mu = torch.matmul(X, theta).cuda()
    with pyro.plate("data", N):
        pyro.sample("obs", dist.Bernoulli(logits=mu).to_event(1), obs=y)
```
It seems to pick out the relevant variables, but it also picks up other irrelevant ones. I guess this is a consequence of using a Bernoulli prior for the variable selection, so roughly half of the variables end up getting selected. I think I can use something like a HalfCauchy instead.
For the guides, I definitely need to read up more on selecting suitable ones. Writing a custom guide is probably faster than using an autoguide.
Beware: sequential enumeration won’t work after `.to_event(1)`; your original model, with its per-coordinate sample sites, was closer to a working enumerated solution.
If you want a sparser solution, replace your `probs` param with a sample from a Beta distribution that favors small probs. Treating `probs` as a `pyro.param` is equivalent to sampling it from a uniform Beta(1, 1) prior. Instead you might try
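A sparsity-favouring Beta prior can be sketched as follows (the shape parameters 0.1 and 1.0 are arbitrary illustrative choices): a Beta(0.1, 1.0) has mean 0.1 / (0.1 + 1.0) ≈ 0.09, so most inclusion probabilities are pushed toward zero. The NumPy simulation below just checks that the prior concentrates near zero; the commented line shows how it might replace the `pyro.param` call in the model above:

```python
import numpy as np

rng = np.random.default_rng(0)

# Beta(0.1, 1.0) has mean 0.1 / 1.1 ≈ 0.0909 and most of its mass near 0,
# so it favours excluding variables a priori (hypothetical shape parameters).
draws = rng.beta(0.1, 1.0, size=100_000)

# In the Pyro model this would replace the pyro.param line, e.g.:
# p = pyro.sample("probs",
#                 dist.Beta(0.1 * torch.ones(dim), torch.ones(dim)).to_event(1))
```

Because the CDF of Beta(0.1, 1) is x^0.1, roughly 79% of the prior mass sits below 0.1, which is what makes the selection sparse.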
@silverfish can you explain what it is you want to do? bayesian variable selection is pretty tricky in general and doesn’t necessarily work well with black-box inference algorithms like variational inference.
@martinjankowiak I am looking at it from a biological-data point of view, mostly for multi-omic data. I don’t have a fixed goal in mind, as it’s just something I decided to try in order to get to know Pyro better and become more comfortable with probabilistic machine learning, but I would like to do BVS for both regression and multi-class classification problems. I mostly want to develop my skills, be able to reason about different kinds of models, and be able to debug them if something goes wrong. I found your papers and a couple of others and am mostly trying to implement them. If you have some recommendations, that would be great. I am a bit disappointed that it doesn’t work well in general.
@silverfish inference in bayesian variable selection problems tends to be tricky, especially if N (the number of data points) or P (the number of covariates) is large. if they are moderate (e.g. N no more than a few thousand and P no more than a few hundred), then using HMC on a regression model governed by a horseshoe prior (example here) is one viable strategy. however, as you start to enter more complex/difficult regimes (e.g. multi-class classification with many classes), more likely than not you will need a custom inference strategy to get good performance. depending on the precise situation this can be pretty complex, and it will depend on the particular details of the model (e.g. classification versus regression), so it’s hard to give general tips about what’s best to do. if you have a specific situation in mind (a particular dataset with a particular N and P), then it would be easier to make some suggestions.
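To make the horseshoe suggestion concrete, here is a hedged sketch (prior simulation only, not inference, and in NumPy rather than Pyro): each coefficient gets its own half-Cauchy local scale on top of a global scale τ, which yields many near-zero coefficients plus a few large ones that escape the shrinkage. The value of τ is an arbitrary illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

P = 1000
tau = 0.01                                    # global scale (assumed value)
lam = np.abs(rng.standard_cauchy(size=P))     # half-Cauchy local scales
beta = rng.normal(0.0, tau * lam)             # horseshoe prior draws

# Most local scales are O(1), so most beta are tightly shrunk around zero,
# while the Cauchy's heavy tail lets a few lambdas (and hence betas) be huge.
```

This "many small, a few large" shape is what makes the horseshoe a continuous alternative to the discrete Bernoulli indicators in the models above.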
I was also wondering about Bayesian variable selection for -omics data. Oftentimes we deal with p of 50K+ and N in the thousands.
I have found that in this situation most Bayesian methods are extremely slow. Even a hierarchical univariate model, where y = b0 + B1 * X (with * elementwise and B1 of shape 1 × p) is fit to all the features with B1 pooled, is slow.
Are there any pyro examples on omics data like this?