Pyro seems to be a phenomenal tool, and I’ve been desperate for an opportunity to use it.

Unabashed greenhorn (and frequentist by habit) that I am, though, I have great trouble even getting off the ground.

I am wondering if anyone has a toy example of a Bayesian logistic regression (similar to the Bayesian linear regression tutorial) that they’d be willing to share by way of instruction!

logistic regression should be a straightforward extension of the linear regression. instead of observing against a gaussian, you now observe against a bernoulli.

I was also trying to implement a toy logistic regression before building something more complex. I found it somewhat hard to follow the existing regression tutorial because of the way the code is structured.

I prefer the structure of the VAE tutorial, in which the model and guide are contained within a reusable module and aren’t relying on any global variables located elsewhere in the notebook.

It took a lot of debugging and messing around with dimensions (I still don’t really grok how Pyro does this but I’m slowly learning), but I have a working example here:

Phenomenal. Thanks so much! Straightforward for most people is most certainly not straightforward for an amateur like me. I’m wondering how easy it would be to transform this into a softmax regression with torch.nn.softmax as the final layer of the model, and what I suppose would be just gaussians over the regression parameters?

Colin asked me a question that raised something else I was wondering: if I wanted to use nn.Linear instead of explicitly creating weight and bias tensors, what is the idiomatic way to introduce priors to the weights of that module?

It seems like the answer is pyro.random_module but I’m confused about where it needs to be called. In __init__? Once in model and then again in guide?

It seems like the answer is pyro.random_module but I’m confused about where it needs to be called

please refer to the docs on using random_module. the tutorial shows the usage. in your example, you would establish the architecture in __init__, then in your model and guide, everywhere you have a sample(), you instantiate that distribution and throw them into a dict. the torch.matmul is now running the nn forward so you’d want:

dict of priors = {name_of_param: prior_dist, ...}
nn_dist = random_module(self, dict_of_priors)
nn = nn_dist() #sample a nn
model_logits = nn(data)
...

I’m attempting to combine the successful elements of James’ code with the base tutorial: even though my suspicion is that the model itself needs a nonlinear nn.Sigmoid layer, I’ve left that out for now. Here’s how I’ve specified the model call (I’ve left the guide the same as in the tutorial) and the data generator (straight from James’ code):

def build_linear_dataset(N, p=1, noise_std=0.01):
X = np.random.rand(N, p)
# w = 3
w = 3 * np.ones(p)
# b = 1
y = np.matmul(X, w) + np.repeat(1, N) + np.random.normal(0, noise_std, size=N)
y = y.reshape(N, 1)
X, y = torch.tensor(X).type(torch.Tensor), torch.tensor(y).type(torch.Tensor)
data = torch.cat((X, y), 1)
assert data.shape == (N, p + 1)
return data
def model(data):
# Create unit normal priors over the parameters
loc, scale = torch.zeros(1, 1), 10 * torch.ones(1, 1)
bias_loc, bias_scale = torch.zeros(1), 10 * torch.ones(1)
w_prior = Normal(loc, scale).independent(1)
b_prior = Normal(bias_loc, bias_scale).independent(1)
priors = {'linear.weight': w_prior, 'linear.bias': b_prior}
lifted_module = pyro.random_module("module", regression_model, priors)
lifted_reg_model = lifted_module()
with pyro.iarange("map", N):
x_data = data[:, :-1]
y_data = data[:, -1]
model_logits = lifted_reg_model(x_data)
#model_logits = (torch.matmul(x, weights_prior.permute(1, 0)) + bias_prior).squeeze()
pyro.sample("obs", Bernoulli(logits=model_logits, validate_args=True), obs=y_data.squeeze())

Here’s the error I’m getting when I attempt to run the SVI:

Also I want to clarify: this is a logistic regression model, because you’re passing the output from the linear model as logits rather than as probs. The sigmoid layer is implicit there in the usage of the output of the linear layer as the logit of the Bernoulli distribution.

I definitely think it should be part of our example since we assume no probabilitic programming experience we could have people who wouldnt find it intuitive or the examples could make things easier for anybody.