I am attempting to build on the working memory tutorial, and my goal is to try this analysis with a logistic regression model that has more features (i.e. more than the single feature used in the tutorial). I'm having a hard time understanding the model as implemented:
def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        theta = pyro.sample("theta", dist.Normal(prior_mean, prior_sd))
        # Share theta across the number of rounds of the experiment
        # This represents repeatedly testing the same participant
        theta = theta.unsqueeze(-1)
        # This defines a *logistic regression* model for y
        logit_p = sensitivity * (theta - l)
        # The event shape represents responses from the same participant
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
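If I understand the shape handling correctly: when l has shape (participants, rounds), theta sampled inside the plate has shape (participants,), and unsqueeze(-1) lets it broadcast across the rounds dimension. A torch-only sketch with made-up shapes:

```python
import torch

# Illustrative shapes: 2 participants (a batch dimension), 4 rounds each
l = torch.arange(1.0, 9.0).reshape(2, 4)  # dim -1 is rounds
theta = torch.tensor([5.0, 6.0])          # shape (2,): one theta per participant
sensitivity = 1.0                         # fixed in the tutorial

theta = theta.unsqueeze(-1)               # shape (2, 1): broadcasts across rounds
logit_p = sensitivity * (theta - l)       # shape (2, 4)
assert logit_p.shape == (2, 4)
```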
Specifically, I'm confused about logit_p = sensitivity * (theta - l), since I've always seen the log odds written as logit_p = beta_0 + beta_1 * x_1 + beta_2 * x_2 + ... + beta_d * x_d. I tried to implement the model using that approach, like so:
def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        b_0 = pyro.sample("b_0", dist.Normal(prior_mean, prior_sd)).unsqueeze(-1)
        b_1 = pyro.sample("b_1", dist.Normal(1, 1)).unsqueeze(-1)
        logit_p = b_0 + b_1 * l
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
but I cannot replicate the results from the tutorial. Any thoughts on what I’m doing wrong here?
I tried using your logistic model instead of the original one. Could it be that you have to use N(-1, 1) as the prior for b_1, rather than N(1, 1)? The data-generating process assumes that the effect of l on p is negative (the longer the sequence, the harder it is to remember). I tried running the notebook with the model and guide below, and got OK estimates.
prior_mean0 = torch.tensor(7.0)
prior_sd0 = torch.tensor(2.0)
prior_mean1 = torch.tensor(-1.0)  ## PRIOR MEAN IS -1 INSTEAD OF 1
prior_sd1 = torch.tensor(1.0)

def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        # Share theta across the number of rounds of the experiment
        # This represents repeatedly testing the same participant
        theta0 = pyro.sample("theta0", dist.Normal(prior_mean0, prior_sd0)).unsqueeze(-1)
        theta1 = pyro.sample("theta1", dist.Normal(prior_mean1, prior_sd1)).unsqueeze(-1)
        # This defines a *logistic regression* model for y
        logit_p = theta0 + theta1 * l
        # The event shape represents responses from the same participant
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
def guide(l):
    # The guide is initialised at the prior
    posterior_mean0 = pyro.param("posterior_mean0", prior_mean0.clone())
    posterior_sd0 = pyro.param("posterior_sd0", prior_sd0.clone(), constraint=positive)
    pyro.sample("theta0", dist.Normal(posterior_mean0, posterior_sd0))
    posterior_mean1 = pyro.param("posterior_mean1", prior_mean1.clone())
    posterior_sd1 = pyro.param("posterior_sd1", prior_sd1.clone(), constraint=positive)
    pyro.sample("theta1", dist.Normal(posterior_mean1, posterior_sd1))
Hi chvandorp, thank you very much for the response! After posting this yesterday I also realized that the issue was in fact the priors. The logit_p in the tutorial is equivalent to beta_0 + beta_1 * l with beta_0 = theta and beta_1 = -1 (since sensitivity = 1 in the tutorial).
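A quick numeric check of that equivalence (torch only; the values below are made up, with sensitivity = 1.0 as in the tutorial):

```python
import torch

sensitivity = 1.0                               # fixed to 1.0 in the tutorial
theta = torch.tensor([5.0, 7.0]).unsqueeze(-1)  # illustrative participant thresholds
l = torch.tensor([3.0, 6.0, 9.0])               # illustrative sequence lengths

# Tutorial parameterization
logit_tutorial = sensitivity * (theta - l)

# Conventional form: beta_0 + beta_1 * l with beta_0 = sensitivity * theta
beta_0 = sensitivity * theta
beta_1 = -sensitivity
logit_conventional = beta_0 + beta_1 * l

assert torch.allclose(logit_tutorial, logit_conventional)
```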
Hi, original author of the working memory tutorial here. You are completely correct that the model in the tutorial is equivalent to the conventional form with beta_1 = -1. This incorporates the assumption that longer sequences are harder to recall. We also incorporated a sensitivity parameter, which we did not learn. If sensitivity became a regular Pyro latent variable, then this would be equivalent to a reparametrised version of the conventional logistic regression model in which beta_1 = -sensitivity and beta_0 = sensitivity * theta. My reason for parameterizing the model in this way was to give theta a real-world interpretation: the length at which the participant has a 50-50 chance of correctly recalling the sequence.
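To illustrate that interpretation: at l = theta the logit is zero whatever the sensitivity, so the recall probability is exactly 0.5. A minimal check (values are illustrative):

```python
import torch

theta = torch.tensor(7.0)  # hypothetical threshold length for a participant
for sensitivity in (0.5, 1.0, 2.0):
    logit_p = sensitivity * (theta - theta)  # evaluate the model's logit at l = theta
    p = torch.sigmoid(logit_p)
    assert torch.isclose(p, torch.tensor(0.5))
```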
Hey, thank you so much for your message! Re: the tutorial, that totally makes sense to only learn one latent variable - having played around with adding more variables, it quickly becomes challenging to interpret how model behavior changes…
The EIG package is great and I’m excited to continue using it!