I am attempting to build on the working memory tutorial, and my goal is to try this analysis with a logistic regression model that has more features (i.e. more than the single feature used in the tutorial). I'm having a hard time understanding the model as implemented:
def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        theta = pyro.sample("theta", dist.Normal(prior_mean, prior_sd))
        # Share theta across the number of rounds of the experiment
        # This represents repeatedly testing the same participant
        theta = theta.unsqueeze(-1)
        # This defines a *logistic regression* model for y
        logit_p = sensitivity * (theta - l)
        # The event shape represents responses from the same participant
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
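If I understand the shape handling correctly: when l has shape (participants, rounds), theta sampled inside the plate has shape (participants,), and unsqueeze(-1) lets it broadcast across the rounds dimension. A torch-only sketch with made-up shapes:

```python
import torch

# Illustrative shapes: 2 participants (a batch dimension), 4 rounds each
l = torch.arange(1.0, 9.0).reshape(2, 4)  # dim -1 is rounds
theta = torch.tensor([5.0, 6.0])          # shape (2,): one theta per participant
sensitivity = 1.0                         # fixed in the tutorial

theta = theta.unsqueeze(-1)               # shape (2, 1): broadcasts across rounds
logit_p = sensitivity * (theta - l)       # shape (2, 4)
assert logit_p.shape == (2, 4)
```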
Specifically, I'm confused about logit_p = sensitivity * (theta - l), since I've always seen the log odds written as logit_p = beta_0 + beta_1 * x_1 + beta_2 * x_2 + ... + beta_d * x_d. I tried to implement the model using that approach, like so:
def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        b_0 = pyro.sample("b_0", dist.Normal(prior_mean, prior_sd)).unsqueeze(-1)
        b_1 = pyro.sample("b_1", dist.Normal(1, 1)).unsqueeze(-1)
        logit_p = b_0 + b_1 * l
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
but I cannot replicate the results from the tutorial. Any thoughts on what I’m doing wrong here?
I tried using your logistic model instead of the original one. Could it be that you have to use N(-1, 1) as the prior for b_1, rather than N(1, 1)? The data-generating process assumes that the effect of l on p is negative (the longer the sequence, the harder it is to remember). I tried running the notebook with the model and guide below, and got OK estimates.
prior_mean0 = torch.tensor(7.0)
prior_sd0 = torch.tensor(2.0)
prior_mean1 = torch.tensor(-1.0)  ## PRIOR MEAN IS -1 INSTEAD OF 1
prior_sd1 = torch.tensor(1.0)

def model(l):
    # Dimension -1 of `l` represents the number of rounds
    # Other dimensions are batch dimensions: we indicate this with a plate_stack
    with pyro.plate_stack("plate", l.shape[:-1]):
        # Share theta across the number of rounds of the experiment
        # This represents repeatedly testing the same participant
        theta0 = pyro.sample("theta0", dist.Normal(prior_mean0, prior_sd0)).unsqueeze(-1)
        theta1 = pyro.sample("theta1", dist.Normal(prior_mean1, prior_sd1)).unsqueeze(-1)
        # This defines a *logistic regression* model for y
        logit_p = theta0 + theta1 * l
        # The event shape represents responses from the same participant
        y = pyro.sample("y", dist.Bernoulli(logits=logit_p).to_event(1))
        return y
def guide(l):
    # The guide is initialised at the prior
    posterior_mean0 = pyro.param("posterior_mean0", prior_mean0.clone())
    posterior_sd0 = pyro.param("posterior_sd0", prior_sd0.clone(), constraint=positive)
    pyro.sample("theta0", dist.Normal(posterior_mean0, posterior_sd0))
    posterior_mean1 = pyro.param("posterior_mean1", prior_mean1.clone())
    posterior_sd1 = pyro.param("posterior_sd1", prior_sd1.clone(), constraint=positive)
    pyro.sample("theta1", dist.Normal(posterior_mean1, posterior_sd1))
Hi chvandorp, thank you very much for the response! After posting this yesterday I also realized that the issue was in fact the priors. The logit_p in the tutorial is equivalent to beta_0 + beta_1 * l with beta_0 = theta and beta_1 = -1 (since sensitivity = 1 in the tutorial).
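A quick numeric check of that equivalence (torch only; the values below are made up, with sensitivity = 1.0 as in the tutorial):

```python
import torch

sensitivity = 1.0                               # fixed to 1.0 in the tutorial
theta = torch.tensor([5.0, 7.0]).unsqueeze(-1)  # illustrative participant thresholds
l = torch.tensor([3.0, 6.0, 9.0])               # illustrative sequence lengths

# Tutorial parameterization
logit_tutorial = sensitivity * (theta - l)

# Conventional form: beta_0 + beta_1 * l with beta_0 = sensitivity * theta
beta_0 = sensitivity * theta
beta_1 = -sensitivity
logit_conventional = beta_0 + beta_1 * l

assert torch.allclose(logit_tutorial, logit_conventional)
```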
Hi, original author of the working memory tutorial here. You are completely correct that the model in the tutorial is equivalent to the conventional form with beta_1 = -1. This incorporates the assumption that longer sequences are harder to recall. We also incorporated a sensitivity parameter, which we did not learn. If sensitivity became a regular Pyro latent variable, then this would be equivalent to a reparametrised version of the conventional logistic regression model in which beta_1 = -sensitivity and beta_0 = sensitivity * theta. My reason for parameterizing the model in this way was to give theta a real-world interpretation: the length at which the participant has a 50-50 chance of correctly recalling the sequence.
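To illustrate that interpretation: at l = theta the logit is zero whatever the sensitivity, so the recall probability is exactly 0.5. A minimal check (values are illustrative):

```python
import torch

theta = torch.tensor(7.0)  # hypothetical threshold length for a participant
for sensitivity in (0.5, 1.0, 2.0):
    logit_p = sensitivity * (theta - theta)  # evaluate the model's logit at l = theta
    p = torch.sigmoid(logit_p)
    assert torch.isclose(p, torch.tensor(0.5))
```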
Hey, thank you so much for your message! Re: the tutorial, that totally makes sense to only learn one latent variable - having played around with adding more variables, it quickly becomes challenging to interpret how model behavior changes…
The EIG package is great and I’m excited to continue using it!