Unable implement a Cognitive Model - DINA in Pyro

Hello All,

I am just beginning to learn probabilistic programming and I am trying to implement a Cognitive Diagnostic Model named DINA. My implementation is inspired by the dina package in R language: dina R. link to paper

I am able to successfully implement the code in PyMC3 (code below). But my almost identical code in Pyro keeps failing. Please advice what may I be missing.

Problem statement
My data contains ‘i’ students responding to ‘j’ questions in an exam. Each jth question is annotated with ‘k’ possible skills required to answer the question. Given each correct answer, we need to estimate which of the ‘k’ skills does the student’s posses. Also, we need to estimate item’s guess and slip parameter based on how difficult it is to answer a question.

Datasets:
Input:

  • X: (i,j) contains 0 or 1 given student got the answer right or wrong

#Dummy X matrix
X = np.random.randint(0,2,size = (i,j))

  • Q: (j,k) contains 0 or 1 given whether jth item requires kth skill to get the answer right.

#Dummy Q matrix
Q = np.random.randint(0,2,size = (j,k))

Infer variables:

  • Alpha: (i,k) should contain 0 or 1 given whether the ith student possess kth skill based on X and Q matrix.
  • Guess: (j,) question’s guess parameter
  • Slip: (j,) question’s slip parameter

Inference Algo
MCMC - NUTS

PyMC3 code (working):

alpha_prior = np.ones((i,k))/k
alpha = pm.Bernoulli("alpha", p=alpha_prior, shape = (i,k))
guess = pm.Beta("guess",alpha = 1.0, beta = 1, shape = j)
slip = pm.Beta("slip",alpha = 1.0, beta = 1, shape = j)

# eta respresents whether student possess the skills required to answer the question
t1 = T.dot(alpha, Q.T)
skills_and = T.sum(Q, axis = 1)

eta = T.cast(T.eq(t1, skills_and), dtype = 'int32')
    
# prob_of_X is calculated based on guess and slip parameter. The idea is that even though student possess all the skills, they can get the answer wrong (slip) and vice-verse for guess parameter.

a = T.pow(guess, (1-eta))
b = T.pow((1-slip), eta)

prob_of_X = T.mul(a,b)
    
generated_x = pm.Bernoulli("generated_x", prob_of_X, shape = (i,j), observed = X) 

dina_trace = pm.sample(100, tune=10)

Output

Only 100 samples in chain.
Sequential sampling (2 chains in 1 job)
CompoundStep
BinaryGibbsMetropolis: [alpha]
NUTS: [slip, guess]
100%|██████████| 110/110 [00:26<00:00, 4.17it/s]
100%|██████████| 110/110 [00:25<00:00, 4.27it/s]

Pyro code (giving error):

    alpha_prior = torch.ones(k)/k
    alpha = pyro.sample("alpha", dist.Bernoulli(probs = alpha_prior).expand([i,k]))
    guess = pyro.sample("guess", dist.Beta(1,1).expand([j])) 
    slip = pyro.sample("slip", dist.Beta(1,1).expand([j])) 
                        
    t1 = torch.matmul(alpha, torch.transpose(self.Q, 0, 1))
    skills_and = torch.sum(Q, dim = 1)
    
    eta = torch.eq(t1, skills_and).long()

    a = torch.pow(guess, (1-eta)) 
    b = torch.pow((1-slip), eta) 

    prob_of_X = torch.mul(a, b) 
            
    generated_X = pyro.sample("generated_X", dist.Bernoulli(prob_of_X).expand([i,j]), obs = self.X)

#dina.model is the function under which above code is written
MCMC(NUTS(dina.model), num_samples=100,
warmup_steps=10,
num_chains=1).run()

Error

in model(self)
16 slip = pyro.sample(“slip”, dist.Beta(1,1).expand([j]))
17
—> 18 t1 = torch.matmul(alpha, torch.transpose(self.Q, 0, 1))
19 skills_and = torch.sum(Q, dim = 1)
20

RuntimeError: mat1 and mat2 shapes cannot be multiplied (1x2 and 4x7)
Trace Shapes:
Param Sites:
Sample Sites:
alpha dist 10 4 |
value 2 |
guess dist 7 |
value 7 |
slip dist 7 |
value 7 |

I tried to fix this error using below code, but still getting error. It seems the code executes correctly first time and then gives error.

    item_plate = pyro.plate("item", j, dim = -1)
    student_plate = pyro.plate("students", i, dim = -2)
    
    alpha_prior = torch.ones(k)/k
    
    with student_plate:
        alpha = pyro.sample("alpha", dist.Bernoulli(alpha_prior).to_event(1))
        
    with item_plate:
        guess = pyro.sample("guess", dist.Beta(1,1)) 
        slip = pyro.sample("slip", dist.Beta(1,1)) 
                
    t1 = torch.matmul(alpha, Q_) 
    skills_and = torch.sum(dina.Q, dim = 1) 
    
    eta = torch.eq(t1, skills_and).long()

    a = torch.pow(guess, (1-eta)) 
    b = torch.pow((1-slip), eta) 

    prob_of_X = torch.mul(a, b) 
    
    with student_plate, item_plate:
        generated_X = pyro.sample("generated_X", dist.Bernoulli(prob_of_X)) #[i,j]

Output: (I removed the print statements from above code to make it more readable in this post. However, below output contains output from print statement)

Warmup:   0%|          | 0/2 [00:00, ?it/s]
skills_and  torch.Size([7])
Alpha  torch.Size([10, 1, 4])
guess  torch.Size([7])
slip  torch.Size([7])
t1  torch.Size([10, 1, 7])
eta  torch.Size([10, 1, 7])
prob_of_X  torch.Size([10, 1, 7])
generated_X  torch.Size([10, 10, 7])
skills_and  torch.Size([7])

~/.conda/envs/ds/lib/python3.9/site-packages/torch/distributions/constraint_registry.py in __call__(self, constraint)
    141         except KeyError:
--> 142             raise NotImplementedError(
    143                 f'Cannot transform {type(constraint).__name__} constraints') from None

NotImplementedError: Cannot transform _Boolean constraints

The above exception was the direct cause of the following exception:

NotImplementedError                       Traceback (most recent call last)
<ipython-input-22-27fa8b5ff90d> in <module>
----> 1 MCMC(NUTS(dina.model), num_samples=1,
      2             warmup_steps=1,
      3             num_chains=1).run()

Hi @rkmalaiya, Pyro NUTS does support models with discrete latent variables by enumeration but has some restrictions. You will need to use sequential plate, instead of .to_event(1), for k dimension at alpha site.

Alternatively, you can use NumPyro’s HMCGibbs with discrete_gibbs_fn, which has the same functionality as PyMC3. You can try both enumeration (which might be very slow if k is large) and hmc-within-gibbs to decide which one is better suited for your problem. :slight_smile:

Thanks for the response @fehiepsi .
Given my alpha parameter is 2D, I tried to modify the code given in this pyro example to make it work for 2D parameter (code below). But my implementation failed, I guess because the dimensions change due to enumeration.
However, I was able to proceed with my problem statement by making my alpha’s continuous and later converting them to discrete. I think NUTS has to do enumeration for discrete latent variables given the nature of algorithm, but I wasn’t very sure that I wanted enumeration. So I thought this way (code below) can be a workaround to avoid enumeration altogether. I am not sure whether this is a valid model, but atleast it’s working for MCMC-NUTS :smiley:. Thanks for all the help!!

Treating Alpha as continuous (code working):

    student_plate = pyro.plate("student", size = i, dim = -2)
    item_plate = pyro.plate("item", size = j, dim = -1)
    skill_plate = pyro.plate("skill", size = k, dim = -1)
    
    with student_plate, skill_plate:
        alpha_prior = pyro.sample("alpha_prior", dist.Beta(1,1))
    
    alpha = dist.Bernoulli(probs = alpha_prior).sample()
    
    with item_plate:
        guess = pyro.sample("guess", dist.Beta(1,1)) 
        slip = pyro.sample("slip", dist.Beta(1,1)) 

Sequential plate code modified for 2D parameter: (giving error)

def valid_model(self):
    #x = [] # <- Original code
    x = np.zeros([5,5]) # <- modified for 2D arrays
    
    for i in pyro.plate("plate", 5):  # <--- valid sequential plate
        for j in pyro.plate("n_plate_{}".format(i), 5): # <- modified for 2D arrays
            t = pyro.sample("x_{}{}".format(i,j), dist.Bernoulli(0.5))
            print("t ", t.shape)
            x[i,j] = t # <- modified
            
    pyro.sample("obs", dist.Normal(sum(torch.tensor(x)), 1.)) # <- modified

Error

t torch.Size([])
t torch.Size([])

t torch.Size([2])
in valid_model(self)
66 t = pyro.sample(“x_{}{}”.format(i,j), dist.Bernoulli(0.5))
67 print("t ", t.shape)
—> 68 x[i,j] = t # <- modified
69
70 pyro.sample(“obs”, dist.Normal(sum(torch.tensor(x)), 1.)) # <- modified

ValueError: only one element tensors can be converted to Python scalars
Trace Shapes:
Param Sites:
Sample Sites:
plate dist |
value 5 |
n_plate_0 dist |
value 5 |
x_00 dist |
value 2 |

this is not allowed. by doing this (i.e. by evading a pyro.sample statement) you’re ensuring that pyro doesn’t know about this randomness. so while the code may run this is not a valid inference procedure

x[i,j] = t # <- modified

Here t is not a scalar (under enumeration), so you can’t replace that way. I think you can collect all t, broadcast all, then concatenate, like this model. I would recommend to use HMC within Gibbs, which should work with your original model, without having to understand how to make enumeration work. If you still want enumeration, I would expect the model will be something like (haven’t verified it)

...
alpha_prior = torch.ones(()) / k
alphas = []
with student_plate:
    for s in range(k):
        alphas[s] = pyro.sample(f"alpha_{s}", dist.Bernoulli(alpha_prior))
    alpha = torch.cat(torch.broadcast_tensors(*alphas), axis=-1)
    ...
    with item_plate:
        generated_X = pyro.sample("generated_X", dist.Bernoulli(prob_of_X), obs=self.X)

I agree @martinjankowiak.

Thanks @fehiepsi. I will try first to use the code you recommended above and then will also try to HMC within gibbs.

Also, the code

alpha = pyro.sample(“alpha”, dist.Bernoulli(probs = alpha_prior).to_event(2))

works with TraceEnum_ELBO(). Just mentioning it here for future readers.

Thanks a lot for helping me out.

So the code that’s working for me right now using TraceEnum_ELBO() is below. I will try recommendations on how to do MCMC related algos and post back what worked for me :slight_smile:

   student_plate = pyro.plate("student", size = i, dim = -2)
    item_plate = pyro.plate("item", size = j, dim = -1)
    skill_plate = pyro.plate("skill", size = k, dim = -1)
    
    with student_plate, skill_plate:
        alpha_prior = pyro.sample("alpha_prior", dist.Beta(1,1))
    
    alpha = pyro.sample("alpha", dist.Bernoulli(probs = alpha_prior).to_event(2))
    
    with item_plate:
        guess = pyro.sample("guess", dist.Beta(1,1)) 
        slip = pyro.sample("slip", dist.Beta(1,1)) 

    t1 = torch.matmul(alpha, torch.transpose(self.Q, 0, 1))
    skills_and = torch.sum(Q, dim = 1)
    
    eta = torch.eq(t1, skills_and).long()
             
    a = torch.pow(guess, (1-eta)) 
    b = torch.pow((1-slip), eta) 

    prob_of_X = torch.mul(a, b) 
    
    with student_plate, item_plate:      
        generated_X = pyro.sample("generated_X", dist.Bernoulli(prob_of_X))

Because the code does not use enumeration, I guess Trace_ELBO will also run with your model. You might want to use TraceGraph_ELBO to reduce variance in ELBO estimation because Bernoulli is non-reparameterizable.

Ohh, I didn’t realize writing alpha this way (below) will avoid enumeration.

So I tried couple of things based on @fehiepsi suggestions and they worked great :smiley:

  • My above code (that avoids enumeration) works with TraceGraph_ELBO. Also works with Trace_ELBO.

  • Changed above code to use enumeration (I hope!!) and used TraceEnum_ELBO. The code works

#without using plates
alpha_prior = torch.ones((i,k))/k
alpha = pyro.sample(“alpha”, dist.Bernoulli(probs = alpha_prior).to_event(2))

  • Used code suggestion given by @fehiepsi to execute code with MCMC-NUTS and it works :smiley:
    alpha_prior = torch.ones((1)) / k
    alphas = []
    
    with student_plate:
        for s in range(k):
            alphas.append(pyro.sample(f"alpha_{s}", dist.Bernoulli(alpha_prior)))
            
            t2 = torch.broadcast_tensors(*alphas)
            
            alpha = torch.cat(t2, axis=-1)  

I was wondering, is there a way to restrict guess and slip (code below) in a way that

0 <= guess + slip <= 1

This is one of the constraints author of DINA model suggested.

    with item_plate:
        guess = pyro.sample("guess", dist.Beta(1,1)) 
        slip = pyro.sample("slip", dist.Beta(1,1)) 
        ## put constraint somewhere here
        ## 0 <= guess + slip <= 1

@fehiepsi… Thanks for all the help. I will execute all these versions of code and see the performance. Should be interesting!!

Should it be something like

slip_scaled = pyro.sample("slip_scaled", dist.Beta(1,1))
slip = pyro.deterministic("slip", slip_scaled * (1 - guess))

Pyro does not have something like TruncatedBeta.