Unable to do next(model.parameters()) with Pyro models

Hello,

I am trying to convert myPyTorchModel into a Bayesian Pyro model (see the code below).
However, when I execute the line next(myPyTorchModel.parameters()), a StopIteration error is raised. I am wondering if there is any way to prevent Pyro models from raising this StopIteration error.

import torch
import pyro.distributions as dist
import pyro.nn.module as module

# replace every parameter of the model with a Normal prior
for m in myPyTorchModel.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        options = dict(dtype=torch.double, device="cpu")
        prior_loc = torch.zeros(1, 1, **options)
        prior_scale = torch.ones(1, 1, **options)
        zs = module.PyroSample(dist.Normal(prior_loc, prior_scale).to_event(1))
        setattr(m, name, zs)

# generates an error
next(myPyTorchModel.parameters())

OUT:
Traceback (most recent call last):

  File "<ipython-input-37-f9ae77a21ef4>", line 1, in <module>
    next(myPyTorchModel.parameters())

StopIteration

How can I fix this error?

Thanks,

Hi @h56cho,

The reason next(myPyTorchModel.parameters()) fails is that the model no longer has any parameters: you have replaced them all with PyroSample objects. These objects are random priors and have no parameters of their own. If you want to learn the parameters of a posterior, you can define an automatic guide

guide = AutoNormal(myPyTorchModel)

then initialize the guide by calling it once on data

guide(data)  # or whatever inputs your model takes

then examine the guide’s parameters, e.g.

next(guide.parameters())

Hello,

Thank you very much for your reply.
So is the guide just like a snapshot copy of a Pyro model?

That is, each time the parameters are sampled from the distribution, does the guide act like a frequentist snapshot of the Bayesian Pyro model, with its parameter values fixed at the sampled values?

Thank you,

In Pyro the guide represents an approximate posterior distribution. Each call to guide() samples values from that approximate posterior. When you initially create guide = AutoNormal(model), the guide's parameters are meaningless; you first need to train the guide (fit it to data) using variational inference. See the variational inference tutorial for a more detailed explanation.
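
For concreteness, here is a minimal end-to-end sketch of that workflow with a toy Bayesian regression model (the model, data, and hyperparameters here are all illustrative, not from your setup):

import torch
import pyro
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

class BayesianRegression(PyroModule):
    def __init__(self):
        super().__init__()
        # a linear layer whose weight and bias carry Normal(0, 1) priors
        self.linear = PyroModule[torch.nn.Linear](1, 1)
        self.linear.weight = PyroSample(dist.Normal(0., 1.).expand([1, 1]).to_event(2))
        self.linear.bias = PyroSample(dist.Normal(0., 1.).expand([1]).to_event(1))

    def forward(self, x, y=None):
        mean = self.linear(x).squeeze(-1)
        with pyro.plate("data", x.shape[0]):
            pyro.sample("obs", dist.Normal(mean, 0.1), obs=y)
        return mean

model = BayesianRegression()
guide = AutoNormal(model)  # approximate posterior with learnable parameters

# toy data
x = torch.randn(100, 1)
y = 3.0 * x.squeeze(-1) + 0.1 * torch.randn(100)

# fit the guide to data with stochastic variational inference
svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(1000):
    loss = svi.step(x, y)

# the guide now holds trained parameters (the model itself still has none)
print(next(guide.parameters()))

The same pattern, a model with PyroSample priors, an autoguide, and an SVI loop, applies to your own model.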

Hello,

Thank you for all your help. What I am trying to do is use a Bayesian Pyro model for Natural Language Processing (NLP). I took the appropriate steps to convert my frequentist NLP model into a Bayesian Pyro model, but I am having trouble training this new Bayesian model.

If I understand things correctly, the svi.step() method calculates the loss for a given input and model and takes a gradient step, but this is where I am stuck: when I call svi.step() with an appropriate input, errors are generated because myNLPModel no longer has any parameters (since myNLPModel is now a Pyro model). Lines of code that apply to traditional non-Bayesian PyTorch models, such as next(myNLPModel.parameters()).dtype or assert padding_idx < weight.size(0), do not work with Pyro models, and these are the points where the errors occur.

For your information, myNLPModel is a HuggingFace PyTorch Transformer.

Would you be able to suggest any way to get around this issue? If it happens to be that I have some error in my for-loop for training the Bayesian NLP model, please point it out for me. Thank you,

My code for training is below:

# NOTE: myNLPModel is a Bayesian Pyro model
# define guide
guide = guides.AutoDelta(myNLPModel)

# define optimizer and scheduler
optimizer = Adam({"lr": 0.000000055})
scheduler = pyro.optim.StepLR({'optimizer': optimizer,
                               'optim_args': {'lr': 0.000000055}})

# define SVI
svi = SVI(myNLPModel, guide, optimizer, loss=Trace_ELBO())
total_svi_loss = 0

# training loop
for m in range(num_iter):

    # calculate the loss and take a gradient step for svi
    # THIS IS WHERE ERRORS OCCUR
    svi_loss = svi.step(input_ids=input_ids,
                        attention_mask=attention_mask,
                        labels=label)

    # accumulate the calculated loss
    total_svi_loss = total_svi_loss + svi_loss

    if m % log_interval == 0 and m > 0:
        cur_svi_loss = total_svi_loss / log_interval
        print('| epoch {:3d} | {:5d}/{:5d} batches | lr {:02.9f} | '
              'loss {:5.4f} | ppl {:8.4f}'.format(
                  epoch, m, int(num_lines_train / 4), scheduler.get_lr()[0],
                  cur_svi_loss, math.exp(cur_svi_loss)))
        total_svi_loss = 0

Hi @h56cho, could you provide a simple runnable example script that reproduces the errors you’re seeing? They seem to depend on the particular neural network you’re using, so an example would help with suggesting workarounds and with figuring out whether there are changes we should make to PyroModule.

Hello,
I have provided a reproducible example below. The PyTorch model I am working with is a HuggingFace Transformer (see the huggingface/transformers repository on GitHub; I am using RobertaForMultipleChoice, to be more specific). Thank you for looking into my case; I hope this is helpful.

from transformers import RobertaTokenizer, RobertaForMultipleChoice, AdamW, get_constant_schedule
from transformers import PreTrainedTokenizer
import torch
import pyro
import pyro.infer
import pyro.optim
import pyro.distributions as dist
import pyro.nn.module as module
import pyro.infer.autoguide.guides as guides
from torch import nn
from pyro.optim import Adam
from pyro.infer import SVI
from pyro.infer import Trace_ELBO

# define input_ids and attention_masks
input_ids = torch.tensor([[[    0,   102,  1816,    16,  2343,   816, 14545,    15,    10,   165,
             11,    41, 11894,  6545,   479,     5,  1011,     2,     2,   354,
          22362,  5134,   124,     8,  7264,    81,     5,  1161,  1533,   498,
            479,     2,     1,     1,     1,     1,     1,     1,     1,     1,
              1,     1],
         [    0,   102,  1816,    16,  2343,   816, 14545,    15,    10,   165,
             11,    41, 11894,  6545,   479,     5,  1011,     2,     2,   354,
           3148,    11,    10,  2698,     8,    10,   313,  1420,    69,    39,
           1028,   479,     2,     1,     1,     1,     1,     1,     1,     1,
              1,     1],
         [    0,   102,  1816,    16,  2343,   816, 14545,    15,    10,   165,
             11,    41, 11894,  6545,   479,     5,  1011,     2,     2,   354,
           5629,    66,  1706,     5,  1161,     8,     5,  1816,   386,     7,
          23322,  5225,     5,  1011,   479,     2,     1,     1,     1,     1,
              1,     1],
         [    0,   102,  1816,    16,  2343,   816, 14545,    15,    10,   165,
             11,    41, 11894,  6545,   479,     5,  1011,     2,     2,   354,
           5629,    31,    69,   865,  3987,     8,    79,  1388,  3022,    24,
            124,     8,  7264,    25,    79, 36989,    24,   160,     9,  4257,
            479,     2]]])

attention_mask = torch.tensor([[[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0],
         [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
          1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]]])

mc_labels = torch.tensor([0])

# get the pre-trained HuggingFace RobertaForMultipleChoice model
model_RobertaForMultipleChoice = RobertaForMultipleChoice.from_pretrained('roberta-base', output_hidden_states = True)

# convert the HuggingFace Transformer model into a Bayesian Pyro model
module.to_pyro_module_(model_RobertaForMultipleChoice)
        
# Now we can attempt to be fully Bayesian:
for m in model_RobertaForMultipleChoice.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        setattr(m, name, module.PyroSample(prior=dist.Normal(0, 1)
                                           .expand(value.shape)
                                           .to_event(value.dim())))
                
# define parameters for training      
guide_diag_normal = guides.AutoDiagonalNormal(model_RobertaForMultipleChoice)
optimizer = Adam({"lr": 0.000000055}) 
scheduler = pyro.optim.StepLR({'optimizer': optimizer, 'optim_args': {'lr': 0.000000055}})

# define SVI
svi_diag_normal = SVI(model_RobertaForMultipleChoice, guide_diag_normal, optimizer, loss=Trace_ELBO())

# calculate loss from SVI
# ERRORS ARE GENERATED HERE
svi_loss = svi_diag_normal.step(input_ids = input_ids, attention_mask = attention_mask, labels = mc_labels)

Below is the error that gets displayed:

ERRORS:

Traceback (most recent call last):

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/transformers/modeling_utils.py", line 150, in dtype
    return next(self.parameters()).dtype

StopIteration


During handling of the above exception, another exception occurred:

Traceback (most recent call last):

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/elbo.py", line 170, in _get_traces
    yield self._get_trace(model, guide, args, kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/trace_elbo.py", line 53, in _get_trace
    "flat", self.max_plate_nesting, model, guide, args, kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/enum.py", line 44, in get_importance_trace
    guide_trace = poutine.trace(guide, graph_type=graph_type).get_trace(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 185, in get_trace
    self(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 165, in __call__
    ret = self.fn(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 290, in __call__
    return super().__call__(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py", line 679, in forward
    self._setup_prototype(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py", line 819, in _setup_prototype
    super()._setup_prototype(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py", line 577, in _setup_prototype
    super()._setup_prototype(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py", line 156, in _setup_prototype
    self.prototype_trace = poutine.block(poutine.trace(model).get_trace)(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py", line 11, in _context_wrap
    return fn(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 185, in get_trace
    self(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 165, in __call__
    ret = self.fn(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py", line 11, in _context_wrap
    return fn(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py", line 11, in _context_wrap
    return fn(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 290, in __call__
    return super().__call__(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/transformers/modeling_roberta.py", line 441, in forward
    output_hidden_states=output_hidden_states,

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 290, in __call__
    return super().__call__(*args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/transformers/modeling_bert.py", line 732, in forward
    extended_attention_mask: torch.Tensor = self.get_extended_attention_mask(attention_mask, input_shape, device)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/transformers/modeling_utils.py", line 228, in get_extended_attention_mask
    extended_attention_mask = extended_attention_mask.to(dtype=self.dtype)  # fp16 compatibility

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/transformers/modeling_utils.py", line 159, in dtype
    first_tuple = next(gen)

StopIteration


The above exception was the direct cause of the following exception:

Traceback (most recent call last):

  File "<ipython-input-13-9f26219c3f71>", line 1, in <module>
    svi_loss = svi_diag_normal.step(input_ids = input_ids, attention_mask = attention_mask, labels = mc_labels)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/svi.py", line 128, in step
    loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)

  File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/pyro/infer/trace_elbo.py", line 126, in loss_and_grads
    for model_trace, guide_trace in self._get_traces(model, guide, args, kwargs):

RuntimeError: generator raised StopIteration

Hello,

I’d hate to make you feel rushed (and I don’t intend to), but if the Pyro team is looking into my issue, could you let me know roughly how long it might take to fix this issue with HuggingFace Transformers? I am hoping to see Transformers models working with Pyro as soon as possible (if it will take a long time, that’s fine too).

I also opened a thread about this issue on the HuggingFace Transformers GitHub page (see huggingface/transformers issue #6191, “How to integrate the Pyro module with HuggingFace Transformers?”), and it looks like someone from the HuggingFace team is looking into it as well. I thought I should let the Pyro team know about this.

Thank you once again for your help, it is much appreciated.

I don’t see a universal fix we could make in Pyro, since the error seems to be specific to transformers. The simplest thing that would solve your immediate problem is to add a dummy nn.Parameter to the model, or to skip converting one of the existing parameters (a bias in one of the hidden layers, say) to a PyroSample. Either should make the dtype and device properties of the Transformer work as expected.
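
For the first option, something along these lines (an untested sketch; the attribute name _dummy_param is arbitrary) should be enough:

import torch
from torch import nn

# give the model one plain nn.Parameter again so that
# next(model.parameters()) has something to yield; note that the
# parameter must live on whichever (sub)module's dtype/device
# property is actually queried during the forward pass
model_RobertaForMultipleChoice._dummy_param = nn.Parameter(torch.zeros(()))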

Hello,
Thank you for your reply.
I’d hate to keep bugging you about this, but if possible, could you tell me how I can fix the loop below to make Pyro skip converting one of the Transformer parameters?

# Now we can attempt to be fully Bayesian:
for m in model_RobertaForMultipleChoice.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        setattr(m, name, module.PyroSample(prior=dist.Normal(0, 1)
                                           .expand(value.shape)
                                           .to_event(value.dim())))

:S thank you,

Does the following work? It makes the StopIteration error go away, but I can’t run a full forward pass on my underpowered laptop…

model_RobertaForMultipleChoice.roberta._dummy_param = nn.Parameter(
    torch.tensor(0.).to(dtype=model_RobertaForMultipleChoice.dtype,
                        device=model_RobertaForMultipleChoice.device))

# Now we can attempt to be fully Bayesian:
for m in model_RobertaForMultipleChoice.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        if name != "_dummy_param":
            setattr(m, name, module.PyroSample(prior=dist.Normal(0, 1)
                                               .expand(value.shape)
                                               .to_event(value.dim())))

Hello,

I really want to experiment with a Bayesian version of a HuggingFace model. Do you have any advice on how to get this set up? I am much more familiar with NLP than with Bayesian deep models. How far did you get with this implementation?

What would you suggest to get this running?

Thanks!