PyroModule, PyroSample, PyroParam and GRU

Hi!

I am writing again about the same issue posted here: PyroModule for LSTM.
I have been following this example: Neural Networks — Pyro documentation

But either there is a bug or I am not doing something in the right order, hehe :smiley:

Pseudo code of my model:

def Convert_to_PyroSample(model):
    for m in model.modules():
        for name, value in list(m.named_parameters(recurse=False)):
            setattr(m, name, PyroSample(prior=dist.Normal(0, 1).expand(value.shape).to_event(value.dim())))

class GRU(nn.Module):  # I am not sure whether this should already be PyroModule (with either option the error persists)
    def __init__(self):
        super().__init__()
        self.GRU = nn.GRU(flags...)
        self.h_0 = nn.Parameter(torch.rand(GRU_hidden_size), requires_grad=True)  # Should it be converted to PyroParam here?

    def forward(self, input):
        to_pyro_module_(self.GRU)
        Convert_to_PyroSample(self.GRU)
        h_0_contig = PyroParam(self.h_0.repeat(...).contiguous())  # Not convinced about this
        output, _ = self.GRU(input, h_0_contig)
        return output

a) If I try to convert h_0_contig to PyroParam I get this error:

  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 175, in check_hidden_size
    if hx.size() != expected_hidden_size:
AttributeError: 'PyroParam' object has no attribute 'size'

b) If I don’t bother with PyroParam, I get the same error as in https://forum.pyro.ai/t/pyromodule-for-lstm/1596:

  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 716, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
TypeError: expected Tensor as element 0 in argument 2, but got PyroSample

I am just confused about the semantics. Thanks for your help in advance! I am using Pyro 1.3.0.

Best wishes

Hi @artistworking,
I think there’s just some confusion between our old pyro.param usage and the new PyroParam usage. Whereas pyro.param belongs in the .forward() method, PyroParam now belongs in the .__init__() method, and you access those params in the .forward() method. Here’s an attempt:

def Convert_to_PyroSample(model):
    for m in model.modules():
        for name, value in list(m.named_parameters(recurse=False)):
            setattr(m, name, PyroSample(prior=dist.Normal(0, 1).expand(value.shape).to_event(value.dim())))

class GRU(PyroModule):
    def __init__(self):
        super().__init__()
        self.GRU = nn.GRU(flags...)
        self.h_0 = PyroParam(torch.rand(GRU_hidden_size))
        Convert_to_PyroSample(self.GRU)

    def forward(self, input):
        # In the following line, the self.h_0 lookup now
        # triggers an internal pyro.param() call:
        h_0_contig = self.h_0.repeat(...).contiguous()
        output, _ = self.GRU(input, h_0_contig)
        return output

Let me know if that still doesn’t work. Also you might try grepping around the Pyro codebase for other internal uses of PyroModule, e.g. in test_module.py in the pyro-ppl/pyro repository on GitHub.

Thanks so much for your reply. I have switched the code as suggested and I also had a look at test_module.py :slight_smile:. I can see I mixed some stuff up :upside_down_face:

I have made more changes actually, because otherwise I could not convert the GRU parameters to PyroSample. Convert_to_PyroSample was not working, see error below:

  File "/home/.../Example.py", line 61, in Convert_to_PyroSample
    setattr(m, name, PyroSample(prior=dist.Normal(0, 1).expand(value.shape).to_event(value.dim())))
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 97, in __setattr__
    super(RNNBase, self).__setattr__(attr, value)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 595, in __setattr__
    .format(torch.typename(value), name))
TypeError: cannot assign 'pyro.nn.module.PyroSample' as parameter 'weight_ih_l0' (torch.nn.Parameter or None expected)

def Convert_to_PyroSample(model):
    for m in model.modules():
        for name, value in list(m.named_parameters(recurse=False)):
            setattr(m, name, PyroSample(prior=dist.Normal(0, 1).expand(value.shape).to_event(value.dim())))

class GRU(PyroModule):
    def __init__(self):
        super().__init__()
        self.GRU = PyroModule[nn.GRU](flags...)  # <--- added this
        self.h_0 = PyroParam(torch.rand(GRU_hidden_size))
        Convert_to_PyroSample(self.GRU)

    def forward(self, input):
        # In the following line, the self.h_0 lookup now
        # triggers an internal pyro.param() call:
        h_0_contig = self.h_0.repeat(...).contiguous()
        assert isinstance(self.GRU, PyroModule)  # --> it is a PyroModule
        output, _ = self.GRU(input, h_0_contig)
        return output

However, the GRU does not seem to have been affected by becoming a PyroModule. I get the same error even if I manually assign each of the GRU’s weights and biases to PyroSample:

  File "/home/.../Example.py", line 85, in forward
    output, _ = self.GRU(input,h_0_contig)
  File "/home/.../anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 288, in __call__
    return super().__call__(*args, **kwargs)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 716, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
TypeError: expected Tensor as element 0 in argument 2, but got PyroSample

Thanks again!

Hi @artistworking, I think this may be a low-level incompatibility between PyroModule and nn.RNN. Can you see if the following workaround helps?

class GRU(PyroModule):
    ...
    def forward(self, input):
        h_0_contig = self.h_0.repeat(...).contiguous()
        self.GRU._apply(lambda t: t)  # <--- recomputes GRU._flat_weights
        output, _ = self.GRU(input, h_0_contig)
        return output

For context, it appears that nn.RNN caches flat views of its parameters, and that the cache becomes invalid at each Pyro sample call, i.e. when new parameters are sampled in GRU.forward. I’m not sure how to fix this; maybe we can provide a pyro.nn.PyroRNN wrapper or something. Feel free to file a bug on the Pyro GitHub issue tracker, or I can do so if you prefer. EDIT: I’ve filed an issue here: PyroModule incompatible with torch.nn.RNN · Issue #2390 · pyro-ppl/pyro · GitHub
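To illustrate what is being cached, here is a minimal sketch (assuming PyTorch ~1.4 internals; _flat_weights is an implementation detail and may change between versions):

import torch.nn as nn

gru = nn.GRU(input_size=3, hidden_size=4)

# nn.RNNBase keeps a cached list of its parameters that forward() passes
# straight to the fused kernel ("argument 2" in the error above), instead
# of re-reading self.weight_ih_l0 etc. on every call:
print(gru._flat_weights[0] is gru.weight_ih_l0)  # True

# When PyroModule turns those parameters into PyroSample attributes, this
# cache is apparently what ends up holding a non-Tensor, hence the
# "expected Tensor ... but got PyroSample" error; _apply(lambda t: t)
# forces the cache to be rebuilt from the current attribute values.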

Hi again!

Thanks for your reply and filing the issue :D. I tried your approach and got a cryptic error, hehehe:

  File "/home/.../anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 288, in __call__
    return super().__call__(*args, **kwargs)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.../Dropbox/PhD/DRAUPNIR/Draupnir_pyro.py", line 159, in forward
    self.GRU._apply(lambda t: t)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 140, in _apply
    self.flatten_parameters()
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 105, in flatten_parameters
    any_param = next(self.parameters()).data
StopIteration

Anyways, I guess it is related to what you meant by “caching flat views of its parameters”. I tried self.GRU.flatten_parameters() just for fun, same error heheh. (I suppose the StopIteration happens because, after the conversion, next(self.parameters()) finds no nn.Parameters left; they have all become PyroSample attributes.)

Thank you very much. Hoping we can get to the bottom of this :slight_smile:

@artistworking PyroModule[nn.GRU] should now work on Pyro’s dev branch, and will be included in our next release. Thanks for pointing out this issue, and let me know if you run into further problems.
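For reference, the intended usage should eventually look something like this (a sketch, assuming a Pyro version that includes the fix for issue #2390; the sizes are illustrative):

import torch
import torch.nn as nn
import pyro.distributions as dist
from pyro.nn import PyroModule, PyroSample

gru = PyroModule[nn.GRU](input_size=4, hidden_size=2, batch_first=True)

# Replace each learnable parameter with a standard normal prior:
for name, value in list(gru.named_parameters(recurse=False)):
    setattr(gru, name, PyroSample(
        dist.Normal(0., 1.).expand(value.shape).to_event(value.dim())))

x = torch.randn(3, 5, 4)  # (batch, seq_len, input_size)
output, h_n = gru(x)      # parameters are drawn from their priors when accessed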

Hi again!

Thanks so much for the effort. The first error has been fixed :). However, I may be wrong, but I think the input shape gets flattened somewhere and then the GRU complains. Meaning (pseudo-code):

input = torch.rand(29, 531, 30)  # [batch_size, seq_len, features]
self.GRU = PyroModule[nn.GRU](input_size=features, hidden_size=10, batch_first=True,
                              bidirectional=True, num_layers=self.num_layers, dropout=0.0)
h_0_contig = self.h_0.repeat(self.num_layers * 2, input.shape[0], 1).contiguous()
output, _ = self.GRU(input, h_0_contig)

And the error:

  File "/home/.../anaconda3/lib/python3.7/site-packages/pyro/nn/module.py", line 290, in __call__
    return super().__call__(*args, **kwargs)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/.../anaconda3/lib/python3.7/site-packages/torch/nn/modules/rnn.py", line 716, in forward
    self.dropout, self.training, self.bidirectional, self.batch_first)
RuntimeError: shape '[900, 1]' is invalid for input of size 26100

I believe I am using the GRU correctly (I double-checked everything)… The shapes of the GRU’s parameters are as expected, but somehow something in the input gets flattened, I think (29 * 30 * 30 = 26100). It gets doubled up.

Thanks again! :slight_smile:

Hmm, could you create a minimal reproducible example?

One thing that seems suspicious is that the GRU docs claim input shape should be (seq_len, batch, input_size) whereas your input has shape (batch, seq_len, input_size), and that shape later determines the h_0_contig shape.

Also can you confirm that the shapes are correct when using nn.GRU but fail when using PyroModule[nn.GRU]?
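For reference, here is a quick shape check with plain nn.GRU (illustrative, using your sizes); note that even with batch_first=True, h_0 keeps the shape (num_layers * num_directions, batch, hidden_size):

import torch
import torch.nn as nn

gru = nn.GRU(input_size=30, hidden_size=10, num_layers=1,
             bidirectional=True, batch_first=True)
x = torch.randn(29, 531, 30)      # (batch, seq_len, input_size)
h_0 = torch.zeros(1 * 2, 29, 10)  # (num_layers * num_directions, batch, hidden_size)
output, h_n = gru(x, h_0)
print(output.shape)               # torch.Size([29, 531, 20])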

Hi!

Yes, I set the GRU’s batch_first=True flag; that is why my batch dimension goes first.

And yes, in this same example without PyroModule, and in other models I have implemented, I use it the same way without a problem :slight_smile:

Thanks!

Hi, I implemented a simple BNN (LSTM + linear layer), very similar to the one in this post, to perform predictions on sequences. My likelihood is a Categorical distribution, and as a guide I used (at least for the moment) an AutoDiagonalNormal. What I noticed is that the loss rapidly decreases (from 273 to almost 10), while the accuracy is very (very) poor, like a random classifier (50% across several training epochs, and a similar result on the test set as well).
I don’t know if I have made some error in the implementation (I’m unable to spot it so far), but since you have already used this model on sequential data (text, other?), can I ask whether you have ever noticed similar behavior? Otherwise, if your model is working fine, can I ask whether you adopted some specific strategy to make it work as expected?

Thank you so much.

Hi @ffp! Without the code, it’s hard to tell. To be honest, I am not even sure what happened to my implementation of this; I changed directions. But there is also a possibility that your dataset is very noisy and you are overfitting to the noise. I always recommend checking that your data can actually be fit, and making some dimensionality-reduction projections (e.g. UMAP or t-SNE) to check that everything is alright. Things like that. If your data is not alright, it’s not learnable.
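For example, a quick sanity-check projection might look like this (a sketch, assuming scikit-learn and matplotlib; X and y are placeholders for your own sequences and labels):

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

X = np.random.randn(200, 531, 30)  # placeholder: your sequences
y = np.random.randint(0, 2, 200)   # placeholder: your labels

# Flatten each sequence and project to 2-D; if the classes never separate
# in projections like this, a near-random classifier is less surprising:
emb = TSNE(n_components=2).fit_transform(X.reshape(len(X), -1))
plt.scatter(emb[:, 0], emb[:, 1], c=y, s=5)
plt.show()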

Ok. Thanks for your reply.