Site names when using nn.Sequential

I’m continuing work on the model I described here, adding complexity bit by bit. I’ve now updated theta to be modeled as a two-layer nn.Sequential.

Relevant code snippet (some lines removed for clarity):

    def __init__(self, in_features, h1 = 2, out_features = 1):
        super().__init__()
        
        # mu
        ...
                    
        # shape
        ...
        
        # theta
        self.theta = nn.Sequential(
            nn.Linear(in_features, h1),
            nn.ReLU(),            
            nn.Linear(h1, out_features),
            nn.Sigmoid(),
        )
        pyro.nn.module.to_pyro_module_(self.theta)
        for m in self.theta.modules():
            for name, value in list(m.named_parameters(recurse=False)):
                setattr(m, name, PyroSample(prior=dist.Laplace(0, 2)
                                            .expand(value.shape)
                                            .to_event(value.dim())))
        
        # relu
        self.relu = nn.ReLU()

    def forward(self, x, y=None):
        
        x = x.reshape(-1, 2)
        ...
        theta = self.theta(x).squeeze(-1)
        
        # will need to add GPU device
        with pyro.plate("data", x.shape[0]):
            obs = pyro.sample("obs", GammaHurdle(concentration = shape, rate = shape / mu, theta = theta), obs=y)
        return torch.cat((mu, shape, theta), 0)

How do the sites get named for theta? I’d like to look at the distributions of those parameters using Predictive. With mu, for example, if I use self.linear = PyroModule[nn.Linear](...), I can use Predictive(model, guide, num_samples, return_sites=("linear.weight",)). But I can’t figure out how the theta sites get named or how to access those distributions.
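For reference, here is roughly how I query mu today (a sketch; model, guide, and x come from my training code):

from pyro.infer import Predictive

predictive = Predictive(model, guide=guide, num_samples=500,
                        return_sites=("linear.weight",))
samples = predictive(x)  # dict mapping site names to tensors of samples
print(samples["linear.weight"].shape)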

In general though, is there a way to get all possible options to use in return_sites? I looked at poutine but could not get that to work.

I see you are already using this method. You can do the same to get the named parameters of theta (probably before overwriting those attributes with PyroSample).

If you want to record the value of

theta = self.theta(x).squeeze(-1)

then you can use pyro.deterministic.
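For example (a minimal sketch; the site name "theta" is just a choice):

# inside forward(), record the computed value as a named deterministic site
theta = pyro.deterministic("theta", self.theta(x).squeeze(-1))

That site can then be requested with return_sites=("theta",) in Predictive.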

Sorry, can you clarify how to use m.named_parameters() to get those names?

Similarly, is there a way to specify the names used within that nn.Sequential? One issue I ran into: with two blocks of nn.Sequential (e.g. for mu and theta), I got an error that params had the same names. I got around this by writing a separate function for mu, which then gets called in forward.

    @scope(prefix = 'mu')
    def mu_func(self, in_features, h1, out_features):
        mu = nn.Sequential(
                nn.Linear(in_features, h1),
                nn.ReLU(),            
                nn.Linear(h1, out_features)
            )
        pyro.nn.module.to_pyro_module_(mu)
        for m in mu.modules():
            for name, value in list(m.named_parameters(recurse=False)):
                setattr(m, name, PyroSample(prior=dist.Normal(0., 3.)
                                            .expand(value.shape)
                                            .to_event(value.dim())))
        return mu

Is there a better way to handle multiple nn.Sequential blocks?

With a = A() (see the example at the end of this thread), you can call a.named_parameters() to get the names of the parameters, including theta’s. It works just the same as the m.named_parameters(recurse=False) call in your code.

If your nn.Module has two submodules mu and theta, then the parameter names will be

"mu.linear.weight",... or "theta.linear.weight",...

I would recommend playing a bit with some PyTorch modules like Sequential to see how naming works in PyTorch. See its docs.
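For example, if you want to control the layer names inside a Sequential, you can pass an OrderedDict (a small sketch with made-up layer names):

from collections import OrderedDict
import torch.nn as nn

block = nn.Sequential(OrderedDict([
    ('fc0', nn.Linear(2, 2)),
    ('relu0', nn.ReLU()),
    ('fc1', nn.Linear(2, 1)),
]))
print([name for name, _ in block.named_parameters()])
# ['fc0.weight', 'fc0.bias', 'fc1.weight', 'fc1.bias']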

Your code

    def __init__(self, in_features, h1 = 2, out_features = 1):
        # theta
        self.theta = nn.Sequential(
            nn.Linear(in_features, h1),
            nn.ReLU(),            
            nn.Linear(h1, out_features),
            nn.Sigmoid(),
        )
        pyro.nn.module.to_pyro_module_(self.theta)
        for m in self.theta.modules():
            for name, value in list(m.named_parameters(recurse=False)):
                setattr(m, name, PyroSample(prior=dist.Laplace(0, 2)
                                            .expand(value.shape)
                                            .to_event(value.dim())))

looks a bit strange to me. I guess you can do

        theta = nn.Sequential(
            nn.Linear(in_features, h1),
            nn.ReLU(),            
            nn.Linear(h1, out_features),
            nn.Sigmoid(),
        )
        pyro.nn.module.to_pyro_module_(theta)
        for m in theta.modules():
            for name, value in list(m.named_parameters(recurse=False)):
                setattr(m, name, PyroSample(prior=dist.Laplace(0, 2)
                                            .expand(value.shape)
                                            .to_event(value.dim())))
        self.theta = theta

I don’t get any output when I do

for name, param in model.named_parameters():
    print(name, param)

print(list(model.parameters()))

I do get expected results from print(model) though.

BayesianRegression_LogGamma_shape_zeroInf_thetaFunc3(
  (mu_func_call): PyroSequential(
    (0): PyroLinear(in_features=2, out_features=2, bias=True)
    (1): PyroReLU()
    (2): PyroLinear(in_features=2, out_features=1, bias=True)
  )
  (linear_shape): PyroLinear(in_features=2, out_features=1, bias=True)
  (theta): PyroSequential(
    (0): PyroLinear(in_features=2, out_features=2, bias=True)
    (1): PyroReLU()
    (2): PyroLinear(in_features=2, out_features=1, bias=True)
    (3): PyroSigmoid()
  )
  (relu): ReLU()
)

That is strange. The code in my last comment does not invoke any Pyro machinery: a is just a plain PyTorch nn.Module, and you can use its named_parameters method to get all the parameter names.

After you get all the parameter names, you can convert a PyTorch parameter into a Pyro random variable by:

  • converting that PyTorch nn.Module into a PyroModule
  • accessing that parameter and changing it to a PyroSample instance, as in the sketch below.
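A minimal sketch of those two steps on a single Linear layer:

import pyro
import pyro.distributions as dist
import torch.nn as nn
from pyro.nn import PyroSample

lin = nn.Linear(2, 1)
pyro.nn.module.to_pyro_module_(lin)  # step 1: convert nn.Module -> PyroModule in place
# step 2: replace the weight parameter with a PyroSample prior (weight shape is [1, 2])
lin.weight = PyroSample(dist.Normal(0., 1.).expand([1, 2]).to_event(2))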

Here’s a nice solution (I think), using an idea from here:

from collections import OrderedDict

    def __init__(self, in_features, h1 = 2, out_features = 1):
        super().__init__()
        
        # parameter names list
        self.parameter_names = []
        
        # mu
        self.mu_func_call = self.mu_func(in_features, h1 = h1, out_features = out_features)
        
        # shape
        shape = OrderedDict([
            ('shape_fc0', nn.Linear(in_features=in_features,out_features=h1)),
            ('shape_ReLU0', nn.ReLU()),
            ('shape_fc1L:final', nn.Linear(in_features=h1,out_features=out_features))
        ])
        self.shape = nn.Sequential(shape)
        
        for name, param in self.shape.named_parameters():
            self.parameter_names.append(name)
        
        pyro.nn.module.to_pyro_module_(self.shape)
        for m in self.shape.modules():
            for name, value in list(m.named_parameters(recurse=False)):
                setattr(m, name, PyroSample(prior=dist.Laplace(0., 3.)
                                            .expand(value.shape)
                                            .to_event(value.dim())))

Then I can just reference model.parameter_names to get those names later. The names come back as expected when using Predictive, e.g. Site: shape_fc0.weight.
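For example (a sketch; guide and x come from my training setup):

from pyro.infer import Predictive

predictive = Predictive(model, guide=guide, num_samples=800,
                        return_sites=tuple(model.parameter_names))
samples = predictive(x)
print(samples["shape_fc0.weight"].shape)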


Glad that it works for your case. I’ll just post a note here for future reference.

import torch
import torch.nn as nn

class A(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.Sequential(
            nn.Conv2d(1, 20, 5),
            nn.ReLU(),
            nn.Conv2d(20, 64, 5),
            nn.ReLU(),
        )
        self.block2 = nn.Sequential(
            nn.Conv2d(1, 20, 5),
            nn.ReLU(),
            nn.Conv2d(20, 64, 5),
            nn.ReLU(),
        )
        
a = A()
print({k: v.shape for k, v in a.named_parameters()})

will print the names and shapes of the parameters in A:

{'block1.0.weight': torch.Size([20, 1, 5, 5]),
 'block1.0.bias': torch.Size([20]),
 'block1.2.weight': torch.Size([64, 20, 5, 5]),
 'block1.2.bias': torch.Size([64]),
 'block2.0.weight': torch.Size([20, 1, 5, 5]),
 'block2.0.bias': torch.Size([20]),
 'block2.2.weight': torch.Size([64, 20, 5, 5]),
 'block2.2.bias': torch.Size([64])}

To turn a into a fully Bayesian model, one way is:

import pyro

pyro.nn.module.to_pyro_module_(a)
for m in a.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        setattr(m, name, pyro.nn.PyroSample(prior=pyro.distributions.Normal(0, 1)
                                            .expand(value.shape)
                                            .to_event(value.dim())))

Let’s execute it and print out the site names:

with pyro.poutine.trace() as tr:
    a(torch.ones(3, 1, 10, 10))

list(tr.trace.nodes.keys())

which returns

['block1.0.weight',
 'block1.0.bias',
 'block1.2.weight',
 'block1.2.bias',
 'block2.0.weight',
 'block2.0.bias',
 'block2.2.weight',
 'block2.2.bias']