Excluding parameters from optimization when using SVI

I’m defining a probabilistic model with some frozen parameters (i.e., the weights of a pre-trained CNN) and have been roughly following the Bayesian Regression tutorial. I’m able to train my model when I’m not using GPUs, but when I use GPUs I get the following error: raise ValueError("can't optimize a non-leaf Variable")

I followed the tutorial for initializing the optimizer; below is a sketch of my model, in which I’m trying to learn a distribution over x_.

import torch
import torch.nn as nn

class Model(nn.Module):
  def __init__(self, arch):
    super(Model, self).__init__()
    # x_ is the only thing I want to learn a distribution over
    self.x_ = torch.nn.Parameter(data=torch.rand(1, 3, 224, 224), requires_grad=True)
    self.cnn = get_model(arch)  # i.e., a pytorch pretrained network (kept frozen)
  def forward(self):
    return self.cnn(self.x_)
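For reference, the optimizer/SVI setup follows the tutorial, roughly like this (a sketch only; model and guide stand in for my actual model/guide functions, and the loss argument is written in the current Trace_ELBO style, which may differ by Pyro version):

from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

adam = Adam({'lr': 1e-2})
svi = SVI(model, guide, adam, loss=Trace_ELBO())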

I’ve tried the following:

  • defining a per_param_callable function that returns {} for all parameters that aren’t x_ (see the sketch after this list).
  • checking/setting the active_params before svi.step is called (there are none before the first call)
  • passing a named parameter to the optimizer: Adam({'params': 'x_', 'lr': 1e-2}) (an error about multiple param arguments is thrown, as I believe SVI sets the params itself)
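For the first attempt above, the callable looked roughly like this (a sketch; the two-argument signature is what my Pyro version expects, and the check on the name 'x_' is an assumption about how the parameter gets registered):

from pyro.optim import Adam

def per_param_callable(module_name, param_name):
  # intended to restrict optimization to x_ only
  if param_name == 'x_':
    return {'lr': 1e-2}
  return {}

optimizer = Adam(per_param_callable)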

Any ideas would be much appreciated.

Hi, since you say you’re following the Bayesian regression tutorial, this is probably happening because you’re using pyro.random_module. If that’s the case, I suggest removing random_module from the model, setting self.x_ = pyro.sample("x_", some_prior), and writing a guide that only contains a pyro.sample("x_", ...) statement instead of another pyro.random_module call. That way Pyro won’t try to optimize the parameters of self.cnn.

The random_module interface is currently a bit inflexible.
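Roughly something like this (a sketch only: the Normal prior, the feature-space likelihood, and the x_loc/x_scale parameter names are placeholders, and I’m writing model/guide as free functions with current-style distribution constructors rather than keeping your Module class):

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

cnn = get_model(arch)  # pretrained network; never registered with Pyro, so it stays frozen

def model(y_data):
  # prior over the image we want to learn -- this replaces pyro.random_module
  x_ = pyro.sample("x_", dist.Normal(torch.zeros(1, 3, 224, 224),
                                     torch.ones(1, 3, 224, 224)).to_event(4))
  features = cnn(x_)
  # placeholder likelihood in CNN feature space
  pyro.sample("obs", dist.Normal(features, 0.1).to_event(1), obs=y_data)

def guide(y_data):
  # only these variational parameters are registered with Pyro, so only they get optimized
  loc = pyro.param("x_loc", torch.zeros(1, 3, 224, 224))
  scale = pyro.param("x_scale", torch.ones(1, 3, 224, 224), constraint=constraints.positive)
  pyro.sample("x_", dist.Normal(loc, scale).to_event(4))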

Thanks for the suggestion; I just tried that, and it works as expected in the non-CUDA case but throws the same error (full traceback below) in the CUDA case. Even without calling random_module, I’m still returning the existing instantiated instance of my Model from my guide (since the loss is computed in CNN feature space).

  File "bayesian_regression.py", line 150, in <module>
    train_bayesian_regressor(epochs=args.epochs, cuda=cuda)
  File "bayesian_regression.py", line 136, in train_bayesian_regressor
    loss = svi.step(y_data, len(data_image_paths), network, image_size=network_size)
  File "/users/ruthfong/anaconda2/lib/python2.7/site-packages/pyro/infer/svi.py", line 105, in step
    self.optim(params)
  File "/users/ruthfong/anaconda2/lib/python2.7/site-packages/pyro/optim/optim.py", line 48, in __call__
    self.optim_objs[p] = self.pt_optim_constructor([p], **def_optim_dict)
  File "/users/ruthfong/anaconda2/lib/python2.7/site-packages/torch/optim/adam.py", line 29, in __init__
    super(Adam, self).__init__(params, defaults)
  File "/users/ruthfong/anaconda2/lib/python2.7/site-packages/torch/optim/optimizer.py", line 39, in __init__
    self.add_param_group(param_group)
  File "/users/ruthfong/anaconda2/lib/python2.7/site-packages/torch/optim/optimizer.py", line 155, in add_param_group
    raise ValueError("can't optimize a non-leaf Variable")
ValueError: can't optimize a non-leaf Variable

I misfollowed the tutorial: I called type_as(data) on all Variable objects created in both the model and guide functions, when I should have been calling type_as(data.data) on the internal tensors for all Variables in the guide that require a gradient, i.e., Variable(torch.zeros(1, p).type_as(data.data)) (arguably, this should also be done in model for consistency/best practices). This is because x.cuda() returns a new Variable that is different from the original x (see the following PyTorch threads).
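Concretely (a small sketch with the old Variable API; p and data are stand-ins for the tutorial’s names):

import torch
from torch.autograd import Variable

p = 10                                       # stand-in dimensionality
data = Variable(torch.randn(100, p).cuda())  # stand-in for the tutorial's data Variable

# Wrong: type_as / .cuda() on an already-constructed Variable returns a new,
# non-leaf Variable, which the optimizer refuses to accept.
mu_bad = Variable(torch.zeros(1, p), requires_grad=True).type_as(data)

# Right: convert the underlying tensor first, then wrap it, so the Variable
# handed to the optimizer is a leaf living on the GPU.
mu_good = Variable(torch.zeros(1, p).type_as(data.data), requires_grad=True)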

Yes, that is a subtle PyTorch nuance when using CUDA. What you can also do, if you are only using the GPU, is call torch.set_default_tensor_type; then all your allocated tensors will be CUDA tensors by default.
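For example, before building the model and guide:

import torch

# allocate CUDA float tensors by default
torch.set_default_tensor_type('torch.cuda.FloatTensor')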