Hello,
I want to apply different learning rates to different groups of parameters in my Pyro neural network model. I was trying to follow this example from the Pyro documentation:
adam = torch.optim.Adam(adam_parameters, {"lr": 0.001, "betas": (0.90, 0.999)})
sgd = torch.optim.SGD(sgd_parameters, {"lr": 0.0001})
loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
# compute loss
loss = loss_fn(model, guide)
loss.backward()
# take a step and zero the parameter gradients
adam.step()
sgd.step()
adam.zero_grad()
sgd.zero_grad()
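My understanding is that adam_parameters and sgd_parameters in this snippet are just plain lists of the tensors that each optimizer should update, something like this for an ordinary PyTorch module (this is my own reading of the docs; my_network, head, and body are made-up names):

adam_parameters = list(my_network.head.parameters())   # tensors Adam should update
sgd_parameters = list(my_network.body.parameters())    # tensors SGD should update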
However, when I do:
myPyroModel.model.multiple_choice_head.parameters()
>>> <generator object Module.parameters at 0x7ff2f8b82850>
torch.optim.Adam(myPyroModel.model.multiple_choice_head.parameters(),
                 {"lr": 0.001})
I get the following error:
if not 0.0 <= lr:
TypeError: '<=' not supported between instances of 'float' and 'dict'
What am I doing wrong here, and how can I fix this error?
I tried torch.optim.Adam(model.model.multiple_choice_head.parameters(), lr=0.001), but it gives me another error:
Traceback (most recent call last):
File "<ipython-input-20-1c381729206f>", line 1, in <module>
optimizer_3 = torch.optim.Adam(model.model.multiple_choice_head.parameters(), lr=0.001)
File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/torch/optim/adam.py", line 44, in __init__
super(Adam, self).__init__(params, defaults)
File "/Users/hyunjindominiquecho/opt/anaconda3/lib/python3.7/site-packages/torch/optim/optimizer.py", line 46, in __init__
raise ValueError("optimizer got an empty parameter list")
ValueError: optimizer got an empty parameter list
I am thinking that torch.optim is complaining about an empty parameter list because I first converted the multiple_choice_head of my PyTorch model into a Pyro Bayesian network, and only then tried to construct torch.optim.Adam. I only converted the multiple_choice_head portion of the PyTorch model into a Pyro Bayesian network, and left the rest of the same PyTorch model in its original frequentist form.
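In case it is relevant, the conversion was done roughly along these lines (simplified here; the standard Normal priors are placeholders for whatever priors I actually use):

import pyro.distributions as dist
from pyro.nn import PyroSample
from pyro.nn.module import to_pyro_module_

head = myPyroModel.model.multiple_choice_head
to_pyro_module_(head)  # convert the submodule in place to a PyroModule
for m in head.modules():
    for name, value in list(m.named_parameters(recurse=False)):
        # replace each weight/bias with a sample site drawn from a prior
        setattr(m, name, PyroSample(dist.Normal(0., 1.)
                                        .expand(value.shape)
                                        .to_event(value.dim())))

My understanding is that once a weight is replaced by a PyroSample, it is sampled at run time instead of being stored as an nn.Parameter, which would explain why .parameters() now comes back empty.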
I want to apply a higher learning rate to the multiple_choice_head (the part that I converted to a Pyro model), while applying a smaller learning rate to all of the remaining frequentist parts.
How should I tweak the code shown in the Pyro documentation to achieve this?
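For what it is worth, this is roughly what I imagine the fix might look like, but I am not sure whether pulling the guide's parameters out of pyro.get_param_store() like this is the intended approach (the learning rates, num_steps, and the transformer attribute standing in for the frequentist part are placeholders):

import pyro
import torch

loss_fn = pyro.infer.Trace_ELBO().differentiable_loss

# run the loss once (discarding the result) so the guide registers its
# parameters in Pyro's param store before I build the optimizers
loss_fn(model, guide)

# parameters of the Bayesian head live in the param store, not in .parameters()
# (assuming nothing else has been registered in the store)
bayesian_params = [p for _, p in pyro.get_param_store().named_parameters()]
# the untouched frequentist part still exposes ordinary nn.Parameters
frequentist_params = list(myPyroModel.model.transformer.parameters())

adam = torch.optim.Adam(bayesian_params, lr=0.01)     # higher lr for the Pyro head
sgd = torch.optim.SGD(frequentist_params, lr=0.0001)  # lower lr for the rest

num_steps = 1000
for step in range(num_steps):
    loss = loss_fn(model, guide)
    loss.backward()
    adam.step()
    sgd.step()
    adam.zero_grad()
    sgd.zero_grad()

Is something like this on the right track?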