Just a quick question that I couldn’t find an explicit answer to: if we set a weight decay on, say, the Adam optimizer for a Pyro model/guide, will it drive down all parameters or just those in nn.Modules?
Thanks!
Matthew
assuming you’ve only defined a single optimizer then whatever parameters it’s optimizing will be treated the same. once pyro is told “optimize this” (either via param or module) there’s no difference in how optimization proceeds
Thanks! That’s very helpful to confirm.
So in order to apply weight decay to some parameters, but not others, seems like the answer is to use a MixedMultiOptimizer?
yes there are different ways to go about this, see e.g. here
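For anyone landing here later, one of those ways can be sketched as below. As far as I know, Pyro's optimizer wrappers accept either a dict of arguments or a callable that returns such a dict per parameter, so weight decay can be switched on by parameter name. The "decoder" prefix, the helper names and the numeric values are hypothetical, and the exact name strings passed to the callable can differ between Pyro versions, so check what your param store actually contains.

```python
def per_param_args(param_name):
    """Return Adam keyword arguments for one parameter, chosen by its name."""
    # Hypothetical convention: anything registered under the name "decoder"
    # gets weight decay; every other parameter is left undecayed.
    if param_name.startswith("decoder"):
        return {"lr": 1e-3, "weight_decay": 1e-2}
    return {"lr": 1e-3, "weight_decay": 0.0}


def make_optimizer():
    """Build a Pyro Adam wrapper that looks up its arguments per parameter."""
    # Imported lazily so per_param_args can be inspected without Pyro installed.
    from pyro.optim import Adam

    # Assumption: a one-argument callable is accepted in place of a dict.
    return Adam(per_param_args)
```

The object returned by make_optimizer() would then be passed to SVI in the usual way. It's worth printing the keys of pyro.get_param_store() after the first step to confirm that the prefix test matches the parameters you intended.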