Does Weight Decay in Optimizer affect all params?

Just a quick question that I couldn’t find an explicit answer to: if we set a weight decay on, say, the Adam optimizer for a Pyro model/guide, will it drive down all parameters or just those in nn.Modules?

Assuming you’ve only defined a single optimizer, whatever parameters it’s optimizing will be treated the same. Once Pyro is told “optimize this” (either via pyro.param or pyro.module), there’s no difference in how optimization proceeds.


Thanks! That’s very helpful to confirm.

So in order to apply weight decay to some parameters but not others, it seems like the answer is to use a MixedMultiOptimizer?

Yes, there are different ways to go about this; see e.g. here
