Does Weight Decay in Optimizer affect all params?

Just a quick question that I couldn’t find an explicit answer to: if we set a weight decay on, say, the Adam optimizer for a Pyro model/guide, will it drive down all parameters or just those in nn.Modules?

Assuming you’ve only defined a single optimizer, whatever parameters it’s optimizing will be treated the same. Once Pyro is told “optimize this” (either via pyro.param or pyro.module), there’s no difference in how optimization proceeds.

Thanks! That’s very helpful to confirm.

So in order to apply weight decay to some parameters but not others, it seems like the answer is to use a MixedMultiOptimizer?

Yes, there are different ways to go about this; see e.g. here
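One of those ways, sketched here under some assumptions: Pyro optimizers accept a callable in place of the argument dict, and that callable is asked for the arguments of each parameter by name. The sketch assumes a recent Pyro release where the callable takes a single name such as `net.weight` (older releases used a two-argument `(module_name, param_name)` form). The suffix list and the helper names `per_param_args` and `make_optimizer` are hypothetical.

```python
# Hypothetical rule: parameters whose names end in one of these get no decay.
NO_DECAY_SUFFIXES = ("bias", "scale")


def per_param_args(param_name):
    """Return optimizer kwargs for one parameter, chosen by its name."""
    args = {"lr": 1e-3}
    if not param_name.endswith(NO_DECAY_SUFFIXES):
        args["weight_decay"] = 1e-4
    return args


def make_optimizer():
    # Imported here so the selection rule above can be checked without Pyro.
    from pyro.optim import Adam

    # Passing the callable instead of a dict lets each parameter
    # receive its own arguments when Pyro first encounters it.
    return Adam(per_param_args)


print(per_param_args("net.weight"))
print(per_param_args("net.bias"))
```

The result of `make_optimizer()` can be handed to SVI like any other Pyro optimizer, so nothing else in the training loop has to change.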

