Does Weight Decay in Optimizer affect all params?

mtvector · March 30, 2022, 11:23pm

Just a quick question that I couldn’t find an explicit answer to: If we set a weight decay on, say, the Adam optimizer for a Pyro model/guide, will it drive down all parameters or just those in nn.Modules?
Thanks!
Matthew

martinjankowiak · April 3, 2022, 5:12pm

assuming you’ve only defined a single optimizer then whatever parameters it’s optimizing will be treated the same. once pyro is told “optimize this” (either via pyro.param or pyro.module) there’s no difference in how optimization proceeds

mtvector · April 3, 2022, 7:18pm

Thanks! That’s very helpful to confirm.

So in order to apply weight decay to some parameters but not others, it seems like the answer is to use a MixedMultiOptimizer?

martinjankowiak · April 3, 2022, 7:42pm

yes there are different ways to go about this, see e.g. here
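
one of those ways, as a minimal sketch: pyro's optimizer wrappers accept a callable in place of the usual dict of arguments, and the callable is asked for the arguments of each parameter by name. the naming rule below (decay only what lives under a module called "decoder") and the numeric values are hypothetical, picked purely for illustration. the one-argument form of the callable, and the dotted names it receives for module parameters, assume a reasonably recent pyro release.

```python
def per_param_args(param_name):
    """Return Adam arguments for one parameter, chosen by its name."""
    # Hypothetical rule: decay only parameters registered under a module
    # named "decoder"; every other parameter gets no weight decay.
    if param_name.startswith("decoder."):
        return {"lr": 1e-3, "weight_decay": 1e-2}
    return {"lr": 1e-3, "weight_decay": 0.0}


try:
    import pyro.optim

    # The callable goes where the dict of optimizer arguments normally goes.
    optimizer = pyro.optim.Adam(per_param_args)
except ImportError:
    optimizer = None  # Pyro is not installed; the rule above still runs alone.
```

the resulting optimizer is handed to SVI like any other, so nothing else in the training loop needs to change.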