Just a quick question that I couldn’t find an explicit answer to: if we set weight decay on, say, the Adam optimizer for a Pyro model/guide, will it drive down all parameters or just those in `nn.Module`s?

Thanks!

Matthew

Assuming you’ve only defined a single optimizer, whatever parameters it’s optimizing will be treated the same. Once Pyro is told “optimize this” (either via `param` or `module`), there’s no difference in how optimization proceeds.


Thanks! That’s very helpful to confirm.

So, to apply weight decay to some parameters but not others, it seems the answer is to use a `MixedMultiOptimizer`?