ClippedAdam with LR Scheduler?

Possibly stupid question: As I understand it, the PyroLRSchedulers require an optimizer of class torch.optim.Optimizer, not a PyroOptim like ClippedAdam. The usual usage pattern thus seems to be to use the pyro schedulers with the original torch optimizers. Does that mean it is impossible to use, e.g., ReduceLROnPlateau with ClippedAdam? Or is there some conversion trick I’m not aware of / something I am missing? So far, I was unable to get the two to work together.
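For concreteness, here is the pattern that does work: a torch LR scheduler driving a plain torch.optim.Optimizer. This is a minimal torch-only sketch of ReduceLROnPlateau halving the learning rate after a plateau; ClippedAdam is a PyroOptim wrapper rather than a torch.optim.Optimizer, which is why it cannot be dropped in here.

```python
import torch

# A single parameter is enough to exercise the scheduler machinery.
param = torch.nn.Parameter(torch.zeros(3))
optimizer = torch.optim.Adam([param], lr=0.1)

# ReduceLROnPlateau expects a torch.optim.Optimizer instance up front;
# a PyroOptim wrapper such as ClippedAdam does not satisfy that contract.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, factor=0.5, patience=2
)

# Feed a constant "loss" so the metric never improves: after `patience`
# bad epochs the scheduler halves the learning rate.
for _ in range(5):
    scheduler.step(1.0)

print(optimizer.param_groups[0]["lr"])  # lr was halved: 0.1 -> 0.05
```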

ClippedAdam was implemented before torch introduced LR schedulers, so it is really only meant to be used as a stand-alone optimizer.

If you want to add clipping and such, you should probably look into the PyTorch optim API and do that on the torch side.

Hmm… sadly, that seems to be quite involved. On the PyTorch side, there appear to be two recommended ways of doing gradient clipping:

  1. explicitly call torch.nn.utils.clip_grad_norm_(.) at a specific point in the optimization loop, or
  2. register a hook to the nn.Module to do the clipping. (As described, e.g., here.)

Option 1) would basically require me to implement the whole training loop myself, without any of the utility functions (no svi.step(), no LR scheduler, etc.).
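To illustrate option 1), here is a torch-only sketch of what that hand-rolled loop would look like; with Pyro one would compute the loss and gradients from the ELBO instead of the squared error used here, so the model, data, and loss below are purely illustrative.

```python
import torch

torch.manual_seed(0)
model = torch.nn.Linear(4, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
x, y = torch.randn(32, 4), torch.randn(32, 1)

for step in range(3):
    optimizer.zero_grad()
    loss = ((model(x) - y) ** 2).mean()
    loss.backward()
    # The clipping step that svi.step() gives no hook for: cap the global
    # gradient norm before the optimizer consumes the gradients.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)
    optimizer.step()
```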

For option 2), I would probably have to rewrite my whole model to be a PyroModule…? I currently use the standard def model(x, y): ... mechanism.
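One caveat on option 2): gradient hooks can be registered directly on tensors via Tensor.register_hook, which does not strictly require an nn.Module. Assuming the same trick carries over to parameters pulled out of Pyro's param store, a PyroModule rewrite might not be needed. A torch-only sketch (note this is element-wise value clipping, not the norm clipping that clip_grad_norm_ does):

```python
import torch

model = torch.nn.Linear(4, 1)

# Tensor.register_hook works on any leaf tensor, so this does not depend
# on the nn.Module machinery itself: each hook rewrites the gradient of
# its parameter before the optimizer ever sees it.
for p in model.parameters():
    p.register_hook(lambda grad: grad.clamp(-10.0, 10.0))

# A deliberately huge gradient gets clamped element-wise to [-10, 10].
x = torch.full((1, 4), 100.0)
(model(x) * 100.0).sum().backward()
print(float(model.weight.grad.abs().max()))  # 10.0
```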

Both options are probably feasible, but they seem quite involved for a relatively simple and standard modification.

Do you know about this pattern?

I was just looking at it. :wink: I might give that a shot, thanks!

For the future, could it maybe be an option to pass a gradient-clipping parameter to the SVI object itself, which would then clip the gradients at the right point in its step() method?

This is a bit like asking a teapot to also tell the time because you might find that convenient. It doesn't make sense for SVI to anticipate every possible custom optimization strategy: that should be the role of torch.optim etc.