While implementing an lr_range_test, I believe I found a bug in the PyroLRScheduler.
I created a simple VAE and trained it on FashionMNIST in both Pyro and PyTorch. In all cases I used RMSprop as the optimizer and StepLR as the scheduler, which multiplies the LR by a factor gamma at every epoch.
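For concreteness, here is a minimal sketch of the PyTorch side of this setup (the parameter and loop bodies are placeholders, not my actual VAE training code): RMSprop wrapped by a StepLR that scales the LR by gamma once per epoch.

```python
import torch

# Dummy parameter standing in for the VAE's parameters (assumption for the sketch).
params = [torch.nn.Parameter(torch.zeros(1))]

optimizer = torch.optim.RMSprop(params, lr=1e-6)
# step_size=1 so the LR is multiplied by gamma after every epoch.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1, gamma=1.5)

for epoch in range(3):
    # ... one epoch of training would go here ...
    optimizer.step()   # placeholder optimizer step
    scheduler.step()   # LR is now 1e-6 * 1.5**(epoch + 1)

# LR after 3 epochs: 1e-6 * 1.5**3 ≈ 3.375e-6
print(optimizer.param_groups[0]["lr"])
```

On the Pyro side, as far as I understand, the analogous setup wraps the same torch scheduler via pyro.optim.StepLR (passing the optimizer class and its args in the options dict) and calls scheduler.step() once per epoch.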
I have done the following checks:
for LR=1E-4 and gamma=1.0 (i.e. the LR is constant), both the Pyro and PyTorch implementations work well.
for LR=1E-6 and gamma=1.5 (i.e. the LR increases exponentially), the PyTorch implementation works for 22 epochs, after which the LR becomes too large (~1E-2) and the loss becomes NaN. This is the expected behavior.
with the same setup, i.e. LR=1E-6 and gamma=1.5, the Pyro implementation produces NaN from the start (i.e. not even a single epoch runs successfully).
I have done other tests (i.e. changing the initial LR and gamma); the conclusion is that, unless gamma=1.0, the implementation with the PyroScheduler produces NaN even before the end of the first epoch (which is the earliest time at which the scheduler should change the learning rate).
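As a sanity check on the PyTorch run, the epoch at which an exponentially growing LR crosses 1E-2 can be computed with simple arithmetic, and it matches the ~22 successful epochs observed above:

```python
# LR is multiplied by gamma=1.5 every epoch, starting from 1e-6.
# Find the first epoch at which the LR exceeds 1e-2.
lr, gamma, epoch = 1e-6, 1.5, 0
while lr <= 1e-2:
    lr *= gamma
    epoch += 1

print(epoch)  # → 23 (1e-6 * 1.5**22 ≈ 7.5e-3, 1e-6 * 1.5**23 ≈ 1.1e-2)
```

So the PyTorch run diverging shortly after epoch 22 is exactly what the schedule predicts, which makes the immediate NaN in the Pyro run all the more suspicious.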
I do not believe this is a mistake in my code, since the same code runs perfectly when gamma=1.0. The problem must be related to the scheduler.
Below are the links to the PyTorch vs Pyro comparison for the cases gamma=1.0 and gamma=1.5.
Should I open a bug report?