Weight decay during optimization

Should I use weight decay during SVI? Should I experiment with the settings suggested in the PyTorch docs? torch.optim — PyTorch 1.13 documentation

I’ve been using Adam, setting only the learning rate.
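For concreteness, a minimal sketch (with a hypothetical toy model) of the two setups being compared: Adam with only a learning rate, versus Adam with the `weight_decay` argument that `torch.optim.Adam` accepts:

```python
import torch

# Hypothetical toy model just to have some parameters to optimize.
model = torch.nn.Linear(4, 2)

# What I've been doing: only the learning rate.
opt_plain = torch.optim.Adam(model.parameters(), lr=1e-3)

# The alternative in question: enabling weight decay (default is 0.0).
opt_decay = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-2)
```

(If you're using Pyro's optimizer wrappers rather than `torch.optim` directly, the same keyword gets passed through in the args dict.)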

I have not used weight decay; has anyone else?

It depends what you’re doing, but if you’re doing “canonical” probabilistic modeling you should probably not use weight decay, because doing so in effect changes your prior. If you’re doing something wackier, more along the lines of Bayesian deep learning, then all bets are off: do whatever works.