Hi!
First, thanks for your attention. I am writing to ask whether you could look for a way to stabilize the DMM example, which, as you know, is not completely stable: the loss goes to "nan" values.
We are trying to use it in our own version of the DMM, and it becomes unstable when we increase the amount of data. This might be due to errors in our own model (which we are investigating) or simply because the base model is not stable per se.
Thanks and have a nice day,
Recurrent models are often a bit unstable. Some things you might try:
- lower the learning rate
- increase the `beta1` parameter in Adam
- clip gradients more aggressively (lower `--clip-norm`)
- try different initialization strategies in the neural nets (if, e.g., your nans tend to occur early in training)
- if you have very long sequences, you might be running into generic RNN issues, in which case it might also help to use something like truncated backpropagation through time (see, e.g., this link)
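
To make the clipping suggestion concrete, here is a rough pure-Python sketch of what clipping by global norm does to a set of gradients. The function name `clip_by_global_norm` and the flat list of gradient values are assumptions made for illustration; this is not the code the DMM example uses. In a PyTorch training loop the equivalent step would normally be handled by `torch.nn.utils.clip_grad_norm_` or by the optimizer itself.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient values so that their joint L2 norm is at most max_norm.

    Hypothetical helper for illustration only; `grads` is a flat list of floats.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    total_norm = math.sqrt(sum(g * g for g in grads))

    # Rescaling cannot repair a NaN or infinite norm, so surface it to the caller.
    # Logging the step at which this first happens helps locate the instability.
    if not math.isfinite(total_norm):
        raise ValueError("non-finite gradient norm")

    # Gradients that are already small enough are returned unchanged.
    if total_norm <= max_norm:
        return list(grads), total_norm

    # Shrink every component by the same factor so the direction is preserved.
    scale = max_norm / total_norm
    return [g * scale for g in grads], total_norm


# Example: a gradient with norm 5 is scaled down to norm 1.
clipped, norm_before = clip_by_global_norm([3.0, 4.0], max_norm=1.0)
```

Lowering the clip norm makes the `scale` factor smaller on large-gradient steps, which limits how far a single bad batch can move the parameters. Tracking `norm_before` over training is also a cheap way to see whether the gradients blow up shortly before the loss turns into nan.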
Thanks. I will start playing with that.