I’ve got a model that uses an SVI engine. The dataset has millions of rows and a very complex design. Some of the variables we know will ultimately have a coefficient very close to 0. When we only constrain the coefficient distributions to be positive, we get the expected behavior: the coefficients start very large at the first iteration and then shrink steadily until they settle at a value very close to 0, never really fluctuating. However, if we put a very small upper bound on the distribution, the coefficient locations (the variational `mu`s) fluctuate throughout the constrained range rather than generally moving in one direction. No matter which way the coefficients are moving, the loss continues to decrease throughout the entire training run. I have a few questions about what could be happening:
- Is there anything in NumPyro’s algorithm that encourages exploration of the entire viable solution space? In other words, when we allow the coefficients to start with large values, is the model generally satisfied to let them shrink continually and then settle, whereas in the tightly constrained version it is weighted in such a way that it explores all feasible values?
- Is there any danger in allowing some coefficients to be unconstrained and forcing others to be tightly constrained?
- Changing the optimizer step size or the batch size has not made a difference. Are there other parameters that might help the model navigate such small constraints in a more systematic way?
- The loss tends to go negative in both versions of the model (although not at a point in training that correlates with any noticeable trend, such as swings in coefficient values). What is this indicating, and is it an issue?
Thanks in advance for your help. NumPyro is a great tool, and I am always looking to improve my understanding of how best to use it.
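P.S. On the negative-loss point, my working assumption (please correct me if I’m wrong) is that the ELBO is built from log-densities rather than probabilities, and for continuous distributions a log-density can be positive when the scale is small, so the sign of the loss may not mean anything by itself. A quick sanity check of that claim:

```python
import math

# Log-density of a Normal(0, 0.01) evaluated at its mean:
# log pdf = -log(sigma) - 0.5 * log(2 * pi)
sigma = 0.01
log_pdf = -math.log(sigma) - 0.5 * math.log(2.0 * math.pi)
print(log_pdf)  # positive (~3.69), so ELBO terms need not be negative
```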