Natural gradient ascent

Hi @fritzo, thanks for the suggestion. That is an interesting and very clean solution. I will have a look at the classes and functions you suggested.

The only hurdle I see here is that there might not be a “natural” parametrisation of an arbitrary distribution, even if it belongs to the exponential family. My guess is that such a parametrisation only exists for the natural exponential family (e.g. a normal distribution with known variance, or a gamma distribution with known shape).
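For reference, the distinction I have in mind (my notation): a natural exponential family has densities of the form

$$p(x \mid \theta) = h(x)\,\exp\!\big(\theta^\top x - A(\theta)\big),$$

where the natural parameter $\theta$ couples linearly to $x$ itself, e.g. $\theta = \mu/\sigma^2$ for a normal with known variance $\sigma^2$, whereas a general exponential family has an arbitrary sufficient statistic $T(x)$ in place of $x$.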

As long as the inverse Fisher information differs from the identity matrix, the Riemannian metric of the statistical manifold is non-Euclidean. This implies that one has to premultiply the ordinary gradient vector, estimated over the parameters of that distribution, by the inverse of the corresponding Riemannian metric, i.e. the inverse Fisher information matrix.
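Concretely (standard definitions, just for reference), the natural gradient is

$$\tilde{\nabla}_\theta \mathcal{L} = F(\theta)^{-1}\,\nabla_\theta \mathcal{L}, \qquad F(\theta) = \mathbb{E}_{x \sim p_\theta}\!\left[\nabla_\theta \log p_\theta(x)\,\nabla_\theta \log p_\theta(x)^\top\right],$$

so ordinary gradient ascent is recovered exactly when $F(\theta)$ is the identity.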

For example, suppose my requirement were to pass to an optimiser not only the gradients but also the manifold curvature; what would be an elegant way to do this? My current solution is to encode the distribution type in the parameter names and to compute the corresponding Riemannian metric, that is, the gradient multipliers, inside the optimiser (this works only if the Fisher information matrix can be made diagonal). However, I am aware that this is a dirty solution, as it does not fit well with the general Pyro structure. A minimal sketch of the workaround is below.
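Here is roughly what I mean, as a hedged sketch rather than a proposal: the naming convention (`q_normal_loc`, `q_normal_log_scale`) and the helpers `inverse_fisher_diag` / `natural_step` are all hypothetical and not part of Pyro's API; the diagonal Fisher entries are just the textbook values for a factorised normal.

```python
import torch
from torch import optim

def inverse_fisher_diag(name, param, aux):
    """Return the diagonal of F(theta)^{-1} for the parameter block `name`.
    The dispatch on substrings of the name is the "dirty" part; `aux`
    carries companion quantities a rule needs (here, the normal's scale)."""
    if "normal_log_scale" in name:
        # Under a log-scale parametrisation the Fisher block is the
        # constant 2, so its inverse is 1/2.
        return torch.full_like(param, 0.5)
    if "normal_loc" in name:
        # For N(loc, scale^2) the Fisher block for loc is 1/scale^2,
        # so its inverse is scale^2.
        return aux["scale"].detach() ** 2
    # Unknown block: fall back to the ordinary Euclidean gradient.
    return torch.ones_like(param)

def natural_step(named_params, optimizer, aux):
    """Rescale each gradient by the diagonal inverse Fisher, then take
    an ordinary optimiser step, so the update is lr * F^{-1} * grad."""
    for name, p in named_params:
        if p.grad is not None:
            p.grad.mul_(inverse_fisher_diag(name, p, aux))
    optimizer.step()

# Toy usage: fit a factorised normal to data by maximum likelihood.
loc = torch.zeros(3, requires_grad=True)
log_scale = torch.zeros(3, requires_grad=True)
params = {"q_normal_loc": loc, "q_normal_log_scale": log_scale}
opt = optim.SGD(list(params.values()), lr=0.1)

x = torch.randn(100, 3) + 2.0  # toy data
for _ in range(100):
    opt.zero_grad()
    loss = -torch.distributions.Normal(loc, log_scale.exp()).log_prob(x).mean()
    loss.backward()
    natural_step(params.items(), opt, aux={"scale": log_scale.exp()})
```

The obvious drawback, as said, is that the optimiser only learns about the model's distributions through string matching on parameter names, and the whole scheme breaks down once the Fisher information has off-diagonal structure.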