Parameter estimation of differential/difference equation

Hello everyone,

Here is a bird’s-eye view of what I am trying to do; my specific question comes in the second paragraph below. I am simulating a dynamical system described by differential (or difference) equations, with the goal of estimating the parameters of those equations. In reality we don’t know the exact values of these parameters, so we treat them as random variables. My goal is to derive a posterior distribution for these parameters given their prior distribution and a set of measurements. As a starting point I am working with the following system, which uses the difference and measurement equations below

\begin{equation} \begin{aligned} x_t &= a \cdot x_{t-1} \\ y_t &= x_t + \nu \,, \end{aligned} \end{equation}

where the parameter a \sim \mathcal{N} (\mu, \sigma_a^2) is normally distributed and is the parameter we want to estimate, x_t is the unknown state at time t, y_t is the measurement at time t, and \nu \sim \mathcal{N} (0, \sigma^2) is the measurement noise, i.e., zero-mean normally distributed noise with variance \sigma^2. I have created the Jupyter notebook markov_model.ipynb and utilities.py with all the helper functions; both can be found in my GitHub repo. The pipeline is the following:

  1. Generate data with the true generating process, and
  2. Use stochastic variational inference to estimate the posterior of the parameter.
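Step 1 above can be sketched with plain NumPy. The parameter values here (a_true, sigma, x0, T) are illustrative assumptions, not the ones from the notebook:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical values for illustration only.
a_true = 0.9   # true dynamics parameter a
sigma = 0.1    # measurement-noise standard deviation
x0 = 1.0       # initial state x_0
T = 100        # number of time steps

# x_t = a * x_{t-1} has the closed form x_t = a**t * x_0,
# and y_t = x_t + nu with nu ~ N(0, sigma^2).
x = x0 * a_true ** np.arange(1, T + 1)
y = x + sigma * rng.normal(size=T)
```

These y values then play the role of the measurements fed to the inference step.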

For the guide I am using AutoNormal, and for the loss function I am using Trace_ELBO. Even though this specific example is linear, and as far as I understand Pyro already has implementations that deal with linear models with Gaussian noise, my goal is to work with nonlinear dynamical systems and non-Gaussian distributions. Also, I think it is important for any researcher to have some understanding of what is happening under the hood of the software machinery they are using.

The problem is the following: the loss does not decrease as a function of the training epoch, and it is really noisy, as can be seen in the image below. Most importantly, the posterior has pretty much the same mean as the prior, and I have a feeling that if I let the script run for more epochs it will end up with the same variance as well. This prevents me from making any meaningful statements about the parameter in the difference equation above. Could someone look at my code and see where I might have gone wrong? Any help resolving this issue would be highly appreciated.

In your code, I’m not sure what the .clone().detach() is doing. Maybe that breaks the gradients.

The model you described can be thought of as a (1D) state space model with no dynamics model noise and some emissions model noise.

There are a lot of case studies in the Pyro docs for this and a lot of support in the library. See Forecasting with Dynamic Linear Model (DLM) — Pyro Tutorials 1.9.1 documentation, Kalman Filter — Pyro Tutorials 1.9.1 documentation (more general), and Forecasting II: state space models — Pyro Tutorials 1.9.1 documentation. There are more.

In NumPyro (I don’t think Pyro has scan), I’d write it down like this (untested):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.contrib.control_flow import scan


def model(y=None, T=100, x_init=1.0):
    # Priors over the dynamics parameter and the observation-noise scale.
    a = numpyro.sample("a", dist.Normal(1.0, 0.3))
    sigma_obs = numpyro.sample("sigma_obs", dist.HalfNormal(0.1))

    def transition_fn(carry, t):
        x_prev = carry

        # Deterministic dynamics: x_t = a * x_{t-1}
        x = a * x_prev

        # Emission: y_t ~ Normal(x_t, sigma_obs)
        pred = numpyro.sample("y", dist.Normal(x, sigma_obs))

        return x, pred

    # Condition on the full observed series; scan slices "y" along time.
    with numpyro.handlers.condition(data={"y": y}):
        _, preds = scan(transition_fn, x_init, jnp.arange(T))

    return preds

You can still do SVI:

import jax
from numpyro.infer import SVI, Trace_ELBO
from numpyro.infer.autoguide import AutoNormal

guide = AutoNormal(model)
optimizer = numpyro.optim.Adam(step_size=0.005)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

svi_result = svi.run(
    jax.random.PRNGKey(1),
    num_steps=2000,
    y=y_obs, T=T, x_init=x_init,
)

If everything is Gaussian, you can get really efficient: use Kalman filtering to compute the log likelihood and use numpyro.factor to include that term in the model. But if you’re going to use different distributions for the dynamics/emissions models, the scan form above is the easiest to code.
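For this particular model the filtering recursion actually collapses: with no process noise, x_t = a**t * x_0 is deterministic given a, so the marginal log-likelihood p(y | a) is just a sum of Gaussian terms. That closed-form quantity is what you would hand to a factor statement. A plain-NumPy sketch (parameter values are illustrative assumptions):

```python
import numpy as np

def log_likelihood(a, y, x0=1.0, sigma=0.1):
    """Exact log p(y | a) for x_t = a * x_{t-1}, y_t = x_t + N(0, sigma^2).

    With no process noise, x_t = a**t * x0, so the y_t are independent
    Gaussians around that deterministic trajectory.
    """
    T = len(y)
    x = x0 * a ** np.arange(1, T + 1)
    resid = y - x
    return -0.5 * np.sum(resid**2) / sigma**2 - T * np.log(sigma * np.sqrt(2 * np.pi))

# Sanity check: the likelihood should peak near the generating value.
rng = np.random.default_rng(0)
a_true, sigma, x0, T = 0.9, 0.1, 1.0, 100
y = x0 * a_true ** np.arange(1, T + 1) + sigma * rng.normal(size=T)

grid = np.linspace(0.5, 1.3, 81)
a_hat = grid[np.argmax([log_likelihood(a, y, x0, sigma) for a in grid])]
```

If the posterior from SVI disagrees badly with where this closed-form likelihood peaks, that points to a bug in the model/guide wiring rather than in the data.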

Cheers