Linear Dynamic System Modeling, Run Speed

I had a similar goal (Gaussian univariate state with unconventional multivariate measurement error). I found that using the lax.scan function with MCMC on numpyro was by far the fastest and could get close to the true parameters in a simulation study. A lot of the speedup actually came from passing the vectorized noise to the scan function, rather than to add the noise inside the transition function (although I might have been doing things wrong, I am new to (num)pyro).

Details and code are here: