Implementing a GARCH model in Pyro: MCMC vs. SVI

I’m still quite new to Pyro and have been implementing a variety of models to learn the basics, but I’ve run into an issue with a GARCH model. Looking at other implementations online, I found one for Stan here. I defined my model to be as close to that implementation as possible and ended up with:

import torch
import pyro
import pyro.distributions as dist

def model(rtn_series):
    alpha_0 = pyro.sample("alpha_0", dist.HalfNormal(0.1))
    alpha_1 = pyro.sample("alpha_1", dist.Uniform(0, 1))
    beta_1 = pyro.sample("beta_1", dist.Uniform(0, 1))
    mu = pyro.sample("mu", dist.Normal(0, 0.1))

    sigma = torch.tensor(0.001) ** 2  # initial volatility
    for t in range(1, len(rtn_series)):
        # GARCH(1,1) recursion on the conditional standard deviation
        sigma = torch.sqrt(alpha_0 + alpha_1 * (rtn_series[t - 1] - mu) ** 2 + beta_1 * sigma ** 2)
        pyro.sample(f"obs_{t}", dist.Normal(mu, sigma), obs=rtn_series[t])


from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta
from pyro.optim import Adam

adam_params = {"lr": 0.01}
optimizer = Adam(adam_params)

# set up the inference algorithm
guide = AutoDelta(model)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

Now this gives similar results to Stan if I use MCMC, but when I try variational techniques it converges to a completely different result. Watching the parameter medians change, it’s clear that beta_1 keeps moving away from the value given by MCMC. I’ve tried AutoNormal, AutoMultivariateNormal and AutoDelta, and they all exhibit this behaviour. I’ve also tried normalising the data and removing the constraint on beta_1 (i.e. by swapping alpha_1 and beta_1 around, which didn’t help, and then changing dist.Uniform(0,1-alpha_1) to dist.Uniform(0,1)), but nothing gives anything close to the MCMC result.

I’m wondering if there’s anything obvious I’m missing. I understand this might be an inefficient way to implement the model, but I’m not sure which direction is the best way to go.

variational inference generally performs poorly for time series problems because the posterior often has strong correlations, and you need to work hard to include those in your posterior approximation if you want any hope of getting something that constitutes a reasonably good approximation of the true posterior. this usually results in a difficult optimization problem.

variational inference can perform arbitrarily badly on some models, just like mcmc can perform arbitrarily badly on some models. personally i wouldn’t suggest putting too much time into getting variational inference to kinda work on this class of models, especially if this is just a learning exercise.
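The point about correlations can be seen in closed form: a mean-field (fully factorised) Gaussian fit by KL(q‖p) to a correlated Gaussian posterior matches the diagonal of the precision matrix, so its marginal variances come out as 1/prec_ii rather than the true cov_ii. A tiny self-contained demonstration (the 2D Gaussian and rho = 0.95 are illustrative assumptions):

```python
import torch

# "true posterior": a 2D Gaussian with strong correlation rho
rho = 0.95
cov = torch.tensor([[1.0, rho], [rho, 1.0]])
prec = torch.linalg.inv(cov)

# the KL(q||p)-optimal mean-field Gaussian has variances 1/prec_ii,
# which for this posterior equals 1 - rho^2 -- far below the true value
mf_var = 1.0 / torch.diagonal(prec)
print(cov.diagonal())  # true marginal variances: 1.0
print(mf_var)          # mean-field variances: 1 - rho^2 ≈ 0.0975
```

This is why a mean-field guide like AutoNormal can badly understate uncertainty (and drag point estimates along with it) when the posterior over the GARCH parameters is strongly correlated.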


Thanks for the quick answer! I suspected it might be a situation like that, but it’s helpful to know, since I can use it in the future to pick which inference method is likely to be best suited to a particular problem.