I’m still quite new to Pyro and have been trying to implement a variety of models to learn the basics, but I’ve run into an issue implementing a GARCH model. Looking at other implementations online, I found one for Stan here. I defined my model to follow that implementation as closely as possible and ended up with:

```
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta
from pyro.optim import Adam

def model(rtn_series):
    alpha_0 = pyro.sample("alpha_0", dist.HalfNormal(0.1))
    alpha_1 = pyro.sample("alpha_1", dist.Uniform(0, 1))
    beta_1 = pyro.sample("beta_1", dist.Uniform(0, 1))
    mu = pyro.sample("mu", dist.Normal(0, 0.1))
    sigma = torch.tensor(0.001) ** 2  # initial sigma
    for t in range(1, len(rtn_series)):
        sigma = torch.sqrt(alpha_0 + alpha_1 * (rtn_series[t - 1] - mu) ** 2 + beta_1 * sigma ** 2)
        pyro.sample(f"obs_{t}", dist.Normal(mu, sigma), obs=rtn_series[t])

adam_params = {"lr": 0.01}
optimizer = Adam(adam_params)
# set up the inference algorithm
guide = AutoDelta(model)
svi = SVI(model, guide, optimizer, loss=Trace_ELBO())
```

Now this gives similar results to the Stan version if I use MCMC, but when I try variational techniques it converges to a completely different result. Watching the parameter medians change, it’s clear that `beta_1` is constantly moving *away* from the value given by MCMC. I’ve tried it with `AutoNormal`, `AutoMultivariateNormal`, and `AutoDelta`, and they all exhibit this behaviour. I’ve tried normalising the data and removing the constraint on `beta_1` (i.e. by swapping `alpha_1` and `beta_1` around, which didn’t help, and then changing `dist.Uniform(0, 1 - alpha_1)` to `dist.Uniform(0, 1)`), but nothing seems to give anything close to the MCMC result.

I’m wondering if there’s anything obvious I’m missing. I understand that this might be an inefficient way to implement the model, but I’m not sure which direction is the best one to go in.