I’m trying to do (conditional) maximum likelihood inference for the parameters in a simple non-bayesian AR(2) model in vanilla PyTorch before adapting to the Bayesian setting with Pyro. I’m having trouble getting it to reproduce the results I get from the `statsmodels.tsa.ar_model`

class. I’m trying to infer the model parameters that will be the best fit to a single, relatively short time series of 500 time points.

I made up some synthetic data generated from an AR model and I’m trying to infer the regression coefficient parameters, the constant, and the variance using maximum likelihood in PyTorch.

Here’s my model:

```
class AR2_full(nn.Module): #returns entire sequence
def __init__(self, data, p=2, init=None):
super(AR2_full, self).__init__()
self.p = p
self.data = data.detach()
self.data.requires_grad = False
self.init = nn.Parameter(self.data[0:self.p])
self.phi = nn.Parameter(to.randn(self.p)/ 10)
self.const = nn.Parameter(to.randn(1)/10)
self.var = nn.Parameter(to.tensor([1.]))
self.lastx = to.randn(p)/10.
N,p= self.data.shape[0], self.p
def set_lastx(self,data): #used for initialization only
with to.no_grad():
for i in range(data.shape[0]):
self.lastx[i] = data[i]
def append_lastx(self,y_new):
with to.no_grad():
for i in range(self.lastx.shape[0]-1):
self.lastx[i+1] = self.lastx[i]
self.lastx[0] = y_new
def loglikelihood(self):
N,p = self.data.shape[0],self.p
logprob = to.Tensor([0.])
logprob.requires_grad=True
for i in range(p,N,1):
data = self.data[i-p:i]
loc = self.phi @ data + self.const
l = to.distributions.Normal(loc=loc,scale=self.var)
logprob = logprob + l.log_prob(self.data[i])
return logprob
def forward(self):
noise = to.distributions.Normal(loc=0,scale=self.var).sample()
lastx = to.stack(list(self.lastx)).detach()
y = (self.phi @ lastx) + noise + self.const
self.append_lastx(y.detach())
return y
```

And here’s my training loop:

```
ar2 = AR2_full(gendata3,p=2)
optim = to.optim.SGD(lr=1e-4,params=ar2.parameters())
losses = []
n_steps = 100
for step in range(n_steps):
optim.zero_grad()
with to.no_grad(): #reset initial values
ar2.set_lastx(ar2.init)
loss = -1*ar2.loglikelihood()
loss.backward()
losses.append(loss.detach().numpy())
optim.step()
cur_step = cur_step + 1 if cur_step < 3 else 0
```

It converges rapidly after about 30 steps and parameters learned are close to the ones obtained from the statsmodels library but not close enough as it performs significantly worse. The statsmodels AR model can perfectly fit the training time series, whereas my model is definitely doing better than random but still looks like a very sloppy fit.

Any help is much appreciated, I have little experience with time series models but trying to learn. I was originally trying this in Pyro but couldn’t get it to work so dropped down to normal PyTorch.