Training an Autoregressive AR(2) model

I’m trying to do (conditional) maximum likelihood inference for the parameters of a simple non-Bayesian AR(2) model in vanilla PyTorch, before adapting it to the Bayesian setting with Pyro. I’m having trouble reproducing the results I get from the statsmodels.tsa.ar_model module. The goal is to infer the parameters that best fit a single, relatively short time series of 500 points.
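
Concretely, the model I’m assuming is

x_t = const + phi[0] * x_{t-2} + phi[1] * x_{t-1} + eps_t,   eps_t ~ Normal(0, sigma^2)

with phi ordered oldest lag first (matching the code below), so the conditional log-likelihood is the sum over t of the Normal log-density of x_t given the previous two values.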

I made up some synthetic data generated from an AR model, and I’m trying to recover the regression coefficients, the constant, and the noise variance by maximum likelihood in PyTorch.
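
For concreteness, the data was generated along these lines (the coefficients here are placeholders, not the exact values I used):

import torch as to

def gen_ar2(n=500, phi=(0.5, -0.3), const=0.1, sigma=1.0, burnin=100):
    # simulate an AR(2); phi[0] multiplies the older lag, phi[1] the newer one
    x = to.zeros(n + burnin)
    for t in range(2, n + burnin):
        x[t] = const + phi[0] * x[t - 2] + phi[1] * x[t - 1] + sigma * to.randn(1)
    return x[burnin:]  # drop the burn-in so the series starts near stationarity

gendata3 = gen_ar2()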

Here’s my model:

import torch as to
import torch.nn as nn

class AR2_full(nn.Module): #returns entire sequence
    def __init__(self, data, p=2, init=None):
        super(AR2_full, self).__init__()
        self.p = p
        self.data = data.detach() #detached copy; we fit parameters, not the data
        self.init = nn.Parameter(self.data[0:self.p]) #learnable initial values
        self.phi = nn.Parameter(to.randn(self.p) / 10) #AR coefficients, oldest lag first
        self.const = nn.Parameter(to.randn(1) / 10)
        self.var = nn.Parameter(to.tensor([1.])) #NB: passed to Normal as scale, i.e. a std dev
        self.lastx = to.randn(p) / 10. #rolling window of the p most recent values, oldest first
    
    def set_lastx(self, data): #used for initialization only
        with to.no_grad():
            self.lastx[:] = data.detach()

    def append_lastx(self, y_new):
        #shift the window left and append, keeping oldest-first order
        #so forward() and loglikelihood() use the same phi ordering
        with to.no_grad():
            self.lastx[:-1] = self.lastx[1:].clone()
            self.lastx[-1] = y_new
    
    def loglikelihood(self):
        N, p = self.data.shape[0], self.p
        logprob = to.zeros(1) #grad flows through phi/const/var, no requires_grad needed here
        for i in range(p, N):
            lags = self.data[i - p:i] #[x_{i-p}, ..., x_{i-1}], oldest first
            loc = self.phi @ lags + self.const
            #Normal's scale is a standard deviation, so self.var is effectively sigma
            l = to.distributions.Normal(loc=loc, scale=self.var)
            logprob = logprob + l.log_prob(self.data[i])
        return logprob
    
    def forward(self):
        noise = to.distributions.Normal(loc=0., scale=self.var).sample()
        y = (self.phi @ self.lastx) + noise + self.const #lastx is oldest first, like loglikelihood
        self.append_lastx(y.detach())
        return y
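
As a cross-check on the loop above, a vectorized version of the same conditional log-likelihood would look like this (a sketch, assuming the same oldest-first ordering of phi):

def loglik_vectorized(data, phi, const, sigma):
    p = phi.shape[0]
    # row i holds the p lags [x_i, ..., x_{i+p-1}] that precede x_{i+p}
    X = to.stack([data[i:i + p] for i in range(data.shape[0] - p)])
    loc = X @ phi + const
    return to.distributions.Normal(loc=loc, scale=sigma).log_prob(data[p:]).sum()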

And here’s my training loop:

ar2 = AR2_full(gendata3,p=2)
optim = to.optim.SGD(lr=1e-4,params=ar2.parameters())
losses = []

n_steps = 100
for step in range(n_steps):
    optim.zero_grad()
    ar2.set_lastx(ar2.init) #reset initial values (set_lastx is already under no_grad)
    loss = -1 * ar2.loglikelihood() #negative conditional log-likelihood
    loss.backward()
    losses.append(loss.item())
    optim.step()

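One thing I haven’t ruled out is the optimization itself: 100 SGD steps at lr=1e-4 on a summed (not averaged) log-likelihood may simply be under-trained. A variant with Adam and more steps would look like this (a sketch; the learning rate and step count are guesses, not tuned values):

optim = to.optim.Adam(ar2.parameters(), lr=1e-2)
for step in range(2000):
    optim.zero_grad()
    ar2.set_lastx(ar2.init)
    loss = -ar2.loglikelihood()
    loss.backward()
    optim.step()
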
It converges after about 30 steps, and the learned parameters are close to the ones from the statsmodels library, but not close enough: the fit is significantly worse. The statsmodels AR model can fit the training series almost perfectly, whereas my model is clearly better than random but still looks like a very sloppy fit.
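
For reference, the statsmodels baseline I’m comparing against is fit roughly like this (assuming the AutoReg API; note that statsmodels orders the coefficients newest lag first, so its second entry multiplies x_{t-1}):

from statsmodels.tsa.ar_model import AutoReg

res = AutoReg(gendata3.numpy(), lags=2).fit()
print(res.params)  # [const, coef on x_{t-1}, coef on x_{t-2}]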

Any help is much appreciated; I have little experience with time series models but am trying to learn. I originally tried this in Pyro but couldn’t get it to work, so I dropped down to plain PyTorch.