I’m trying to do (conditional) maximum likelihood inference for the parameters in a simple non-bayesian AR(2) model in vanilla PyTorch before adapting to the Bayesian setting with Pyro. I’m having trouble getting it to reproduce the results I get from the
statsmodels.tsa.ar_model class. I’m trying to infer the model parameters that will be the best fit to a single, relatively short time series of 500 time points.
I made up some synthetic data generated from an AR model and I’m trying to infer the regression coefficient parameters, the constant, and the variance using maximum likelihood in PyTorch.
Here’s my model:
class AR2_full(nn.Module): #returns entire sequence def __init__(self, data, p=2, init=None): super(AR2_full, self).__init__() self.p = p self.data = data.detach() self.data.requires_grad = False self.init = nn.Parameter(self.data[0:self.p]) self.phi = nn.Parameter(to.randn(self.p)/ 10) self.const = nn.Parameter(to.randn(1)/10) self.var = nn.Parameter(to.tensor([1.])) self.lastx = to.randn(p)/10. N,p= self.data.shape, self.p def set_lastx(self,data): #used for initialization only with to.no_grad(): for i in range(data.shape): self.lastx[i] = data[i] def append_lastx(self,y_new): with to.no_grad(): for i in range(self.lastx.shape-1): self.lastx[i+1] = self.lastx[i] self.lastx = y_new def loglikelihood(self): N,p = self.data.shape,self.p logprob = to.Tensor([0.]) logprob.requires_grad=True for i in range(p,N,1): data = self.data[i-p:i] loc = self.phi @ data + self.const l = to.distributions.Normal(loc=loc,scale=self.var) logprob = logprob + l.log_prob(self.data[i]) return logprob def forward(self): noise = to.distributions.Normal(loc=0,scale=self.var).sample() lastx = to.stack(list(self.lastx)).detach() y = (self.phi @ lastx) + noise + self.const self.append_lastx(y.detach()) return y
And here’s my training loop:
ar2 = AR2_full(gendata3,p=2) optim = to.optim.SGD(lr=1e-4,params=ar2.parameters()) losses =  n_steps = 100 for step in range(n_steps): optim.zero_grad() with to.no_grad(): #reset initial values ar2.set_lastx(ar2.init) loss = -1*ar2.loglikelihood() loss.backward() losses.append(loss.detach().numpy()) optim.step() cur_step = cur_step + 1 if cur_step < 3 else 0
It converges rapidly after about 30 steps and parameters learned are close to the ones obtained from the statsmodels library but not close enough as it performs significantly worse. The statsmodels AR model can perfectly fit the training time series, whereas my model is definitely doing better than random but still looks like a very sloppy fit.
Any help is much appreciated, I have little experience with time series models but trying to learn. I was originally trying this in Pyro but couldn’t get it to work so dropped down to normal PyTorch.