Gaussian process regression

I am trying to understand the practical implementation of Gaussian process regression, and I have some questions about a research paper.
The paper “Multi-View Stereo by Temporal Nonparametric Fusion” addresses depth estimation for the input frames by temporally fusing information from previous frames: the latent-space encoding from the previous frame is subjected to a Gaussian process (GP) prior and used when encoding the current frame.
Link to paper and code: Multi-View Stereo by Temporal Nonparametric Fusion | ICCV 2019, Yuxin Hou, Juho Kannala, and Arno Solin

First, the distance between poses is calculated with the pose-distance measure

D[P_i, P_j] = \sqrt{\|t_i - t_j\|^2 + \frac{2}{3}\,\mathrm{tr}(I - R_i^\top R_j)}

where each pose P is defined by a translation vector t and a 3×3 rotation matrix R.
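A minimal NumPy sketch of this pose distance, assuming each pose is given as a translation vector t of shape (3,) and a 3×3 rotation matrix R. The function names `pose_distance`, `pose_distance_matrix` and `rot_z` are hypothetical helpers for illustration, not taken from the authors' code.

```python
import numpy as np


def pose_distance(t_i, R_i, t_j, R_j):
    """D[P_i, P_j] = sqrt(||t_i - t_j||^2 + 2/3 * tr(I - R_i^T R_j))."""
    trans_term = np.sum((t_i - t_j) ** 2)
    rot_term = (2.0 / 3.0) * np.trace(np.eye(3) - R_i.T @ R_j)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(max(trans_term + rot_term, 0.0))


def pose_distance_matrix(ts, Rs):
    """Pairwise distances for a sequence of l poses -> (l, l) matrix."""
    l = len(ts)
    D = np.zeros((l, l))
    for i in range(l):
        for j in range(l):
            D[i, j] = pose_distance(ts[i], Rs[i], ts[j], Rs[j])
    return D


def rot_z(angle):
    """Rotation matrix for a rotation by `angle` radians about the z-axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


# Three toy poses: a pure translation by (3, 4, 0) and a pure 90-degree rotation.
ts = [np.zeros(3), np.array([3.0, 4.0, 0.0]), np.zeros(3)]
Rs = [np.eye(3), np.eye(3), rot_z(np.pi / 2)]
D = pose_distance_matrix(ts, Rs)
print(np.round(D, 4))
```

For a rotation by angle θ, tr(I − R) = 2 − 2cos θ, so the 90-degree rotation contributes (2/3)·2 = 4/3 to the squared distance, while the translation contributes 3² + 4² = 25.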

The distance D is used to construct the covariance function, a Matérn kernel with smoothness ν = 3/2:
k[P, P'] = \gamma^2 \left(1 + \frac{\sqrt{3}\, D[P, P']}{\ell}\right) \exp\left(-\frac{\sqrt{3}\, D[P, P']}{\ell}\right)
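A small NumPy sketch of this covariance, assuming D is a precomputed (l, l) pose-distance matrix and that gamma2 (γ²) and ell (ℓ) are positive scalars. The function name `matern32` and the toy distances are illustrative assumptions.

```python
import numpy as np


def matern32(D, gamma2, ell):
    """Elementwise k = gamma^2 * (1 + sqrt(3) D / ell) * exp(-sqrt(3) D / ell)."""
    r = np.sqrt(3.0) * D / ell
    return gamma2 * (1.0 + r) * np.exp(-r)


# Toy distances for three frames at positions 0, 0.5 and 3 on a line:
# frames 0 and 1 are close, frame 2 is far from both.
D = np.array([[0.0, 0.5, 3.0],
              [0.5, 0.0, 2.5],
              [3.0, 2.5, 0.0]])
K = matern32(D, gamma2=1.0, ell=1.0)
print(np.round(K, 3))
```

Nearby poses get a covariance close to γ², so their latent values are strongly coupled, while the coupling between distant poses decays roughly like exp(−√3 D/ℓ).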

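To share temporal information between frames, independent GP priors are assigned to all values in z_i:

z_j(t) \sim \mathrm{GP}(0, k[P[t], P[t']])

y_{j,i} = z_j(t_i) + \epsilon_{j,i}, \quad \epsilon_{j,i} \sim \mathrm{N}(0, \sigma^2)

where y_{j,i} is the j-th value of the encoder output for frame i.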
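Under this observation model, the standard GP regression result gives the posterior mean of the latent values at the observed frames. This is a sketch of the textbook derivation, not taken from the paper. Write K for the l × l matrix with entries K_{ik} = k[P[t_i], P[t_k]], and z_j, y_j for the j-th channel stacked over the l frames. Since Cov(z_j, y_j) = Cov(z_j, z_j + ε_j) = K:

\begin{pmatrix} z_j \\ y_j \end{pmatrix} \sim \mathrm{N}\left(0, \begin{pmatrix} K & K \\ K & K + \sigma^2 I \end{pmatrix}\right)

\mathbb{E}[z_j \mid y_j] = K (K + \sigma^2 I)^{-1} y_j

Because every channel j shares the same K, stacking the channels as the columns of an l × (c·h·w) matrix Y gives all posterior means at once as Z = K (K + σ² I)^{-1} Y.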

The model is trained with a batch size of 4 using the batch approach.


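import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class GPlayer(nn.Module):

    def __init__(self):
        super(GPlayer, self).__init__()

        # GP hyperparameters stored in log space (exp() keeps them positive):
        # magnitude gamma^2, length-scale ell and noise variance sigma^2.
        self.gamma2 = nn.Parameter(torch.randn(1))
        self.ell = nn.Parameter(torch.randn(1))
        self.sigma2 = nn.Parameter(torch.randn(1))

    def forward(self, D, Y):
        """
        :param D: Distance matrix, shape (b, l, l)
        :param Y: Stacked outputs from encoder, shape (b, l, c, h, w)
        :return: Z: transformed latent space, shape (b, l, c*h*w)
        """
        b, l, c, h, w = Y.size()
        # Flatten each frame; Y stays on the same device as D and the parameters.
        Y = Y.view(b, l, -1).float()
        D = D.float()

        # Matérn 3/2 covariance between the l frames of each sequence
        ell = torch.exp(self.ell)
        K = torch.exp(self.gamma2) * (1 + math.sqrt(3) * D / ell) * torch.exp(-math.sqrt(3) * D / ell)
        I = torch.eye(l, device=D.device).expand(b, l, l)

        # torch.linalg.solve(A, B) returns X with A X = B; here A = K + sigma^2 I, B = Y
        X = torch.linalg.solve(K + torch.exp(self.sigma2) * I, Y)

        # Z = K (K + sigma^2 I)^{-1} Y
        Z = K.bmm(X)

        Z = F.relu(Z)

        return Z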
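The argument order of the solver matters here: torch.linalg.solve(A, B), like numpy.linalg.solve(a, b), returns X such that AX = B, so the square matrix K + σ²I must be the first argument and the non-square Y the second. A NumPy sketch of the same batched computation, assuming a batch of b = 4 sequences of l = 5 frames with f = c·h·w = 8 features each; the random covariance matrices stand in for the pose-based K and are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
b, l, f = 4, 5, 8  # batch size, frames per sequence, flattened features (c*h*w)
sigma2 = 0.1

# Illustrative symmetric positive-definite covariances, one per sequence: (b, l, l).
A = rng.standard_normal((b, l, l))
K = A @ np.transpose(A, (0, 2, 1)) + 1e-3 * np.eye(l)
Y = rng.standard_normal((b, l, f))

# Solve (K + sigma^2 I) X = Y for every sequence at once; X has shape (b, l, f).
X = np.linalg.solve(K + sigma2 * np.eye(l), Y)
# Batched matrix product, the NumPy counterpart of K.bmm(X).
Z = K @ X

# Same result via an explicit inverse (slower and less stable; comparison only).
Z_inv = K @ np.linalg.inv(K + sigma2 * np.eye(l)) @ Y
print(Z.shape, np.allclose(Z, Z_inv))

# Swapping the arguments fails, because Y is not a stack of square matrices.
swapped_failed = False
try:
    np.linalg.solve(Y, K + sigma2 * np.eye(l))
except np.linalg.LinAlgError as err:
    swapped_failed = True
    print("swapped arguments fail:", err)
```

Solving the linear system instead of forming (K + σ²I)^{-1} explicitly is the usual choice for numerical stability; the result is the same up to round-off.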

My questions are:

  1. As shown in the paper (Figure 2), computing the output z_i takes as input (a) the output of the encoder, (b) the previous latent-space encoding, and (c) the camera pose.
    Looking at the implementation above, I can see that the model takes in D and Y, but how is the information from the previous encoding z_i passed on to compute z_{i+1}? My understanding is that the information is passed to the next frame's encoder through the learned parameters stored in gamma2, ell, and sigma2. Is my understanding right?

  2. Framing Gaussian process regression as a linear system. In the implementation, the regression problem is framed as solving AX = B,
    i.e. here B = Y and A = K + torch.exp(self.sigma2)*I, so X = A^{-1}B. I have difficulty understanding how Gaussian process regression is framed as solving AX = B. Also, why does the author multiply K with X, i.e. K.bmm(X)?