Working with Coregionalize kernel

fonnesbeck · May 12, 2021, 3:53pm

Are there any working examples using the Coregionalize kernel for multi-output? I did not see anything on the repo.

I assume it is supposed to be used like this?

K = gp.kernels.Sum(
    gp.kernels.Matern32(
        input_dim=1,
        variance=v,
        lengthscale=ls,
    ),
    gp.kernels.Coregionalize(
        input_dim=1,
        rank=y.shape[1],
    ),
)

Having done this, how do you access its output? When I call model() on the resulting GPRegression, I only get 1-D output, whereas I was expecting 2-D output (where the size of the second dimension is the rank).

Any guidance most welcome.

fehiepsi · May 12, 2021, 7:49pm

I think you need to encode your input data with additional one-hot dimensions. Quoting the docs: “The typical use case is for modeling correlations among outputs of a multi-output GP, where outputs are coded as distinct data points with one-hot coded features denoting which output each datapoint represents.” I couldn’t find an example in Pyro, but this example in GPFlow illustrates the principle (I guess you can find similar examples in GPy or other GP frameworks too):

Instead of working with (X1, X2, X3, …, Xn) → ((y11, y12, …, y1d), (y21, …, y2d), …, (yn1, …, ynd)),
we will work with (X1 × (1, 0, …, 0), X1 × (0, 1, …, 0), …, X1 × (0, …, 0, 1), X2 × …) → (y11, y12, …, y1d, y21, …)
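
To make the reshaping concrete, here is a minimal sketch in plain PyTorch. The helper name stack_multioutput and the toy data are made up for illustration and are not part of Pyro; the only assumption is that X has shape (n, p) and Y has shape (n, d).

```python
import torch


def stack_multioutput(X, Y):
    """Hypothetical helper: turn (n, p) inputs and (n, d) outputs into the
    long form where each row is one (input, output-index) pair."""
    n, d = Y.shape
    # Repeat every input row d times: X1, X1, ..., X2, X2, ...
    X_rep = X.repeat_interleave(d, dim=0)               # (n*d, p)
    # Cycle through the d one-hot codes for each input row.
    onehot = torch.eye(d, dtype=X.dtype).repeat(n, 1)   # (n*d, d)
    X_long = torch.cat([X_rep, onehot], dim=-1)         # (n*d, p+d)
    # Row-major flattening gives y11, y12, ..., y1d, y21, ...
    y_long = Y.reshape(-1)                              # (n*d,)
    return X_long, y_long


# Toy data: n=3 inputs with p=1 feature, d=2 outputs.
X = torch.tensor([[0.0], [1.0], [2.0]])
Y = torch.tensor([[10.0, 20.0], [11.0, 21.0], [12.0, 22.0]])
X_long, y_long = stack_multioutput(X, Y)
print(X_long)
print(y_long)
```

Each row of X_long carries the original feature followed by the one-hot code of the output it belongs to, and y_long is a 1-D vector, which is why the GP output is 1-D as well.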

fonnesbeck · May 12, 2021, 8:50pm

And that one-hot encoded matrix is passed to the Sum kernel as the Z input? I’m familiar with the GPFlow implementation, but it seems very different from this one, in that it simply appends a single categorical variable to the inputs to use as the output dimension index. That does not seem to be happening here.

fehiepsi · May 12, 2021, 10:38pm

You are right, the categorical variable in the Pyro version is in one-hot form. The reason is that PyTorch doesn’t support concatenating a real tensor and a long tensor.