Hi all,
I am trying to implement a Gaussian process latent variable model (GP-LVM) for categorical data. The model is the one presented in Section 3, Eq. (1) in this paper. It is essentially the usual GP-LVM, but with a categorical likelihood, which would require using Pyro's VariationalSparseGP.
The GP-LVM code below uses a Gaussian likelihood (even though the data are integer-valued class labels):
import matplotlib.pyplot as plt
import numpy as np
import pyro
import torch
import pyro.contrib.gp as gp
import pyro.distributions as dist
N = 80 # number of sequences
C = 4 # number of classes for each element of the sequence
L = 10 # length of each sequence
d = 2 # GP-LVM latent space dimension
n = 20 # number of inducing points
y = torch.zeros(N, L).int()
for l in range(L):
    y[:, l] = dist.Categorical(torch.rand(C)).sample((N,))
X_init = torch.zeros(N, d)
kernel = gp.kernels.RBF(input_dim=d, lengthscale=torch.ones(d))
Xu = torch.zeros(n, d) + torch.rand(n, d) * 0.01 # initial inducing inputs of sparse model
likelihood = gp.likelihoods.Gaussian()
gpmodule = gp.models.VariationalSparseGP(X_init, y.T, kernel, Xu, likelihood=likelihood)
gplvm = gp.models.GPLVM(gpmodule)
gp.util.train(gplvm)
X = gplvm.X_loc.detach().numpy() # variational mean of X from the GPLVM autoguide
plt.scatter(X[:, 0], X[:, 1]) # plot the inferred latent variables
plt.show()
As mentioned, I would like to use likelihood = gp.likelihoods.MultiClass(num_classes=C) instead. However, simply swapping in this likelihood does not work; I get the following error:
ValueError: Number of Gaussian processes should be equal to the number of classes. Expected 4 but got 10.
Trace Shapes:
Param Sites:
base_model.Xu 20 2
base_model.kernel.lengthscale 2
base_model.kernel.variance
base_model.u_loc 10 20
base_model.u_scale_tril 10 20 20
Sample Sites:
X dist | 80 2
value | 80 2
base_model.u dist | 10 20
value | 10 20
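My tentative reading of the error, sketched below as shape bookkeeping only: the trace shows base_model.u with a leading dimension of 10, which matches y.T.shape[:-1] (one GP per sequence position), whereas the multi-class likelihood seems to want the number of GPs to equal the number of classes. The sketch assumes the likelihood swaps the last two axes of the latent function values and compares the final axis with num_classes; that step is inferred from the error message, not taken from Pyro's source, and the helper names are made up for illustration. VariationalSparseGP also takes a latent_shape argument (defaulting to y.shape[:-1]); whether latent_shape=torch.Size([L, C]) is the right setting for this model is an untested guess.

```python
# Shape bookkeeping only: plain tuples stand in for tensor shapes, so this
# runs without torch or Pyro. The swap-and-compare step is an assumption
# inferred from the error message, not copied from Pyro's source.

N, C, L = 80, 4, 10  # sequences, classes, sequence length (as in the post)


def latent_f_shape(latent_shape, num_points):
    """Shape of the latent function values: latent_shape followed by the data axis."""
    return tuple(latent_shape) + (num_points,)


def class_axis_size(f_shape):
    """Size of the last axis after swapping the last two axes of f."""
    swapped = f_shape[:-2] + (f_shape[-1], f_shape[-2])
    return swapped[-1]


def check(latent_shape, num_classes=C, num_points=N):
    """Return (ok, message), mimicking the wording of the reported ValueError."""
    got = class_axis_size(latent_f_shape(latent_shape, num_points))
    if got != num_classes:
        return False, f"Expected {num_classes} but got {got}."
    return True, "ok"


y_T_shape = (L, N)               # shape of y.T as passed to the model
default_latent = y_T_shape[:-1]  # (10,): one GP per sequence position
print(check(default_latent))     # (False, 'Expected 4 but got 10.')
print(check((L, C)))             # (True, 'ok')
```

If this reading is right, each of the L sequence positions would need its own set of C latent functions, giving L * C GPs in total rather than L.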
I only started using Pyro a few days ago, so I'd appreciate any help on this matter. Thanks!