I am trying to implement a Gaussian process latent variable model (GP-LVM) for categorical data. The model is the one presented in Section 3, Eq. (1) in this paper. It is essentially the usual GP-LVM, but with a categorical likelihood, which as far as I can tell means using Pyro’s VariationalSparseGP rather than an exact GP.
The GP-LVM code below runs, but it uses a Gaussian likelihood even though the data are integer-valued class labels:
import matplotlib.pyplot as plt
import numpy as np
import pyro
import torch
import pyro.contrib.gp as gp
import pyro.distributions as dist

N = 80  # number of sequences
C = 4   # number of classes for each element of the sequence
L = 10  # length of each sequence
d = 2   # GP-LVM latent space dimension
n = 20  # number of inducing points

# Synthetic data: each sequence position l has its own random class distribution
y = torch.zeros(N, L).int()
for l in range(L):
    y[:, l] = dist.Categorical(torch.rand(C)).sample((N,))

X_init = torch.zeros(N, d)
kernel = gp.kernels.RBF(input_dim=d, lengthscale=torch.ones(d))
Xu = torch.zeros(n, d) + torch.rand(n, d) * 0.01  # initial inducing inputs of sparse model

likelihood = gp.likelihoods.Gaussian()
# y.T has shape (L, N): one GP output per sequence position
gpmodule = gp.models.VariationalSparseGP(X_init, y.T, kernel, Xu, likelihood=likelihood)
gplvm = gp.models.GPLVM(gpmodule)
gp.util.train(gplvm)

# Read the latent variables from the guide (posterior means) rather than sampling the prior
gplvm.mode = "guide"
X = gplvm.X_loc.detach().numpy()
plt.scatter(X[:, 0], X[:, 1])  # plot the inferred latent variables
plt.show()
As mentioned, I want to use likelihood = gp.likelihoods.MultiClass(num_classes=C) instead. However, simply swapping in this likelihood does not work; I get the following error:
ValueError: Number of Gaussian processes should be equal to the number of classes. Expected 4 but got 10.
Trace Shapes:
 Param Sites:
  base_model.Xu                  20  2
  base_model.kernel.lengthscale   2
  base_model.kernel.variance
  base_model.u_loc               10 20
  base_model.u_scale_tril        10 20 20
 Sample Sites:
  X                   dist | 80  2
                     value | 80  2
  base_model.u        dist | 10 20
                     value | 10 20
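My reading of the error, which may well be wrong, is that the model creates one latent GP per row of y.T (10 rows, matching the 10 x 20 shape of base_model.u_loc in the trace), whereas the MultiClass likelihood wants the second-to-last dimension of the latent function to be the number of classes (4). Below is a shape-only sketch of that reasoning in plain Python. It makes no Pyro calls; both helper functions are hypothetical names of my own, and the rules they encode are my assumptions about what Pyro does internally. The last part assumes that passing something like latent_shape=(L, C) to VariationalSparseGP would give a latent function of shape (L, C, N), which I have not verified end to end.

```python
# Shape-only sketch; no Pyro calls. Both helpers are hypothetical.

N, C, L = 80, 4, 10  # sequences, classes, sequence length (as in my code)


def latent_function_shape(y_shape, latent_shape=None):
    """Assumed rule: f has shape latent_shape + (num_data,), where
    latent_shape defaults to the batch shape of y, i.e. y_shape[:-1]."""
    if latent_shape is None:
        latent_shape = tuple(y_shape[:-1])
    return tuple(latent_shape) + (y_shape[-1],)


def multiclass_accepts(f_shape, num_classes):
    """Assumed check: the second-to-last dimension of f must equal num_classes."""
    return len(f_shape) >= 2 and f_shape[-2] == num_classes


y_T_shape = (L, N)  # shape of y.T as passed to VariationalSparseGP

# Default: one GP per sequence position, so f is (10, 80) and the check fails
f_default = latent_function_shape(y_T_shape)
print(f_default, multiclass_accepts(f_default, C))  # (10, 80) False

# Assumed alternative: C GPs per sequence position, so f is (10, 4, 80)
f_per_class = latent_function_shape(y_T_shape, latent_shape=(L, C))
print(f_per_class, multiclass_accepts(f_per_class, C))  # (10, 4, 80) True
```

If this reading is right, the "Expected 4 but got 10" in the message is just C versus L, and the question becomes how to tell the model to use C latent functions for each of the L sequence positions.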
I only started using Pyro a few days ago, so I’d appreciate any help you can offer on this. Thanks!