Bad results for multiclass classification using VGP

Hi all, I was trying to use VariationalGP combined with the MultiClass likelihood for a classification problem in my project. First I tried it on a toy dataset, but the results are bad. Here is the code:

#!/usr/bin/env python
import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import time
import pyro
import pyro.contrib.gp as gp
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification, make_blobs
from scipy import stats
import os
smoke_test = ('CI' in os.environ)  # ignore; used by the Pyro tutorials to shorten runs on CI
pyro.enable_validation(True)       # can help with debugging
pyro.set_rng_seed(0)

if __name__ == '__main__':
    # from sklearn.model_selection import train_test_split
    # from sklearn import datasets
    # Create a toy dataset
    N = 200
    N_TRAIN = int(0.80 * N)
    N_FEATURES = 7
    num_class = 3
    #X, y = make_classification(n_samples=200, n_features=N_FEATURES, n_redundant=0, n_informative=2,
    #                           n_clusters_per_class=2)
    X, y = make_blobs(n_samples=200, n_features=N_FEATURES, centers=num_class)
    plt.scatter(X[:, 0], X[:, 1], marker='o', c=y, s=25, edgecolor='k')
    plt.show()
    X_train = X[:N_TRAIN, :]
    y_train = y[:N_TRAIN]
    X_test = X[N_TRAIN:, :]
    y_test = y[N_TRAIN:]
    X_train = torch.tensor(X_train, dtype=torch.float32)
    y_train = torch.tensor(y_train, dtype=torch.float32)
    X_test = torch.tensor(X_test, dtype=torch.float32)
    y_test = torch.tensor(y_test, dtype=torch.float32)

    # encode the targets as one-hot vectors
    y_train_label = y[:N_TRAIN]
    y_test_label = y[N_TRAIN:]
    y_train_enc = torch.zeros(num_class, y_train.shape[0])  # float32 to match X
    y_test_enc = torch.zeros(num_class, y_test.shape[0])
    y_train_enc.scatter_(0, torch.as_tensor(y_train_label).long().view(1, -1), 1)
    y_test_enc.scatter_(0, torch.as_tensor(y_test_label).long().view(1, -1), 1)


    # Choose the kernel and likelihood for multiclass classification.
    # Giving lengthscale one entry per input dimension uses the ARD version of the kernel.
    kernel = gp.kernels.RBF(input_dim=N_FEATURES, variance=torch.tensor(1.),
                            lengthscale=torch.ones(N_FEATURES))
    likelihood = gp.likelihoods.MultiClass(3)
    gpc = gp.models.VariationalGP(X_train, y_train_enc, kernel=kernel, jitter=1e-03,
                                  likelihood=likelihood, whiten=True)

    gpc.optimize()

    # Predict labels by sampling responses from the likelihood and taking the mode
    f_loc, f_var = gpc(X_train, full_cov=False)
    y_train_results = []
    for i in range(1000):
        y_train_results.append(gpc.likelihood(f_loc, f_var).numpy())
    y_train_results = np.array(y_train_results)
    y_train_results = stats.mode(y_train_results, axis=0)[0]
    y_train_results = y_train_results.squeeze(0)
    train_acc = accuracy_score(y_train_results, y_train)

    f_loc, f_var = gpc(X_test, full_cov=False)
    y_test_results = []
    for i in range(1000):
        y_test_results.append(gpc.likelihood(f_loc, f_var).numpy())
    y_test_results = np.array(y_test_results)
    y_test_results = stats.mode(y_test_results, axis=0)[0]
    y_test_results = y_test_results.squeeze(0)
    test_acc = accuracy_score(y_test_results, y_test)
    print("Train accuracy: %.3f, Test accuracy: %.3f" % (train_acc, test_acc))

The results are bad.

Even when I tried the same setup on a binary classification problem by setting num_class = 2, the results were also bad.

However, when I switched to gp.likelihoods.Binary() for the binary classification problem:

...
num_class = 2
#X, y = make_classification(n_samples=200, n_features=N_FEATURES, n_redundant=0, n_informative=2,
#                           n_clusters_per_class=2)
X, y = make_blobs(n_samples=200, n_features=N_FEATURES, centers=num_class)
...
likelihood = gp.likelihoods.Binary()
# For gp.likelihoods.Binary(), there is no need for one-hot targets
gpc = gp.models.VariationalGP(X_train, y_train, kernel=kernel, jitter=1e-03,
                              likelihood=likelihood, whiten=True)
...
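The prediction part was the same sampling loop as before; with the Binary likelihood the sampled responses are already 0/1, so it reduces to roughly this (a sketch using the same variable names as above):

f_loc, f_var = gpc(X_test, full_cov=False)
samples = torch.stack([gpc.likelihood(f_loc, f_var) for _ in range(1000)])
y_pred = samples.mean(dim=0).round()  # majority vote over the Bernoulli samples
test_acc = accuracy_score(y_pred.numpy(), y_test)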

The optimization results seem reasonably good.

So it looks like the problem is not in the hyperparameter initialization for the kernel or the optimization settings for SVI; I wonder if it is related to the MultiClass likelihood. I checked the implementation in pyro/pyro/contrib/gp/likelihoods/multi_class.py, and it seems correct according to section 3.5 of the GPML book.
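For reference, here is my paraphrased reading of that forward pass (a rough sketch from memory, not the exact source):

def multiclass_forward_sketch(f, y=None):
    # f: latent GP output with one function per class, shape (num_classes, num_data)
    probs = F.softmax(f.transpose(-2, -1), dim=-1)  # (num_data, num_classes)
    return pyro.sample("y", dist.Categorical(probs=probs), obs=y)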

Has anyone tried similar problems? Any advice is appreciated.

@yzucla14 Your attempt looks really nice! I have used MultiClass to classify the MNIST dataset and found that it worked well. I'll look into this and respond to you soon.

Got it. We use a Categorical distribution for MultiClass, so the model's output should be y_train, not y_train_enc. But doing only that will throw an error. Why? We need the GP to produce one output per class (the likelihood applies a softmax over them to get class probabilities), hence the GP latent output should have shape (3 x num_data). So to fix that error, you should also set latent_shape = torch.Size([3]) when constructing your GP.
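Putting it together, the construction would look roughly like this (a minimal sketch reusing the names from your script):

likelihood = gp.likelihoods.MultiClass(3)
gpc = gp.models.VariationalGP(X_train, y_train,  # plain integer class labels, not one-hot
                              kernel=kernel,
                              likelihood=likelihood,
                              latent_shape=torch.Size([3]),  # one latent GP per class
                              jitter=1e-03,
                              whiten=True)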

I believe you'll know what to do given your good knowledge of the GP module. Let me know if you run into any difficulty. :slight_smile:


Hi @fehiepsi,
Thanks very much for your explanation; now I see where I went wrong.

Initially, when I saw this line in pyro/pyro/contrib/gp/likelihoods/multi_class.py, I thought we needed to set the output to the one-hot version so that the latent function could have two dimensions, one of which corresponds to num_class.

Now I see that by setting latent_shape in the model constructor, we're building that many uncorrelated GPs, one for each class.
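For completeness, with latent_shape set, my prediction loop now looks roughly like this (a sketch; f_loc and f_var have shape (3, N), one row per class):

f_loc, f_var = gpc(X_test, full_cov=False)
samples = torch.stack([gpc.likelihood(f_loc, f_var) for _ in range(1000)])
y_pred = samples.mode(dim=0).values  # most frequent sampled class per point
test_acc = accuracy_score(y_pred.numpy(), y_test)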

Thanks again for your explanation, and I really love your implementation of GPs on top of Pyro!
