Prediction for Gaussian process classification

Hi all, when I perform Gaussian process classification with the code below:

import torch
import torch.nn.functional as F
import numpy as np
import matplotlib.pyplot as plt
import time
import pyro
import pyro.contrib.gp as gp
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from sklearn.metrics import accuracy_score
from sklearn.datasets import make_classification

# Create dataset
N = 200
N_TRAIN = int(0.80 * N)
N_FEATURES = 7
X, y = make_classification(n_samples=N, n_features=N_FEATURES, n_redundant=0, n_repeated=0, n_clusters_per_class=2, n_classes=2)
X_train = X[:N_TRAIN, :]
y_train = y[:N_TRAIN]
X_test = X[N_TRAIN:, :]
y_test = y[N_TRAIN:]
X_train = torch.tensor(X_train, dtype=torch.float32)
y_train = torch.tensor(y_train, dtype=torch.float32)
X_test = torch.tensor(X_test, dtype=torch.float32)
y_test = torch.tensor(y_test, dtype=torch.float32)

# Choose kernel and likelihood
kernel = gp.kernels.RBF(input_dim=N_FEATURES, variance=torch.tensor(1.), lengthscale=torch.tensor(10.))
likelihood = gp.likelihoods.Binary()
gpc = gp.models.VariationalGP(X_train, y_train, kernel=kernel, likelihood=likelihood, jitter=1e-03, whiten=True)

# Inference
optim = Adam({"lr": 0.005})
svi = SVI(gpc.model, gpc.guide, optim, loss=Trace_ELBO())
num_steps = 2000
losses = np.zeros(num_steps)
pyro.clear_param_store()
start = time.time()
for i in range(num_steps):
    losses[i] = svi.step()
    if i % (num_steps // 20) == 0:
        print("iteration %d. Loss %.4f" % (i, losses[i]))
        elapsed_time = time.time()
        print("elapsed time: %.2f" % (elapsed_time - start))
end = time.time()
print("Loop took %.2f seconds" % (end - start))
plt.plot(losses)
plt.show()

What is the appropriate way to get predictions for classification?

I am trying it this way:

# Method 1
f_loc, f_var = gpc(X_train, full_cov=True)
prediction = gpc.likelihood(f_loc, f_var)
pred = np.round(np.mean(prediction.numpy(), axis=0))
train_acc = accuracy_score(pred, y_train.numpy())
f_loc, f_var = gpc(X_test, full_cov=True)
prediction = gpc.likelihood(f_loc, f_var)
pred = np.round(np.mean(prediction.numpy(), axis=0))
test_acc = accuracy_score(pred, y_test.numpy())
print("Train accuracy: %.3f, Test accuracy: %.3f" % (train_acc, test_acc))

and this way:

# Method 2
f_loc, _ = gpc(X_train, full_cov=True)
prediction = np.zeros([300, N_TRAIN])
for i in range(300):
    prediction_sample = dist.Bernoulli(F.sigmoid(f_loc))().numpy()
    prediction[i, :] = prediction_sample
pred = np.round(np.mean(prediction, axis=0))
train_acc = accuracy_score(pred, y_train.numpy())

f_loc, _ = gpc(X_test, full_cov=True)
prediction = np.zeros([300, N - N_TRAIN])
for i in range(300):
    prediction_sample = dist.Bernoulli(F.sigmoid(f_loc))().numpy()
    prediction[i, :] = prediction_sample
pred = np.round(np.mean(prediction, axis=0))
test_acc = accuracy_score(pred, y_test.numpy())
print("Train accuracy: %.3f, Test accuracy: %.3f" % (train_acc, test_acc))

Which way is better and why?


I would follow Method 1 because it uses the uncertainty information from the latent GP. You can also put the Method 1 inference in a loop like this:

for i in range(300):
    prediction[i, :] = gpc.likelihood(f_loc,f_var)
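
Once the prediction array from that loop is filled, the Monte Carlo class probabilities and hard labels can be computed like this (just a sketch, reusing y_test and the imports from above, and assuming the loop was run on the test inputs):

pred_prob = prediction.mean(axis=0)  # Monte Carlo estimate of P(y = 1 | x)
pred_label = np.round(pred_prob)  # threshold at 0.5
test_acc = accuracy_score(y_test.numpy(), pred_label)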

Thank you for your answer @fehiepsi. In Method 1:

f_loc, f_var = gpc(X_test, full_cov=True)
prediction = gpc.likelihood(f_loc, f_var)

the shape of prediction is always the number of samples * the number of samples, i.e. (X_test.shape[0], X_test.shape[0]). So if the number of samples in X_test is not big (for example only 3), the calculation of pred might not be so accurate. So based on your suggestion, I think:

# Method 3
n_sampling = 300
N_TEST = N - N_TRAIN
prediction = np.zeros([n_sampling, N_TEST, N_TEST])
f_loc, f_var = gpc(X_test, full_cov=True)
for i in range(n_sampling):
    prediction[i, :, :] = gpc.likelihood(f_loc, f_var)
pred = np.round(np.mean(prediction, axis=(0, 1)))
test_acc = accuracy_score(pred, y_test.numpy())

might be the appropriate way.

@yusri_dh It seems that there is something wrong here. I don't get why "the shape of prediction is always the number of samples * the number of samples". I thought that the shape of prediction is the same as the shape of f_loc, which is the number of samples. If that is the case, then there might be a bug in the GP module code. :confused:

Actually, it makes me wonder too. I expected that the resulting shape would be the same as the shape of f_loc. But from the source code, calling gpc.likelihood(f_loc, f_var) is the same as calling:

f = dist.Normal(f_loc, f_var)()
f_res = torch.nn.functional.sigmoid(f)
prediction = dist.Bernoulli(f_res)()
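
As a toy shape check (hypothetical sizes, reusing the torch and dist imports from above; only the shapes matter here):

N_TOY = 3
f_loc_toy = torch.zeros(N_TOY)
f_var_toy = torch.ones(N_TOY, N_TOY)  # stand-in for a full covariance matrix
print(dist.Normal(f_loc_toy, f_var_toy)().shape)  # torch.Size([3, 3]): f_loc broadcasts against the matrix
print(dist.Normal(f_loc_toy, f_var_toy.diag())().shape)  # torch.Size([3]): diagonal entries keep the shape of f_loc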

so the prediction shape becomes the number of samples * the number of samples. In my opinion, instead of "f = dist.Normal(f_loc, f_var)()" from the source, it should be "f = dist.MultivariateNormal(f_loc, f_var)()" so it becomes like this:

f = dist.MultivariateNormal(f_loc, f_var).sample()
f_res = torch.nn.functional.sigmoid(f)
prediction = dist.Bernoulli(f_res)()

In this case, the shape of prediction will be the same as f_loc. What do you think?


Good catch! We should use the flag "full_cov=False" for the prediction because the outputs are independent given the latent GP output. No need to use MultivariateNormal here.
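
For example, the test-set prediction would then look like this (a minimal sketch reusing gpc, X_test, and y_test from above; the sample count of 300 is an arbitrary choice):

n_samples = 300
f_loc, f_var = gpc(X_test, full_cov=False)  # f_var is a vector of marginal variances
prediction = np.zeros([n_samples, X_test.shape[0]])
for i in range(n_samples):
    prediction[i, :] = gpc.likelihood(f_loc, f_var)  # Bernoulli sample with the same shape as f_loc
pred = np.round(prediction.mean(axis=0))  # Monte Carlo estimate of P(y = 1 | x), thresholded at 0.5
test_acc = accuracy_score(y_test.numpy(), pred)
print("Test accuracy: %.3f" % test_acc)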


Ah, now I understand. It is because I set "full_cov=True". Thank you so much.
I am sorry for spamming the Pyro forum lately, and it might still continue because I think Pyro is amazing for my project. :smile:
