How to compare Pyro GP classification models?

Dear all,

I implemented several Gaussian process classification models using SVI, but I am a little confused about how to compare them. Currently I use

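import torch
n_samples = 1000
with torch.no_grad():
    f_loc, f_var = gpc(X_test)  # latent mean and variance at the test inputs
    pred = torch.empty(f_loc.shape[0], n_samples, device=f_loc.device)
    for i in range(n_samples):
        pred[:, i] = gpc.likelihood(f_loc, f_var)  # one draw of binary labels
p_test = pred.mean(dim=1)  # average over draws: one probability per test point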
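
The sampling above is a Monte Carlo estimate of the predictive probability. As a rough, self-contained sketch of the same idea without Pyro, and assuming a Bernoulli likelihood with a sigmoid link, one could sample the latent function and average the class probabilities directly. The helper name `mc_predictive_prob` and the example tensors are hypothetical, not part of Pyro or of my models:

```python
import torch

def mc_predictive_prob(f_loc, f_var, n_samples=1000, seed=0):
    """Monte Carlo estimate of p(y=1 | x), assuming a sigmoid link (hypothetical helper)."""
    gen = torch.Generator().manual_seed(seed)
    # Draw latent values f ~ Normal(f_loc, f_var); shape (n_samples, n_points).
    eps = torch.randn(n_samples, f_loc.shape[0], generator=gen)
    f = f_loc + f_var.sqrt() * eps
    # Average the class-1 probabilities over the draws: one value per test point.
    return torch.sigmoid(f).mean(dim=0)

# Made-up latent means and variances for three test points.
f_loc = torch.tensor([-2.0, 0.0, 2.0])
f_var = torch.tensor([0.5, 1.0, 0.5])
p_test = mc_predictive_prob(f_loc, f_var)
print(p_test)
```

Averaging probabilities rather than sampled 0/1 labels targets the same quantity, usually with less Monte Carlo noise for the same number of draws.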

to get the predictive probability, use 0.5 as the cutoff for classification, and then compute accuracy-style metrics like this:

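from sklearn.metrics import classification_report, confusion_matrix
y_pred = (p_test > 0.5).long().cpu()  # threshold the predictive probability at 0.5
print("confusion matrix:\n", confusion_matrix(y_test.cpu(), y_pred))
print(classification_report(y_test.cpu(), y_pred))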
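
Since the predictive probability is available before thresholding, scores that do not depend on the 0.5 cutoff can also be computed from it. A minimal sketch with scikit-learn, using made-up arrays `y_true` and `p_hat` as stand-ins for the test labels and predictive probabilities:

```python
import numpy as np
from sklearn.metrics import brier_score_loss, log_loss, roc_auc_score

# Made-up labels and predictive probabilities (illustrative values only).
y_true = np.array([0, 0, 1, 1, 1])
p_hat = np.array([0.1, 0.4, 0.35, 0.8, 0.9])

nll = log_loss(y_true, p_hat)            # mean negative log predictive probability
brier = brier_score_loss(y_true, p_hat)  # mean squared error of the probabilities
auc = roc_auc_score(y_true, p_hat)       # ranking quality, independent of the cutoff
print(f"log loss={nll:.3f}  Brier={brier:.3f}  AUC={auc:.3f}")
```

These are held-out scores of the probabilities themselves, not information criteria.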

However, are there any metrics that are more “scientific” or research-oriented, e.g. BIC or DIC, that can be used here? I am quite new to SVI and Pyro, so I’d appreciate your suggestions.