GPU Support in pyro.contrib.gp?

Sunny · April 1, 2023, 1:41pm

I’m trying to run an FBGPR model (see below) on a GPU in Google Colaboratory, but it seems the Pyro GP API cannot be used on the GPU. I have also tried the CPU, but then I get the warning "UserWarning: num_chains=2 is more than available_cpu=1", which by itself does not make much sense.

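import torch
import pyro
import pyro.contrib.gp as gp
from pyro.distributions import LogNormal

# Assumed setup (not shown in the original post): use the Colab GPU when available.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

def f(x):
    return torch.sin(20.0*x) + 2.0*torch.cos(14.0*x) - 2.0*torch.sin(6.0*x)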

xs = torch.tensor([-1.0,-0.5,0.0,0.5,1.0]).to(device)
ys = f(xs).to(device)

# Define the FBGPR model.
pyro.set_rng_seed(1)

kernel = gp.kernels.RBF(input_dim=1)
kernel.lengthscale = pyro.nn.PyroSample(LogNormal(-1.0, 1.0))
kernel.variance = pyro.nn.PyroSample(LogNormal(0.0, 2.0))

# The noise hyperparameter is fixed.
noise = torch.tensor(0.0001).to(device)

gpr = gp.models.GPRegression(xs, ys, kernel, noise=noise)


pyro.set_rng_seed(1)

# Define a NUTS kernel for the GP regression model
nuts_kernel = pyro.infer.NUTS(gpr.model,jit_compile=True)

# Define an MCMC inference algorithm
mcmc = pyro.infer.MCMC(nuts_kernel, num_samples=500, num_chains=2, warmup_steps=500)

# Run the MCMC algorithm
mcmc.run()

# Extract posterior samples for kernel.lengthscale and kernel.variance
ls_name = "kernel.lengthscale"
posterior_ls = mcmc.get_samples()[ls_name]
vs_name = "kernel.variance"
posterior_vs = mcmc.get_samples()[vs_name]

# Extract all posterior samples
posterior_hyperparameter_samples = mcmc.get_samples()

I get the following error:

/usr/local/lib/python3.9/dist-packages/pyro/infer/mcmc/api.py:497: UserWarning: num_chains=2 is more than available_cpu=1. Chains will be drawn sequentially.
  warnings.warn(
Warmup:   0%|          | 0/1000 [00:00, ?it/s]
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/pyro/poutine/trace_messenger.py in __call__(self, *args, **kwargs)
    173             try:
--> 174                 ret = self.fn(*args, **kwargs)
    175             except (ValueError, RuntimeError) as e:

15 frames
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
/usr/local/lib/python3.9/dist-packages/pyro/contrib/gp/models/gpr.py in model(self)
     86         N = self.X.size(0)
     87         Kff = self.kernel(self.X)
---> 88         Kff.view(-1)[:: N + 1] += self.jitter + self.noise  # add noise to diagonal
     89         Lff = torch.linalg.cholesky(Kff)
     90 

RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cpu!
          Trace Shapes:  
           Param Sites:  
                  noise 1
          Sample Sites:  
kernel.lengthscale dist |
                  value |
   kernel.variance dist |
                  value |

fehiepsi · April 1, 2023, 2:37pm

I think you can inspect the error to see which tensor is on the CPU. I suspect you need to move your priors to the GPU, or something similar.
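
One way to act on this is to print the `.device` of every value that feeds the model. The helper below is only a sketch: `tensor_devices` is a hypothetical name, not part of Pyro or PyTorch, and it assumes nothing beyond tensors exposing a `.device` attribute, as PyTorch tensors do.

```python
from typing import Any, Dict


def tensor_devices(**named_values: Any) -> Dict[str, str]:
    """Map each keyword name to the device of its value.

    Values without a `.device` attribute (e.g. Python floats) are
    reported as "n/a" so they stand out from real tensors.
    """
    return {
        name: str(getattr(value, "device", "n/a"))
        for name, value in named_values.items()
    }
```

In the notebook from the original post, a call such as `tensor_devices(xs=xs, ys=ys, noise=noise, prior_loc=LogNormal(-1.0, 1.0).loc)` would list where the data, the noise, and one prior parameter live, which makes a CPU/GPU mismatch easy to spot.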

Sunny · April 1, 2023, 9:53pm

Thank you for the reply.
Unfortunately, it did not work (at least I couldn’t figure it out).

fehiepsi · April 2, 2023, 2:45am

Can you use the Python debugger to stop where the error happens? If so, you can inspect the variables around line 88 above to see which one is on the CPU. If self.kernel.variance is on the CPU, it means that your prior parameters need to be on the GPU. For example, replace the parameters -1.0, 1.0 of the LogNormal prior with CUDA tensors.

Sunny · April 2, 2023, 1:17pm

Okay, I see what you mean.