Transfer SVI, NUTS and MCMC to GPU (CUDA)

artistworking · January 22, 2019, 2:44pm

Hi!

I am having some trouble understanding how to move the following script structure to the GPU. This is pseudocode:

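import pyro
from pyro.infer import SVI, Trace_ELBO, MCMC, NUTS
from pyro.infer.autoguide import AutoDelta
from pyro.optim import AdagradRMSProp

def model(data):
    ...  # custom-made model

def estimate_MAP(data, num_steps=1000):
    guide = AutoDelta(model)                   # point-mass guide, i.e. MAP
    elbo = Trace_ELBO()
    optim = AdagradRMSProp({})
    svi = SVI(model, guide, optim, loss=elbo)  # do not shadow the SVI class
    for _ in range(num_steps):
        svi.step(data)
    return guide.median(data)                  # MAP point estimate per site

def MCMC_NUTS(data):
    nuts_kernel = NUTS(model)
    map_estimate = estimate_MAP(data)          # intended as the NUTS starting point
    mcmc = MCMC(nuts_kernel, num_samples=1000)
    mcmc.run(data)
    # Extract the marginal mean and variance of the parameters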

I hope the general idea of the code is clear; again, this is pseudocode.

I have read several threads on this Pyro forum related to this topic (as well as the PyTorch documentation on CUDA tensors and the PyTorch forum), but perhaps I have misunderstood something, because it is not working.

First, I moved the torch tensors throughout the script to the GPU with .cuda().
Second, at the beginning of the script I tried to set the default tensor type, e.g. torch.set_default_tensor_type('torch.cuda.DoubleTensor'), when CUDA is available. I have tried several torch.cuda tensor types, each giving a different error:
a) torch.set_default_tensor_type('torch.cuda.DoubleTensor')
RuntimeError: expected type torch.cuda.DoubleTensor but got torch.cuda.FloatTensor
b) torch.set_default_tensor_type('torch.cuda.FloatTensor')
RuntimeError: expected type torch.FloatTensor but got torch.cuda.FloatTensor

So I suspect I have to somehow move the model, the SVI object, or everything else to the GPU, but they obviously don't have a .cuda() method (or anything similar). Is this possible? Am I missing something?
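Both errors above say that one tensor in an operation has a different dtype (a) or device (b) from the others. A quick way to find the culprit is to compare every suspect tensor against one that is known to be right, such as the observed data. The sketch below is a hypothetical debugging helper (`report_mismatches` is not a Pyro or PyTorch API), and the tensors in the usage part are made-up stand-ins.

```python
import torch

def report_mismatches(reference, **tensors):
    """Return (name, dtype, device) for each tensor whose dtype or device
    differs from `reference`. Hypothetical debugging helper, not a library API."""
    bad = []
    for name, t in tensors.items():
        if t.dtype != reference.dtype or t.device != reference.device:
            bad.append((name, t.dtype, str(t.device)))
    return bad

# Use the GPU when one is available, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
data = torch.randn(5, device=device)                            # float32 on the chosen device
prior_loc = torch.zeros(5, dtype=torch.float64, device=device)  # wrong dtype on purpose
prior_scale = torch.ones(5, device=device)                      # consistent with data
problems = report_mismatches(data, prior_loc=prior_loc, prior_scale=prior_scale)
print(problems)  # only prior_loc is reported
```

Passing the intermediate tensors of a model (priors, constants, matrices built inside the model) to such a check usually narrows the error down to a single line.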

Thanks for your attention and help

neerajprad · January 22, 2019, 9:47pm

You shouldn’t have to do anything more than that, provided your data is on the GPU, which it seems like it is.

For (a): if you are reading your data in from a NumPy array, it will be converted to a torch.double tensor by default, so you might have to change that.
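A minimal sketch of that conversion, assuming the data arrives as a NumPy array (the array values here are made up): NumPy's default float dtype is float64, `torch.from_numpy` keeps it, and `torch.as_tensor` can cast to the default torch dtype and move to the target device in one step.

```python
import numpy as np
import torch

# Use the GPU when one is available, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

raw = np.array([[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])  # NumPy defaults to float64
as_is = torch.from_numpy(raw)                        # stays float64 (torch.double), on the CPU

# Convert once, at load time, to the dtype new tensors get by default
# (float32 unless changed) and to the device the model runs on.
data = torch.as_tensor(raw, dtype=torch.get_default_dtype(), device=device)
print(as_is.dtype, data.dtype, data.device)
```

Doing the cast once where the data is loaded avoids having to switch the global default to a double tensor type just to match the data.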

(b) shouldn’t happen, could you paste the complete error trace?

artistworking · January 23, 2019, 11:29am

First, thanks for your reply.

I think I got past that first error (I had to call .cuda() on another parameter). Thanks for confirming that those two steps are all that is needed to move the model to the GPU.

Now I am running into another error, which makes even less sense to me than the first one. I will start trying to fix it now; in the meantime, I am happy to hear suggestions:

Traceback (most recent call last):
  File "Superposition_Bayesian_Cuda.py", line 473, in <module>
    T1, T2, R, M, X1, X2 = Run(data_obs, average)
  File "Superposition_Bayesian_Cuda.py", line 307, in Run
    nuts_kernel.initial_trace = _get_initial_trace(data_obs, average)
  File "Superposition_Bayesian_Cuda.py", line 299, in _get_initial_trace
    svi_engine.run([data_obs], max_epochs=10000)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 326, in run
    self._handle_exception(e)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 291, in _handle_exception
    raise e
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 313, in run
    hours, mins, secs = self._run_once_on_dataset()
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 280, in _run_once_on_dataset
    self._handle_exception(e)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 291, in _handle_exception
    raise e
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/ignite/engine/engine.py", line 272, in _run_once_on_dataset
    self.state.output = self._process_function(self, batch)
  File "Superposition_Bayesian_Cuda.py", line 64, in _update
    return -engine.svi.step(batch, **self._step_args)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/infer/svi.py", line 99, in step
    loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/infer/trace_elbo.py", line 125, in loss_and_grads
    for model_trace, guide_trace in self._get_traces(model, guide, *args, **kwargs):
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/infer/elbo.py", line 163, in _get_traces
    yield self._get_trace(model, guide, *args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/infer/trace_elbo.py", line 52, in _get_trace
    "flat", self.max_plate_nesting, model, guide, *args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/infer/enum.py", line 44, in get_importance_trace
    graph_type=graph_type).get_trace(*args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 169, in get_trace
    self(*args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 153, in __call__
    traceback)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/six.py", line 692, in reraise
    raise value.with_traceback(tb)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py", line 147, in __call__
    ret = self.fn(*args, **kwargs)
  File "/isdata/fonsecagrp/hlb580/Miniconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py", line 27, in _wraps
    return fn(*args, **kwargs)
  File "Superposition_Bayesian_Cuda.py", line 248, in model
    M_R2_T2 = M@R + T2
RuntimeError: Expected object of backend CPU but got backend CUDA for argument #2 'mat2'
Trace Shapes:       
 Param Sites:       
Sample Sites:       
      M1 dist   | 71
        value   | 71
      M2 dist   | 71
        value   | 71
      M3 dist   | 71
        value   | 71
      T1 dist 3 |   
        value 3 |   
      T2 dist 3 |   
        value 3 |   
  ri_vec dist 3 |   
        value 3 |

neerajprad · January 23, 2019, 6:48pm

I suspect your matrices M and R are not on the same device: one is on the CPU and the other on CUDA.
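A minimal sketch of the usual fix, assuming (since the model code is not shown) that R is a 3×3 matrix built inside the model from a sampled tensor and that M holds 71 three-dimensional points. Tensors created without a device argument, such as `torch.zeros(3, 3)` or `torch.eye(3)`, land on the default device (normally the CPU) even when everything else is on CUDA. Building R from the sampled tensor with `torch.stack`, `zeros_like` and `ones_like` keeps it on that tensor's device and dtype. `rotation_z` is a hypothetical helper, not part of the original script.

```python
import torch

def rotation_z(theta):
    """Hypothetical helper: 3x3 rotation about the z-axis.
    Built only from tensors derived from theta, so it inherits theta's device/dtype."""
    c, s = torch.cos(theta), torch.sin(theta)
    zero, one = torch.zeros_like(theta), torch.ones_like(theta)
    return torch.stack([
        torch.stack([c, -s, zero]),
        torch.stack([s, c, zero]),
        torch.stack([zero, zero, one]),
    ])

# Use the GPU when one is available, otherwise the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
theta = torch.tensor(0.5, device=device)  # stand-in for a sampled angle
M = torch.randn(71, 3, device=device)     # assumed: 71 points in 3-D
T2 = torch.randn(3, device=device)        # translation vector
R = rotation_z(theta)

assert M.device == R.device               # same device, so the matmul below is valid
M_R2_T2 = M @ R + T2
print(M_R2_T2.shape)
```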

artistworking · January 24, 2019, 1:45pm

Yeah, thanks! I thought the same. I have been talking to some people, and the errors might be because the GPU I am trying to use is deprecated, so I will set the whole thing up on a new server/GPU. Thanks for the replies.

EDIT: The error was in the observed data; I forgot to call .cuda() on it (in case it helps someone).

vanAmsterdam · May 24, 2019, 5:43pm

Hi, as a follow-up: is there an easy way to move the traces to the CPU? I want to use ArviZ to run some diagnostics on the traces. This works perfectly fine when the MCMC trace is on the CPU, but when MCMC runs on the GPU, all the tensors packaged somewhere in the trace object are on the GPU, which makes ArviZ throw an error. I'm looking for something high-level such as:

trace_mcmc = MCMC(... ... data on cuda)
trace_mcmc.cpu()
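A minimal sketch, assuming the Pyro 1.x API, where `MCMC.get_samples()` returns a dict mapping site names to tensors (the 2019 trace object discussed here is structured differently, and the `.cpu()` call above is only illustrative). The hypothetical `to_cpu` helper recursively copies every tensor in a nested dict/list/tuple to the CPU, after which `.numpy()` works and the arrays can be handed to ArviZ. The `samples` dict below is a made-up stand-in for real MCMC output.

```python
import torch

def to_cpu(obj):
    """Hypothetical helper: recursively move every tensor in a nested
    dict/list/tuple to the CPU (namedtuples come back as plain tuples)."""
    if torch.is_tensor(obj):
        return obj.detach().cpu()
    if isinstance(obj, dict):
        return {k: to_cpu(v) for k, v in obj.items()}
    if isinstance(obj, list):
        return [to_cpu(v) for v in obj]
    if isinstance(obj, tuple):
        return tuple(to_cpu(v) for v in obj)
    return obj  # anything else (numbers, strings, None) is returned unchanged

# Stand-in for mcmc.get_samples() after running MCMC on the chosen device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
samples = {"loc": torch.randn(100, device=device), "scale": torch.rand(100, device=device)}

samples_cpu = to_cpu(samples)
numpy_samples = {k: v.numpy() for k, v in samples_cpu.items()}
```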