GPU available but slow compared to CPU on BNN example

I was trying to run the BNN example.

However, on a CPU I get 10x more iterations per second than on a V100 GPU!

I have checked that the GPU is seen by the script:

[GpuDevice(id=0, process_index=0)]

What could be the source of the inefficiency? Is there any environment variable I should cross-check? Thanks

hi @campagne

i believe you would have easily found an answer to this question if you had searched previous forum posts. that is one of the main purposes of the forum.

GPU workloads are generally only faster than CPU workloads when the underlying tensor operations are sufficiently large. this is basically because GPU use incurs additional overhead.

so this behavior is expected.
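
To see this overhead effect on a given machine, one can time the same operation at several sizes. The following is only a minimal sketch, not the BNN example itself: the helper `time_matmul` and the matrix sizes are hypothetical choices for illustration. The call to `block_until_ready()` matters because JAX dispatches work asynchronously, so timing without it would mostly measure the dispatch.

```python
import time

import jax


def time_matmul(n, repeats=10):
    """Mean seconds per (n x n) @ (n x n) product on the default JAX backend.

    Hypothetical helper for illustration; not part of NumPyro or JAX.
    """
    x = jax.random.normal(jax.random.PRNGKey(0), (n, n))
    matmul = jax.jit(lambda a: a @ a)
    matmul(x).block_until_ready()  # warm-up call, so compilation is not timed
    start = time.perf_counter()
    for _ in range(repeats):
        matmul(x).block_until_ready()  # wait until the result is computed
    return (time.perf_counter() - start) / repeats


print("default backend:", jax.default_backend())
for n in (16, 256, 1024):
    print(f"n={n}: {time_matmul(n):.2e} s per matmul")
```

Running the script once on the CPU and once on the GPU, then comparing the timings per size, shows at which size (if any) the GPU starts to pay off on that hardware.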

Thanks @martinjankowiak
I was asking because in another case, where the tensors are large, the GPU was also slower than the CPU, so the people in charge of the GPU farm are looking into the reason…

What type of approximate inference algorithm are you using? It’s quite common for MCMC to be slower on GPU than CPU.

If you are using VI, then there could be some real underlying issues.

Well, it is VI + NeuTra reparametrisation, followed by NUTS. Currently, we have found that the job runs 100% on the CPU while GPU memory is allocated. It’s as if the CPU were going back and forth with the GPU just to access the CPU’s RAM, while the computation instructions were not being executed on the GPU.

Here is the conda list of packages:
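
One caveat on this symptom: JAX preallocates a large share of GPU memory by default, so memory in use on the GPU does not by itself show that the computation runs there. A minimal sketch for cross-checking the environment is below. The helper `report_env` is a hypothetical name, and the list of variables is an assumption about what is worth inspecting, not an exhaustive list.

```python
import os

# Environment variables that can influence which device JAX uses and how it
# allocates GPU memory. The selection is an assumption, not an exhaustive list.
JAX_ENV_VARS = (
    "CUDA_VISIBLE_DEVICES",            # an empty value hides all GPUs
    "JAX_PLATFORMS",                   # e.g. "cpu" forces the CPU backend
    "JAX_PLATFORM_NAME",               # older variable with a similar role
    "XLA_PYTHON_CLIENT_PREALLOCATE",   # "false" disables memory preallocation
    "XLA_PYTHON_CLIENT_MEM_FRACTION",  # fraction of GPU memory to preallocate
)


def report_env(environ=None):
    """Return {variable: value or None} for the variables listed above."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in JAX_ENV_VARS}


for name, value in report_env().items():
    print(f"{name} = {value!r}")

try:
    import jax
except ImportError:
    print("jax is not importable in this environment")
else:
    # The default backend is where un-annotated computations are placed.
    print("default backend:", jax.default_backend())
    print("devices:", jax.devices())
```

If the default backend prints `cpu` even though a GPU is listed elsewhere, the job is being placed on the CPU regardless of the memory reported by the GPU.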

Note that I have cloned NumPyro just to cross-check the line (diag = jnp.clip(diag, a_min=1e-12)), but it is commented out. I wonder if cloning NumPyro is a possible explanation for the GPU being deactivated?
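
One way to rule the clone in or out is to check which copies of the packages Python actually picks up. The sketch below uses only the standard library. `describe_package` is a hypothetical helper name, and the idea that a mismatch between the cloned NumPyro and the installed jax/jaxlib could matter is an assumption to verify, not a confirmed cause.

```python
import importlib.util
from importlib import metadata


def describe_package(name):
    """Return (installed version or None, import location or None)."""
    try:
        version = metadata.version(name)
    except metadata.PackageNotFoundError:
        version = None  # importable from a path but not installed, or absent
    spec = importlib.util.find_spec(name)
    origin = spec.origin if spec is not None else None
    return version, origin


for pkg in ("jax", "jaxlib", "numpyro"):
    version, origin = describe_package(pkg)
    print(f"{pkg}: version={version}, location={origin}")
```

If the location printed for `numpyro` points into the cloned directory, the clone is the copy being used; the versions of `jax` and `jaxlib` can then be compared with the conda list.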