I believe you would have easily found an answer to this question if you had searched previous forum posts; that is one of the main purposes of the forum.
GPU workloads are generally only faster than CPU workloads when the underlying tensor operations are sufficiently large. This is basically because dispatching work to a GPU incurs fixed overhead (kernel launches, host-device transfers) that small operations cannot amortize.
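If it helps to see the crossover concretely, here is a minimal JAX timing sketch (the sizes 64 and 4096 are arbitrary choices, not a recommendation): small matmuls are dominated by dispatch overhead, large ones amortize it.

```python
import time
import jax
import jax.numpy as jnp

def bench(n):
    x = jnp.ones((n, n))
    f = jax.jit(lambda a: a @ a)
    f(x).block_until_ready()      # warm-up: compile outside the timed region
    t0 = time.perf_counter()
    f(x).block_until_ready()      # block_until_ready: JAX dispatch is asynchronous
    return time.perf_counter() - t0

# Small ops: launch/dispatch overhead dominates. Large ops: the GPU pulls ahead.
for n in (64, 4096):
    print(f"n={n}: {bench(n):.4f}s")
```

Running this once on a CPU-only machine and once on the GPU node should show the overhead directly.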
Thanks @martinjankowiak
I was asking because in another case, where the tensors are large, the GPU was also slower than the CPU, so the people in charge of the GPU farm are digging into the reason…
Well, it is VI + a neural transport (NeuTra) reparameterization to run NUTS. Currently, we have found that the job runs 100% on the CPU even though GPU memory is allocated. It's as if the job were only touching the GPU's memory, while the actual compute instructions were never executed on the GPU.
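For what it's worth, a quick way to check whether JAX ever initialized a GPU backend at all is a sketch like the one below (assuming a standard JAX install). If `default_backend()` prints `cpu`, the entire NUTS run executes on the CPU regardless of what the GPU memory monitor shows.

```python
import jax
import jax.numpy as jnp

print(jax.default_backend())   # "gpu" if a CUDA-enabled jaxlib is active, else "cpu"
print(jax.devices())           # the devices JAX will actually dispatch kernels to

# Place an array explicitly on the first device and run one op;
# block_until_ready ensures the kernel has really executed before we move on.
x = jax.device_put(jnp.ones((2000, 2000)), jax.devices()[0])
(x @ x).block_until_ready()
```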
Note that I have cloned numpyro just to cross-check a line in transform.py (diag = jnp.clip(diag, a_min=1e-12)), but that line is commented out. I wonder whether cloning NumPyro could be a possible explanation for the GPU deactivation?
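Cloning the repo by itself should not disable the GPU, but one plausible failure mode (my assumption, not something confirmed here) is that installing the clone, e.g. with pip, pulled in a CPU-only jaxlib over the CUDA build. A check like this confirms which copy of numpyro is actually imported and which jaxlib build is active:

```python
import jax
import jaxlib
import numpyro

print(numpyro.__version__, numpyro.__file__)  # is the cloned checkout being imported?
print(jax.__version__, jaxlib.__version__)    # jax and jaxlib versions must be compatible
print(jax.default_backend())                  # "cpu" here would mean a CPU-only jaxlib
```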