How to debug CUDA out of memory?

Here’s a very general debugging trick that will hopefully help: "A trick to debug tensor memory"
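The linked post isn't reproduced here, so this is only a rough sketch of the kind of trick it refers to, assuming you're on PyTorch: walk everything the Python garbage collector is tracking and count the tensors that are still alive. The name `live_tensor_summary` and the `is_tensor` parameter are made up for illustration; by default it falls back to `torch.is_tensor`.

```python
import gc
from collections import Counter


def live_tensor_summary(is_tensor=None):
    """Count live tensor-like objects, grouped by (type name, shape, device).

    `is_tensor` is a hypothetical hook: any callable that returns True for
    objects you want to count. If omitted, PyTorch is assumed and
    `torch.is_tensor` is used.
    """
    if is_tensor is None:
        import torch  # assumption: the tensors in question are PyTorch tensors
        is_tensor = torch.is_tensor

    counts = Counter()
    for obj in gc.get_objects():
        try:
            if is_tensor(obj):
                key = (type(obj).__name__, tuple(obj.shape), str(obj.device))
                counts[key] += 1
        except Exception:
            # Some tracked objects raise on attribute access; skip them.
            continue
    return counts


def print_live_tensors(is_tensor=None):
    """Print the summary, most frequent entries first."""
    for (name, shape, device), n in live_tensor_summary(is_tensor).most_common():
        print(f"{n:5d} x {name} {shape} on {device}")
```

Something like `print_live_tensors()` can be called every few iterations of the training loop. If the count for some shape keeps growing, something is probably holding on to references to those tensors (a common culprit is accumulating the loss tensor itself instead of `loss.item()`).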