What is the correct way to move data and probabilistic models to the GPU?

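Figured out what the problem was. It had nothing to do with Pyro; I was just being a silly billy. Apparently, if you call .cuda() on a Variable instead of on the underlying tensor, the call is recorded as an operation, so you get back a “new” (non-leaf) Variable. Gradients are accumulated on the original CPU Variable rather than on the CUDA copy you are holding, so the copy's .grad stays None, which can cause problems.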
So:

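import torch
from torch.autograd import Variable
Variable(torch.tensor(10.), requires_grad=True).cuda()  # bad: returns a non-leaf copy whose .grad stays None
Variable(torch.tensor(10.).cuda(), requires_grad=True)  # "good": the CUDA tensor is itself the leaf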

I think this link describes it a bit: Strange behavior of Variable.cuda() and Variable.grad - PyTorch Forums
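On PyTorch 0.4 and later, Variable is deprecated and tensors carry requires_grad themselves, so the same rule can be written without it. Below is a minimal sketch under that assumption; it is plain PyTorch, not Pyro-specific. It falls back to CPU when no GPU is present, in which case the "bad" branch is skipped because there is nothing to copy to. torch.nn.Linear is only a hypothetical stand-in for whatever model you are moving.

```python
import torch

# Fall back to CPU so the sketch still runs without a GPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Good: create the tensor on the target device and mark it as requiring grad
# in the same call. The tensor we hold is the leaf, so backward() fills its .grad.
x = torch.tensor(10.0, device=device, requires_grad=True)
(x * x).backward()
print(x.is_leaf, x.grad)  # x.grad holds d(x*x)/dx = 2 * 10 = 20

# Bad (only demonstrable with a GPU): .cuda() on a CPU leaf is recorded as an
# autograd op, so the result is a non-leaf copy and its .grad is never populated.
if device.type == "cuda":
    bad = torch.tensor(10.0, requires_grad=True).cuda()
    print(bad.is_leaf)  # False

# Models: nn.Module.to() moves parameters in place, so they remain leaf tensors.
model = torch.nn.Linear(3, 1).to(device)  # hypothetical stand-in for a model
data = torch.randn(8, 3).to(device)  # data needs no grad, so copying it is fine
model(data).sum().backward()
print(all(p.is_leaf and p.grad is not None for p in model.parameters()))
```

The idea is to create any tensor that needs gradients directly on its final device instead of copying it there afterwards. Calling .to(device) on a module is safe because it swaps parameter storage in place rather than producing new tensors.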