Figured out what the problem was. It had nothing to do with Pyro; I was just being a silly billy. Apparently, if you call .cuda() on a Variable instead of on the tensor, it is considered an operation and returns a “new” Variable, which can cause problems.
So:
Variable(torch.tensor(10)).cuda()  # this is bad
Variable(torch.tensor(10).cuda())  # this is "good"
I think this link describes it a bit: Strange behavior of Variable.cuda() and Variable.grad - PyTorch Forums
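
For anyone hitting this on a newer PyTorch, where Variable is deprecated and tensors carry requires_grad themselves, here is a rough sketch of what I believe is the same effect. The names (move, leaf, bad, good) are made up for illustration, and the .double() fallback is only my stand-in so the snippet also runs on a machine without a GPU; it is not part of the original problem.

```python
import torch

def move(t):
    # On a GPU machine this is the .cuda() call from the post.
    # Assumption for CPU-only machines: .double() stands in as another
    # differentiable op that returns a new tensor.
    return t.cuda() if torch.cuda.is_available() else t.double()

# "Bad" pattern: create the tensor that requires grad, then move it.
leaf = torch.tensor(10.0, requires_grad=True)
bad = move(leaf)              # result of an op, so not a leaf
(bad * 2).backward()
print(bad.is_leaf)            # False
print(bad.grad)               # None (PyTorch also warns about reading .grad here)
print(leaf.grad)              # the gradient ended up on the original tensor

# "Good" pattern: move first, then mark the moved tensor as requiring grad.
good = move(torch.tensor(10.0)).requires_grad_()
(good * 2).backward()
print(good.is_leaf)           # True
print(good.grad)              # gradient is populated on the tensor you actually use
```

So if an optimizer or a .grad check is pointed at the moved copy from the "bad" pattern, it sees no gradient, which matches the kind of trouble I ran into.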