I am new to deep probabilistic programming, and have worked my way through the tutorials on the Pyro website.

I am currently working on some toy reinforcement learning problems, and was wondering if it would be advantageous to use Pyro to implement the REINFORCE algorithm as it is implemented here in PyTorch (utilising torch.distributions):
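For reference, the core of REINFORCE is the score-function update θ ← θ + α · (G − b) · ∇θ log π(a | θ). Below is a minimal, dependency-free sketch of that update, assuming a toy multi-armed bandit with one-step episodes, a softmax policy over learnable logits, and a running-mean baseline b. The bandit, the reward noise, and the names `softmax` and `reinforce_bandit` are hypothetical illustrations, not taken from the linked PyTorch implementation. The gradient of log π is written out by hand here. With torch.distributions, autograd would compute it from `Categorical(logits=theta).log_prob(a)`.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_means, steps=3000, lr=0.1, seed=0):
    """REINFORCE on a hypothetical toy bandit (each episode is one step).

    The policy is a softmax over logits theta. Each step samples an action
    a ~ pi(.|theta), observes a noisy reward r, and applies
    theta += lr * (r - baseline) * grad log pi(a|theta).
    Returns the final action probabilities.
    """
    rng = random.Random(seed)
    theta = [0.0] * len(arm_means)
    baseline = 0.0  # running mean of rewards, used only to reduce variance
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choices(range(len(probs)), weights=probs)[0]
        r = arm_means[a] + rng.gauss(0.0, 0.5)  # assumed noisy reward model
        advantage = r - baseline
        # For a softmax policy: d/d theta_k log pi(a) = 1[k == a] - pi_k.
        for k in range(len(theta)):
            grad_log_prob = (1.0 if k == a else 0.0) - probs[k]
            theta[k] += lr * advantage * grad_log_prob
        baseline += 0.05 * (r - baseline)
    return softmax(theta)


if __name__ == "__main__":
    final_probs = reinforce_bandit([0.0, 0.5, 1.0])
    print([round(p, 3) for p in final_probs])
```

In PyTorch, the same update is usually expressed as minimising the surrogate loss `-(advantage * dist.log_prob(a))` and calling `.backward()`. The hand-written loop above just makes that gradient explicit.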

If so, how would you go about doing it? I have been trying to wrap my head around doing it with Pyro’s model and guide architecture for SVI, but I can't come up with a way to do it that makes sense. I hope some of you can point me in the right direction.