REINFORCE algorithm in Pyro

MrVamp · June 23, 2018, 8:45pm

Hello,

I am new to deep probabilistic programming, and have worked my way through the tutorials on the Pyro website.

I am currently working on some toy reinforcement learning problems, and was wondering if it would be advantageous to use Pyro to implement the REINFORCE algorithm as it is implemented here in PyTorch (utilising torch.distributions):

github.com/pytorch/examples/blob/main/reinforcement_learning/reinforce.py

import argparse
import gym
import numpy as np
from itertools import count
from collections import deque
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.distributions import Categorical


parser = argparse.ArgumentParser(description='PyTorch REINFORCE example')
parser.add_argument('--gamma', type=float, default=0.99, metavar='G',
                    help='discount factor (default: 0.99)')
parser.add_argument('--seed', type=int, default=543, metavar='N',
                    help='random seed (default: 543)')
parser.add_argument('--render', action='store_true',
                    help='render the environment')
parser.add_argument('--log-interval', type=int, default=10, metavar='N',

This file has been truncated.
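To make the question concrete, here is a minimal, dependency-free sketch of the REINFORCE update that the script performs. It is written under stated assumptions. A hypothetical toy bandit with a fixed reward per arm, a stateless softmax policy, and hand-written log-softmax gradients stand in for gym, the policy network and torch.distributions. The names reinforce_bandit, discounted_returns and softmax are illustrative and not part of the linked example. The gamma argument plays the role of the --gamma flag above.

```python
import math
import random


def discounted_returns(rewards, gamma=0.99):
    """Return G_t = r_t + gamma * G_{t+1} for every step, computed back to front."""
    returns = []
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_bandit(arm_rewards, episodes=500, length=5, gamma=0.99, lr=0.05, seed=543):
    """REINFORCE with a stateless softmax policy on a hypothetical toy bandit.

    Each episode is `length` pulls. The logits theta are updated once, after the
    episode ends, by gradient ascent on sum_t G_t * log pi(a_t).
    """
    rng = random.Random(seed)
    n = len(arm_rewards)
    theta = [0.0] * n  # policy logits, the only parameters
    for _ in range(episodes):
        probs = softmax(theta)  # policy stays fixed for the whole episode
        actions, rewards = [], []
        for _ in range(length):
            a = rng.choices(range(n), weights=probs)[0]  # sample a ~ pi(.)
            actions.append(a)
            rewards.append(arm_rewards[a])
        returns = discounted_returns(rewards, gamma)
        grad = [0.0] * n
        for a, g in zip(actions, returns):
            # d/dtheta_k log softmax(theta)[a] = 1[k == a] - pi(k)
            for k in range(n):
                grad[k] += g * ((1.0 if k == a else 0.0) - probs[k])
        theta = [t + lr * d for t, d in zip(theta, grad)]
    return softmax(theta)


if __name__ == "__main__":
    print(discounted_returns([1.0, 1.0, 1.0], gamma=0.5))  # [1.75, 1.5, 1.0]
    print(reinforce_bandit([0.0, 1.0]))  # probability mass should move to arm 1
```

In the full script (truncated above), the same update is done with autograd. Each Categorical log_prob is stored during the episode and the loss -Σ_t log π(a_t)·G_t is minimized afterwards, with the returns standardized first as a simple variance-reduction step. That standardization is left out of this sketch.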

If so, how would you go about doing it? I have been trying to wrap my head around doing it with Pyro's model and guide architecture for SVI, but I can't come up with a way to do it that makes sense. I hope some of you can point me in the right direction.
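One possible mapping onto model and guide is the following. It is an assumption, not something the linked example or the Pyro tutorials prescribe. The policy is the guide q_θ(a) over one episode's actions a. The model puts a uniform prior p(a) on those actions and adds a log-factor R(a)/τ for the episode's return (for example via pyro.factor), where τ is a temperature introduced here purely for illustration and environment stochasticity is ignored. The ELBO that SVI maximizes then reduces to expected return plus policy entropy:

$$
\mathcal{L}(\theta) = \mathbb{E}_{a \sim q_\theta}\!\left[\frac{R(a)}{\tau} + \log p(a) - \log q_\theta(a)\right] = \frac{1}{\tau}\,\mathbb{E}_{a \sim q_\theta}[R(a)] + \mathbb{H}[q_\theta] + \text{const}
$$

Because discrete action sites are not reparameterizable, Trace_ELBO falls back on the score-function (REINFORCE) estimator for them. Using $\mathbb{E}_{q_\theta}[\nabla_\theta \log q_\theta(a)] = 0$, its gradient is REINFORCE with an entropy bonus:

$$
\nabla_\theta \mathcal{L}(\theta) = \mathbb{E}_{a \sim q_\theta}\!\left[\left(\frac{R(a)}{\tau} + \log p(a) - \log q_\theta(a)\right)\nabla_\theta \log q_\theta(a)\right]
$$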

Thanks,

MrVamp