Does someone have a code sample on how to model missing values? In theory, I understand that the missing values can be considered just like any other random variables. However, it would be great if I could get a head start on how to write this with Pyro?
P.S.: This is my first time applying theory using a PPL, and I'm still trying to calibrate the theory/practice transfer.
There are many ways to model missing data in a PPL like Pyro. I think the main techniques are:
- Make partial observations sequentially via pyro.sample(..., obs=x), where x is either a tensor (observed) or None (missing, so the site is sampled as a latent variable).
- Make partial observations in parallel, using poutine.mask to include only the observed data in the log prob.
- Optionally, model missingness itself via pyro.sample("present", Bernoulli(p_observed), obs=present).
For example, suppose you have a dataset of inputs x and partially observed outputs y:
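    import torch
    import pyro
    from pyro import poutine
    from pyro.distributions import Bernoulli, Normal

    # my_loc_nn and my_presence_nn are your own torch.nn.Modules.
    def model(x, y, y_present):
        assert x.dtype == torch.float
        assert y.dtype == torch.float
        assert x.shape == y.shape
        # Recent Pyro versions expect a boolean mask (older ones used uint8).
        assert y_present.dtype == torch.bool
        with pyro.plate("data", len(x)):
            # Model the data that is observed:
            pyro.module("loc_nn", my_loc_nn)
            loc = my_loc_nn(x)
            # Entries where y_present is False contribute nothing to the log prob,
            # so y can hold any placeholder value there.
            with poutine.mask(mask=y_present):
                pyro.sample("y", Normal(loc, 1.),
                            obs=y)
            # Model whether data is observed:
            pyro.module("presence_nn", my_presence_nn)
            p_present = my_presence_nn(x)
            pyro.sample("y_present", Bernoulli(p_present),
                        obs=y_present.float())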
I’m unclear on what this implementation would look like inside the model itself. Would you mind providing some rough hints as to what I should be doing in Pyro?