Modeling missingness indicators

activatedgeek · December 7, 2018, 7:48am

Hey everyone,

Does someone have a code sample on how to model missing values? In theory, I understand that the missing values can be considered just like any other random variables. However, it would be great if I could get a head start on how to write this with Pyro?

P.S.: This is my first time applying theory using a PPL and still trying to calibrate the theory/practice transfer.

fritzo · December 14, 2018, 6:48pm

There are many ways to model missing data in a PPL like Pyro. I think the main techniques are:

1. Make partial observations sequentially via pyro.sample(..., obs=x), where x is either a tensor or None.
2. Make partial observations in parallel, using poutine.mask to include only observed data in the log prob.
3. Optionally model missingness via pyro.sample("present", Bernoulli(p_observed), obs=present).

For example, suppose you have a dataset of inputs x and partially observed outputs y:

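import torch
import pyro
from pyro import poutine
from pyro.distributions import Bernoulli, Normal

# my_loc_nn and my_presence_nn are torch.nn.Module instances defined elsewhere.
def model(x, y, y_present):
    assert x.dtype == torch.float
    assert y.dtype == torch.float
    assert x.shape == y.shape
    # Recent Pyro expects a boolean mask; the original 2018 post used torch.uint8.
    assert y_present.dtype == torch.bool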
    with pyro.plate("data", len(x)):

        # Model the data that is observed:
        with poutine.mask(y_present):
            pyro.module("loc_nn", my_loc_nn)
            loc = my_loc_nn(x)
            pyro.sample("y", Normal(loc, 1.),
                        obs=y)

        # Model whether data is observed:
        pyro.module("presence_nn", my_presence_nn)
        p_present = my_presence_nn(x)
        pyro.sample("y_present", Bernoulli(p_present),
                    obs=y_present.float())

activatedgeek · December 14, 2018, 7:27pm

That makes sense. I think for a start, I’m choosing to model missingness directly via independent Bernoulli(s).

On this note, one thing that comes up is that sometimes I might want to integrate out my missing values. If I want to put that down in Pyro PPL,

1. For discrete RVs, does enumeration in the model equal integrating out those missing values?
2. For continuous RVs, I would like to think of something like a Monte Carlo EM. Should this be part of the model? If yes, how?

Thank you so much for the inputs!

fritzo · December 14, 2018, 11:52pm

1. Yes, enumerating in the model is equivalent to integrating out the variables.
2. I think so: you would sample the latent variables in the model and “Monte Carlo integrate them out”, but I’m not sure.

activatedgeek · December 15, 2018, 3:30am

I’m unclear on what this implementation would look like inside the model. Do you mind providing rough hints as to what I should be doing in Pyro?