New thread: setting a Mask?

Hi All

Is there an example of setting a mask in a model when dealing with missing data? For example, some likelihood terms should not be calculated because the corresponding y is missing.

J

Hi @junbin.gao, many of the examples and tutorials use masking:

$ grep -Rl mask examples tutorial | grep '\.py\|\.ipynb$'
examples/cvae/baseline.py
examples/cvae/util.py
examples/cvae/cvae.py
examples/cvae/mnist.py
examples/sir_hmc.py
examples/air/air.py
examples/air/main.py
examples/contrib/funsor/hmm.py
examples/mixed_hmm/seal_data.py
examples/mixed_hmm/model.py
examples/scanvi/data.py
examples/dmm.py
examples/hmm.py
examples/capture_recapture/cjs.py
tutorial/source/effect_handlers.ipynb
tutorial/source/boosting_bbvi.ipynb
tutorial/source/dmm.ipynb
tutorial/source/cvae.ipynb
tutorial/source/air.ipynb
tutorial/source/tracking_1d.ipynb
tutorial/build/doctrees/nbsphinx/effect_handlers.ipynb
tutorial/build/doctrees/nbsphinx/boosting_bbvi.ipynb
tutorial/build/doctrees/nbsphinx/dmm.ipynb
tutorial/build/doctrees/nbsphinx/air.ipynb
tutorial/build/doctrees/nbsphinx/tracking_1d.ipynb

Thanks Fritzo. However, I did not find an appropriate example. For instance, the CVAE case is not what I am after. Here is what I want: suppose y is an observed vector with some missing values. I would like to pass this information to the sample function so that the likelihood at those locations won't be calculated.

Thanks

J.

You can wrap the sample site in a pyro.poutine.mask context, to which you pass a boolean masking tensor. That is what @fritzo is talking about. That’s all there is to it.

Input:

import torch
import pyro
import pyro.distributions as dist


def mask_model(y, mask):
    loc = pyro.sample("loc", dist.Normal(0.0, 1.0))
    # Likelihood terms of sites in this context are dropped wherever mask is False.
    with pyro.poutine.mask(mask=mask):
        obs = pyro.sample("obs", dist.Normal(loc, 1.0), obs=y)
    return obs


N = 20
y = torch.randn(N)
# Random boolean mask: True = observed, False = missing.
mask = torch.randint(0, 2, (N,)).bool()

trace = pyro.poutine.trace(mask_model).get_trace(y, mask)
trace.nodes['obs']

Output:

{'type': 'sample',
 'name': 'obs',
 'fn': Normal(loc: -2.9471242427825928, scale: 1.0),
 'is_observed': True,
 'args': (),
 'kwargs': {},
 'value': tensor([-0.7209,  0.3980, -0.2284, -2.1061,  1.5540, -0.5424,  0.4061,  0.1533,
          0.6535, -0.0964, -0.4650, -2.1123, -0.6702,  2.0065,  1.4525,  0.2473,
          0.0139,  0.9315,  0.6688,  0.7431]),
 'infer': {},
 'scale': 1.0,
 'mask': tensor([ True, False,  True, False, False,  True, False, False,  True, False,
         False,  True,  True,  True,  True, False, False, False, False,  True]),
 'cond_indep_stack': (),
 'done': True,
 'stop': False,
 'continuation': None}

Dave, this is really helpful. Thank you so much.

One further question: your example only has a batch shape. If my observed y is a Gaussian vector in which some components are missing, can Pyro calculate the log_prob for it? Or will a partial log_prob be calculated?

J.

I wanted to validate for myself how to properly do masking with NumPyro. I wrote up a toy problem and validated that the inference converged correctly. Check it out! Masked Observations with Numpyro – PainpointsPurelyTechnical