How to ignore NaN values when doing hierarchical forecasting?

twinmegami · December 3, 2020, 7:20am

I find that the tutorial "Forecasting III: hierarchical models" (Pyro Tutorials 1.8.4 documentation) does not cover one problem: NaN values in real data.
For example:
in 2010, there were only 48 stations
in 2011, 3 stations were closed and 5 new ones opened, giving 50 stations
in 2012, 2 of the stations closed in 2011 reopened
…

Another example:
My data is the sale count of various products in many stores.
I have reshaped it to torch.Size([44, 103, 671, 1]), which means:
44 stores, 103 products, and 671 days of sale counts. Some stores may be closed on different days, and products may be taken off the shelf for many reasons.

Sale-count history of 10 random products: (plot not shown)

Prediction at the store-product level is poor: (plot not shown)

Building the data matrix inevitably produces NaN values, and:

We can’t fill them with 0, because a missing value is different from a true 0.
We should not take NaN values into account.
We can’t drop the NaN values when training, because that breaks the time-series order.
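A minimal sketch of the three points above, assuming a toy tensor laid out as (stores, products, days, 1) like the one described earlier; the sizes and values are made up for illustration. Treating a zero placeholder as a real observation pulls the mean down, while a boolean mask keeps the time axis intact and leaves the missing days out of the statistic.

```python
import torch

nan = float("nan")
# Toy data: 1 store, 2 products, 4 days, 1 feature (hypothetical values).
sales = torch.tensor([[[[4.0], [nan], [6.0], [0.0]],
                       [[nan], [nan], [2.0], [2.0]]]])

observed = ~torch.isnan(sales)  # True where a value exists
# Replace NaN with a finite placeholder; the mask remembers which entries are real.
filled = torch.where(observed, sales, torch.zeros_like(sales))

# Naive mean over days treats the placeholders as real zero sales.
naive_mean = filled.mean(dim=-2)

# Masked mean divides by the number of observed days only.
masked_mean = filled.sum(dim=-2) / observed.sum(dim=-2).clamp(min=1)

print(filled.shape == sales.shape)  # the time axis is unchanged
print(naive_mean.flatten(), masked_mean.flatten())
```

Nothing is dropped from the series, so day t still sits at index t for every store and product.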

These are real cases.

Training happens here:

class ForecastingModel(PyroModule, metaclass=_ForecastingModelMeta):
    ....
    def predict( ... 
    ...
        if t_obs == t_cov:  # training
            pyro.sample("residual", noise_dist, obs=data - prediction)
            self._forecast = data.new_zeros(data.shape[:-2] + (0,) + data.shape[-1:])

So could we add a mask here, so that only non-NaN values are used for training?

fritzo · December 3, 2020, 8:46pm

Hi @twinmegami,
I would recommend masking out NaN values using either poutine.mask(), or if you’re not using pyro.plate then the distribution .mask() method. Take care that the actual data values are not NaN but rather some plausible value like zero: Pyro will ignore them if you mask them out, but PyTorch has weird behavior and may produce NaN grads unless those ignored values are finite.
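A small sketch of the gradient issue in plain PyTorch, not Pyro's actual masking code. The `masked_nll` helper and the toy values are assumptions for illustration: the Normal log-density is written out by hand and `torch.where` stands in for the mask. The loss is finite in both cases, but the gradient is only finite when the masked-out entries hold a finite placeholder.

```python
import math
import torch

def masked_nll(values, mask, scale):
    # Normal(0, scale) log-density written out by hand (illustrative helper).
    logp = -0.5 * (values / scale) ** 2 - torch.log(scale) - 0.5 * math.log(2 * math.pi)
    # Masked-out entries contribute exactly zero to the loss value.
    return -torch.where(mask, logp, torch.zeros_like(logp)).sum()

raw = torch.tensor([1.0, float("nan"), 3.0])
mask = ~torch.isnan(raw)  # True = observed

# Case 1: NaN left in the data. The loss is finite, but the backward pass
# multiplies a zero upstream gradient by a NaN local derivative.
scale = torch.tensor(1.0, requires_grad=True)
loss_nan = masked_nll(raw, mask, scale)
loss_nan.backward()
grad_with_nan = scale.grad.clone()

# Case 2: NaN replaced by a finite placeholder before masking.
filled = torch.where(mask, raw, torch.zeros_like(raw))
scale2 = torch.tensor(1.0, requires_grad=True)
loss_ok = masked_nll(filled, mask, scale2)
loss_ok.backward()
grad_filled = scale2.grad.clone()

print(loss_nan.item(), grad_with_nan.item())
print(loss_ok.item(), grad_filled.item())
```

The placeholder value itself does not matter for the loss, since the mask removes it; it only has to be finite.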

Here’s a rough example

import pyro
import pyro.distributions as dist
from pyro.contrib.forecast import ForecastingModel

class Model(ForecastingModel):
    def __init__(self, mask):
        super().__init__()  # initialize the module before setting attributes
        self.mask = mask  # boolean tensor: True = observed, False = missing

    def model(self, zero_data, covariates):
        ...  # as in https://pyro.ai/examples/forecasting_iii.html
        obs_scale = pyro.sample("obs_scale", dist.LogNormal(-5, 5))
        noise_dist = dist.Normal(0, obs_scale.unsqueeze(-1))
        noise_dist = noise_dist.mask(self.mask)  # <--- you could mask here
        self.predict(noise_dist, prediction)

Feel free to paste part of your actual model code in the dense case, and we can try to help adapt that to a masked version.

twinmegami · December 7, 2020, 7:22am

Sorry, there was a bug in my code; the problems above are fixed.

The current problem is:

  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/contrib/forecast/forecaster.py", line 130, in predict
    noise_dist = reshape_batch(noise_dist, noise_dist.batch_shape + (1,))
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/contrib/forecast/util.py", line 278, in _
    base_dist = reshape_batch(d.base_dist, base_shape)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/functools.py", line 840, in wrapper
    return dispatch(args[0].__class__)(*args, **kw)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/contrib/forecast/util.py", line 272, in reshape_batch
    raise NotImplementedError("reshape_batch() does not suport {}".format(type(d)))
NotImplementedError: reshape_batch() does not suport <class 'pyro.distributions.torch_distribution.MaskedDistribution'>

Looking at the reshape_batch function:

@singledispatch
def reshape_batch(d, batch_shape):
    """
    EXPERIMENTAL Given a distribution ``d``, reshape to different batch shape
    of same number of elements.

    This is typically used to move the the rightmost batch dimension "time" to
    an event dimension, while preserving the positions of other batch
    dimensions.

    :param d: A distribution.
    :type d: ~pyro.distributions.Distribution
    :param tuple batch_shape: A new batch shape.
    :returns: A distribution with the same type but given batch shape.
    :rtype: ~pyro.distributions.Distribution
    """
    raise NotImplementedError("reshape_batch() does not suport {}".format(type(d)))
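To see why the traceback ends in this fallback, here is a self-contained sketch of how `functools.singledispatch` behaves. The `Normal` and `Masked` classes and the `reshape` function are hypothetical stand-ins, not Pyro's types: any argument type without a registered handler falls through to the base function, and registering a handler for the wrapper type changes the dispatch.

```python
from functools import singledispatch

class Normal:  # hypothetical stand-in for a supported distribution
    pass

class Masked:  # hypothetical stand-in for an unsupported wrapper type
    def __init__(self, base_dist):
        self.base_dist = base_dist

@singledispatch
def reshape(d, batch_shape):
    # Fallback used for every type without a registered handler.
    raise NotImplementedError("reshape() does not support {}".format(type(d)))

@reshape.register(Normal)
def _(d, batch_shape):
    return ("Normal", batch_shape)

# Before registration: the wrapper type hits the fallback.
try:
    reshape(Masked(Normal()), (3, 1))
    fell_through = False
except NotImplementedError:
    fell_through = True

# After registration: the wrapper handler recurses into the wrapped distribution.
@reshape.register(Masked)
def _(d, batch_shape):
    return ("Masked", reshape(d.base_dist, batch_shape))

result = reshape(Masked(Normal()), (3, 1))
print(fell_through, result)
```

Dispatch is on the type of the first argument only, which is why wrapping a supported distribution in another class is enough to reach the fallback.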

I tried registering a handler as below:

@reshape_batch.register(dist.MaskedDistribution)
def _(d, batch_shape):
    base_dist = reshape_batch(d.base_dist, batch_shape)
    return dist.MaskedDistribution(base_dist, d._mask.reshape(base_dist.shape()))

But I got this error when running it:

  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/contrib/forecast/forecaster.py", line 289, in __init__
    elbo._guess_max_plate_nesting(model, guide, (data, covariates), {})
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/infer/elbo.py", line 109, in _guess_max_plate_nesting
    model_trace.compute_log_prob()
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/poutine/trace_struct.py", line 221, in compute_log_prob
    .format(name, exc_value, shapes)).with_traceback(traceback) from e
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/poutine/trace_struct.py", line 216, in compute_log_prob
    log_p = site["fn"].log_prob(site["value"], *site["args"], **site["kwargs"])
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/torch/distributions/independent.py", line 88, in log_prob
    log_prob = self.base_dist.log_prob(value)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/pyro/distributions/torch_distribution.py", line 303, in log_prob
    return scale_and_mask(self.base_dist.log_prob(value), mask=self._mask)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/torch/distributions/normal.py", line 72, in log_prob
    self._validate_sample(value)
  File "/home/ufo/anaconda3/envs/dl/lib/python3.7/site-packages/torch/distributions/distribution.py", line 253, in _validate_sample
    raise ValueError('The value argument must be within the support')
ValueError: Error while computing log_prob at site 'residual':
The value argument must be within the support

This is caused by the NaN values in the input data.
Is this why you mentioned NaN? So I need NaN-free input together with a mask; is that correct?

After I filled the NaN values with zero, training succeeded.

But how do I do this with Forecaster? It doesn’t accept a mask parameter.

pyro.set_rng_seed(1)
pyro.clear_param_store()
mask = torch.isnan(test_data)
# test_data = torch.Tensor(msc)
test_data = torch.Tensor(np.nan_to_num(msc))

covariates = torch.zeros(test_data.size(-2), 0)
forecaster = Forecaster(Model2(mask=mask[..., T0:T1, :]), test_data[..., T0:T1, :], covariates[T0:T1],
                        learning_rate=0.1, learning_rate_decay=1, num_steps=501, log_every=50)

samples = forecaster(test_data[..., T0:T1, :], covariates[T1:T2], num_samples=100)
samples 

# here tensor([], size=(44, 103, 0, 1))

Actually, the Forecaster class is strange: the first argument of the __call__ method, data, seems to have no effect on the prediction. In the tutorials the data length is not equal to the covariates length, yet it still produces the correct result.

fritzo · December 7, 2020, 12:42pm

Hi @twinmegami,
Thanks for clearly reporting this bug. I’ve tried to fix it in the forecast-mask branch. Could you see if that works for you?

The Forecaster class is strange: the first argument of the __call__ method, data, seems to have no effect on the prediction

I agree the signature is a little unusual. The motivation is that we need to pass a prototype tensor to the .__call__() method during training. This tensor basically bundles metadata (.shape, .dtype, .device). Since the model is generative, it should not look at the actual data it is supposed to be generating; it should only look at the metadata. That’s why we pass in torch.zeros_like(data) rather than the actual data. Note that I think you may have accidentally inverted the mask previously (True should mean observed, False should mean missing), which may have led to all data being ignored.
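A quick way to check the mask convention, sketched in plain PyTorch with made-up values. `per_element` is a hypothetical stand-in for a per-element log-likelihood term, and `torch.where` stands in for the masking step. With `torch.isnan(...)` used directly as the mask, only the placeholder entries survive; negating it keeps the real observations.

```python
import torch

raw = torch.tensor([5.0, float("nan"), 7.0, float("nan"), 9.0])

inverted = torch.isnan(raw)   # True = missing (inverted convention)
observed = ~torch.isnan(raw)  # True = observed (convention described above)

# Finite placeholder for the missing entries.
filled = torch.where(observed, raw, torch.zeros_like(raw))

# Hypothetical per-element score, standing in for a log-likelihood term.
per_element = -(filled - 7.0) ** 2

# What each mask lets through to the objective.
kept_inverted = torch.where(inverted, per_element, torch.zeros_like(per_element))
kept_observed = torch.where(observed, per_element, torch.zeros_like(per_element))

print(int(inverted.sum()), int(observed.sum()))
print(kept_inverted)
print(kept_observed)
```

Printing the fraction of True entries in the mask before training is a cheap sanity check: it should match the fraction of real observations in the data.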

The reason data can be different length than covariates is exactly for forecasting: we might observe three weeks of data and want to forecast forward one more week. We could add covariates for all four weeks. The difference in length (4 weeks of covariates - 3 weeks of data = 1 week) is exactly the size of the window you’d like to predict.
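The window arithmetic can be sketched with tensor shapes alone, without calling Forecaster. The split points `T0`, `T1`, `T2` and the series count are hypothetical, chosen to match the three-weeks-plus-one-week example; the covariates have zero feature columns, as in the code posted earlier in the thread.

```python
import torch

days_per_week = 7
# Hypothetical split points: observe three weeks, forecast the fourth.
T0, T1, T2 = 0, 3 * days_per_week, 4 * days_per_week

data = torch.zeros(2, T2, 1)     # 2 series, 28 days, 1 observed dimension
covariates = torch.zeros(T2, 0)  # 28 days, no covariate features

observed_window = data[..., T0:T1, :]  # three weeks of data
all_covariates = covariates[T0:T2]     # covariates for all four weeks

# The forecast horizon is the difference in length along the time axis.
horizon = all_covariates.size(-2) - observed_window.size(-2)
print(observed_window.shape, all_covariates.shape, horizon)
```

Note that the covariates span the observed window and the future window together, not the future window alone.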