Any example for hierarchical multivariate time series forecasting?

I am trying to learn Pyro to build a hierarchical time series forecasting model.

My target: predict the sale count of each product_in_different_store for the next 14 days.

For example:

  • I have a 4-level hierarchy: category 1 / category 2 / product_in_different_city / product_in_different_store.
  • The exogenous features are bound to the bottom level (product_in_different_store) and the middle level (product_in_different_city).
  • Different product_in_different_store series may follow different distributions: normal, Poisson, negative binomial, zero-inflated Poisson, or zero-inflated negative binomial.

My motivation:

  1. Training ARIMA/Prophet on each product_in_different_store or product_in_different_city series with one-hot encoded store features doesn’t perform well, because each single product time series has too few exogenous feature observations: some features may occur only once in a series (but may appear many times across different products), so it is hard to estimate a good coefficient. That is why I am looking into hierarchical models.
  2. I can’t inspect so many products by hand (4 cities, 30+ stores per city on average, about 8000+ products per store). Many products seem to be on sale only for a very short time (less than 28 days) or have blank or stock-out periods, so it is hard to choose one distribution for all of them.

My question:

  1. Can the distribution be chosen dynamically during training?
  2. Could you provide an example of hierarchical multivariate time series forecasting? The hierarchical models I have seen only bind features to the top level; how do I bind different features to different levels?
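
To make the second question concrete, here is a minimal NumPy sketch of one way features could be bound to two levels at once. It is not taken from any Pyro example: the sizes, the feature arrays, the coefficient values, and the names `store_to_city`, `beta_city`, `beta_store`, and `log_rate` are all hypothetical. The idea is that each level has its own coefficient vector shared by every series at that level, and each store adds its parent city's effect to its own.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 2 cities, 3 stores in total, 5 time steps.
n_cities, n_stores, n_steps = 2, 3, 5
store_to_city = np.array([0, 0, 1])  # store index -> parent city index

# Made-up exogenous features attached to different levels.
x_city = rng.normal(size=(n_cities, n_steps, 2))   # 2 city-level features
x_store = rng.normal(size=(n_stores, n_steps, 3))  # 3 store-level features

# One coefficient vector per level, shared by all series at that level,
# so a feature seen once in one series is still pooled across series.
beta_city = np.array([0.5, -0.2])
beta_store = np.array([0.1, 0.3, -0.4])
store_offset = np.array([0.0, 0.2, -0.1])  # per-store intercept


def log_rate(x_city, x_store, store_to_city, beta_city, beta_store, store_offset):
    """Log expected sales per (store, time): parent-city effect + store effect."""
    city_effect = x_city @ beta_city      # shape (n_cities, n_steps)
    store_effect = x_store @ beta_store   # shape (n_stores, n_steps)
    # Index the city effect by each store's parent, then add store terms.
    return city_effect[store_to_city] + store_effect + store_offset[:, None]


eta = log_rate(x_city, x_store, store_to_city, beta_city, beta_store, store_offset)
rate = np.exp(eta)  # could serve as the mean of a count likelihood
```

In a Pyro model the fixed arrays `beta_city`, `beta_store`, and `store_offset` would presumably become latent variables declared with `pyro.sample`, with `rate` feeding the observation distribution; the indexing by `store_to_city` is the part that ties a lower level to its parent.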

Please look under the “APPLICATION: TIME SERIES” heading on the left-hand side of the Pyro examples site (e.g. http://pyro.ai/examples/forecasting_i.html).

I have spent some days on time series forecasting with deep learning in TensorFlow, but I still don’t quite understand Pyro.
The example doesn’t answer my question, because it:

  1. requires setting the distribution manually. I don’t know when to use which distribution, so I wonder whether the model can learn which distribution fits better. Many series in my data look zero-inflated Poisson or zero-inflated negative binomial, and their distribution parameters are hard to determine.
  2. doesn’t use exogenous features.
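
As an illustration of what "learning which distribution is better" could mean for the first point, the sketch below fits a plain Poisson and a zero-inflated Poisson to a single series by maximum likelihood and keeps the one with the lower AIC. This is a per-series model comparison swapped in for illustration, not something Pyro does automatically; the helper names (`fit_zip`, `pick_distribution`) and both toy series are made up, and the code assumes each series has at least one non-zero count.

```python
import math


def poisson_loglik(y, lam):
    """Log-likelihood of counts y under Poisson(lam)."""
    return sum(k * math.log(lam) - lam - math.lgamma(k + 1) for k in y)


def zip_loglik(y, lam, pi):
    """Log-likelihood under a zero-inflated Poisson with extra-zero prob pi."""
    total = 0.0
    for k in y:
        pois = math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))
        total += math.log(pi * (k == 0) + (1 - pi) * pois)
    return total


def fit_zip(y, n_iter=200):
    """EM estimates of (lam, pi); assumes sum(y) > 0."""
    n = len(y)
    n_zero = sum(1 for k in y if k == 0)
    lam, pi = sum(y) / n, 0.5
    for _ in range(n_iter):
        # E-step: probability that an observed zero is a structural zero.
        w = pi / (pi + (1 - pi) * math.exp(-lam))
        # M-step: update the mixing weight and the Poisson rate.
        pi = n_zero * w / n
        lam = sum(y) / (n - n_zero * w)
    return lam, pi


def pick_distribution(y):
    """Return the name of the candidate with the lower AIC."""
    lam_p = sum(y) / len(y)  # Poisson MLE is the sample mean
    aic_poisson = 2 * 1 - 2 * poisson_loglik(y, lam_p)
    lam_z, pi_z = fit_zip(y)
    aic_zip = 2 * 2 - 2 * zip_loglik(y, lam_z, pi_z)
    return "zip" if aic_zip < aic_poisson else "poisson"


y_sparse = [0, 0, 0, 0, 0, 0, 0, 0, 5, 6, 0, 0, 4, 7]  # many zeros, large counts
y_dense = [3, 4, 2, 5, 3, 4, 3, 2, 4, 3, 5, 3, 4, 2]   # no zeros at all
print(pick_distribution(y_sparse), pick_distribution(y_dense))
```

The same comparison could be extended to negative binomial candidates, or done on held-out days instead of with AIC. Inside Pyro, the candidate likelihoods would correspond to classes such as `pyro.distributions.ZeroInflatedPoisson`, but the selection step itself would still have to be written by hand.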