Hi Pyro,

I am recently working on a model which requires a time-varying dimension observation series. The model can be simplified in the following way:

y_t \sim N(X_t\beta + B_t z_t, \sigma_y^2 I_{d_t}),

z_t \sim N(\phi z_{t-1}, \sigma_z^2),

here y_t has dimension d_t, which is changing along with the time, z_t is a **univariate** hidden state, and it is an AR(1), the X_t is some other independent covariates, \beta is the corresponding regression coefficient. The dimensions here are: (1) y_t \in \mathbb{R}^{d_t}, (2) X_t \in \mathbb{R}^{d_t \times p}, and (3) the observation matrix B_t is \mathbb{R}^{d_t}.

To solve this issue, my idea is to find the maximum dimension for t=0,1,\dots, T, denoted as d_{\max}, and use zero-padding to make all y_t \in \mathbb{R}^{d_{\max}} and corresponding X_t with the same size d_{\max} \times p, the observation matrix B_t \in \mathbb{R}^{d_{\max} \times 1}. Therefore, I modified the gaussian hmm example code using poutine.mask and pyro.plate as follows:

```
def model(Xt, yt, series_lengths):
T = len(yt)
sigma_z = pyro.param('sigma_z', torch.tensor(1.0), constraint=dist.constraints.positive)
sigma_y = pyro.param('sigma_y', torch.tensor(1.0), constraint=dist.constraints.positive)
max_dim = series_lengths.max().item()
d = torch.arange(max_dim).unsqueeze(-1)
init_dist = dist.Normal(0, sigma_z).expand([1]).to_event(1)
beta = pyro.param("beta", torch.ones((5, 1)))
obs_dist = dist.Normal(0, sigma_y).expand([max_dim]).to_event(1)
phi = pyro.param("phi", torch.tensor(0.0))
trans_matrix = phi.reshape((1, 1))
trans_dist = dist.Normal(0, sigma_z).expand([1]).to_event(1)
with pyro.plate("times", T) as time:
obs_matrix = pyro.param("obs_matrix", torch.ones((1, max_dim)))
with pyro.plate("data", max_dim):
with poutine.mask(mask=d < series_lengths):
noise_dist = dist.GaussianHMM(init_dist, trans_matrix, trans_dist, obs_matrix, obs_dist, duration=1)
pyro.sample("obs", noise_dist, obs=yt - (Xt @ beta).flatten(1)[time])
```

In this code, I use `max_dim`

as d_{\max} mentioned above, and for each time step I will sample a new observation.

This code is runable but I think it doesn’t entirely solve my problem. Here are two issues I don’t know how to fix:

(a) the observation matrix B_t now is a fixed size matrix (i.e., d_{\max} \times 1), hence, for different time point, it should have different “zero-padding”.

For example, suppose d_{\max}=7, while at t=10, we have d_t = 3, then B_{10} should be [*, *, *, 0, 0, 0, 0], however, when t=12, we have d_t=4, then B_{12} should be [*, *, *, *, 0, 0, 0], here * is some non-zero numbers. If we use the code above, we will have a big B_t matrix, and it will “estimate” all entries in the matrix using all time points, so I cannot get the estimated \hat{B}_t for each time point. Is there any other method to solve this? I guess the mask could be applied to B_t as well, but I am not sure how to do it.

(b) the second minor issue is I would like to restore and do some analysis on the estimated hidden latent state series z_t, I am not sure if I can extract that from `GaussianHMM`

function.

Many thanks!