Plate vs automatic broadcast with `obs`

cole_haus · February 11, 2022, 9:21pm

See this notebook for a runnable version: plate-vs-not.ipynb · GitHub

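# Imports assumed from the notebook; `ny` is taken to be an alias for numpyro.
from typing import Optional
import numpy as np
import numpyro as ny
import numpyro.distributions as dist

def model1(data: Optional["np.ndarray[float]"] = None) -> None:
    mus = ny.sample("mus", dist.Normal(0, 1).expand((2,)))
    scale_tril = ny.sample("scale_tril", dist.LKJCholesky(2, concentration=1))
    # No plate: the (M, 2) observations broadcast against a distribution with event shape (2,).
    xs = ny.sample(
        "xs", dist.MultivariateNormal(loc=mus, scale_tril=scale_tril), obs=data
    )
    print(xs.shape)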

def model2(data: "np.ndarray[float]") -> None:
    mus = ny.sample("mus", dist.Normal(0, 1).expand((2,)))
    scale_tril = ny.sample("scale_tril", dist.LKJCholesky(2, concentration=1))
    with ny.plate("obs", data.shape[0], dim=-1):
        xs = ny.sample(
            "xs", dist.MultivariateNormal(loc=mus, scale_tril=scale_tril), obs=data
        )
        print(xs.shape)

What is the semantic difference between observing (M, N)-shaped data inside an M-sized plate with a distribution of event shape (N,) (model 2) and observing the same data with that distribution and no plate (model 1)? In the model 1 case, the sample site seems to be broadcast to the obs shape automatically. Is it being plated implicitly? As the notebook shows, at least for this scenario, the two approaches produce the same inference results.

fehiepsi · February 11, 2022, 9:29pm

When you run posterior predictive sampling with model1, the output for xs has shape (num_samples, 2); with model2 it has shape (num_samples, plate_size, 2). model2 gives the expected result for posterior predictive, because the plate tells the model how many observations to generate.

cole_haus · February 11, 2022, 9:49pm

Thanks for the reply! My understanding then is:

model 1 would be used to represent some scenario where you have num_samples measurements, each of which is bivariate-normally-distributed (e.g. multiple measurements from a pair of correlated thermometers)
model 2 would be used to represent some scenario where you have num_samples measurements, each of which consists of a collection of plate_size bivariate-normally-distributed values (e.g. multiple sets of measurements from a collection of pairs of correlated thermometers)

Does that sound right?