`"obs"` reserved sample site name for diagnostics?

Am I correct in understanding that using "obs" as the name of a sample site changes the behaviour of diagnostics like print_summary?

For example, in the regression example here the model function includes

numpyro.sample("obs", dist.Normal(mu, sigma), obs=divorce)

and the print_summary call below prints only results for a, bM and sigma, not for every obs.

I am asking because I have a model that includes two sample sites of observed variables, similar to the form of:

y1 ~ ...

y2 ~ ...

which I can sample with

numpyro.sample("y1", dist.Normal(mu, sigma), obs=y1)

numpyro.sample("y2", dist.Normal(mu, sigma), obs=y2)

While this model samples fine, the output of print_summary includes diagnostics for every observation of y1 and y2, whereas it would make more sense to include diagnostics only for the latent parameters of the model.

What would be a good solution for cutting down the summary output? Would it make sense to include a feature in a future numpyro release, to allow the default "obs" keyword to be overwritten with a list?

Not really sure what you’re asking. The string args are just strings, with no special meaning or function attached to any particular string of characters. They’re only used to name and match sites. The obs keyword arg says “this random variable is observed with this value.”

Ah I see where the difference comes from. As you say, it’s not the value of the string for name.

In the model I’m fitting, I’m trying to run prior predictive checks. For this I am setting the outcome variable(s) = None. This causes each row of the outcome variables to be printed in print_summary, which is not the behavior I expected.

Example below adapted from here:

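# Imports used by this snippet.
from jax import random
import numpyro
import numpyro.distributions as dist
from numpyro.infer import MCMC, NUTS


def model(marriage=None, age=None, divorce=None):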
    a = numpyro.sample("a", dist.Normal(0.0, 0.2))
    M, A = 0.0, 0.0
    if marriage is not None:
        bM = numpyro.sample("bM", dist.Normal(0.0, 0.5))
        M = bM * marriage
    if age is not None:
        bA = numpyro.sample("bA", dist.Normal(0.0, 0.5))
        A = bA * age
    sigma = numpyro.sample("sigma", dist.Exponential(1.0))
    mu = a + M + A
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=divorce)


# Start from this source of randomness. We will split keys for subsequent operations.
rng_key = random.PRNGKey(0)
rng_key, rng_key_ = random.split(rng_key)

# Run NUTS.
kernel = NUTS(model)
num_samples = 2000
mcmc = MCMC(kernel, num_warmup=1000, num_samples=num_samples)
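# dset is the DataFrame loaded in the linked example (not defined in this snippet).
# divorce=None makes "obs" an unobserved site, so NUTS samples it like any other latent.
mcmc.run(rng_key_, marriage=dset.MarriageScaled.values, divorce=None)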
mcmc.print_summary()

Here, print_summary() prints 50 rows for the obs variable, one per observation. If values are given for divorce instead (as in the example on the page), no summaries are printed for obs.

I understand that with obs=None, the "obs" sample site is just like any other sample site. I suppose my question ends up being: would it make sense to allow an optional argument to print_summary that would be a list of variables to print (or exclude)?
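One possible workaround in the meantime, sketched under assumptions: the standalone numpyro.diagnostics.print_summary function takes a plain dict of samples, so the unwanted sites can be dropped from the dict before printing. The helper name drop_sites is hypothetical, and the arrays below are dummy stand-ins for what mcmc.get_samples(group_by_chain=True) would return.

```python
import numpy as np


def drop_sites(samples, exclude):
    """Return a copy of a {site_name: array} dict without the named sites."""
    excluded = set(exclude)
    return {name: value for name, value in samples.items() if name not in excluded}


# Dummy stand-in for mcmc.get_samples(group_by_chain=True):
# arrays shaped (num_chains, num_samples, ...).
samples = {
    "a": np.zeros((1, 2000)),
    "bM": np.zeros((1, 2000)),
    "sigma": np.ones((1, 2000)),
    "obs": np.zeros((1, 2000, 50)),
}

latent_only = drop_sites(samples, exclude=["obs"])
print(sorted(latent_only))

# With a fitted MCMC object (requires numpyro), the same idea would be:
# from numpyro.diagnostics import print_summary
# print_summary(drop_sites(mcmc.get_samples(group_by_chain=True), ["obs"]),
#               group_by_chain=True)
```

Filtering by exclusion keeps the call unchanged when new latent sites are added to the model; an include list would work equally well if only a few parameters are of interest.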

Thanks in advance!