Am I correct in understanding that using "obs" as the name of a sample site changes the behaviour of diagnostics like print_summary?
For example, in the regression example here the model function includes
numpyro.sample("obs", dist.Normal(mu, sigma), obs=divorce)
and the print_summary call below prints only results for a, bM and sigma, not for every obs.
I am asking because I have a model that includes two sample sites of observed variables, similar to the form of:
y1 ~ ...
y2 ~ ...
which I can sample with
numpyro.sample("y1", dist.Normal(mu, sigma), obs=y1)
numpyro.sample("y2", dist.Normal(mu, sigma), obs=y2)
While this model samples fine, the output of print_summary includes diagnostics for every observation of y1 and y2, whereas it would make more sense to include diagnostics only for the latent parameters of the model.
What would be a good solution for cutting down the summary output? Would it make sense to include a feature in a future numpyro release that allows the default "obs" keyword to be overridden with a list?
not really sure what you’re asking. the string args are just strings without any special meaning or function attached to any particular string of characters. they’re just used to name and match. the obs keyword arg says “this random variable is observed with this value.”
Ah I see where the difference comes from. As you say, it’s not the value of the string for name.
In the model I’m fitting, I’m trying to run prior predictive checks. For this I am setting the outcome variable(s) to None. This causes each row of the outcome variables to be printed in print_summary, which is not the behavior I expected.
Example below adapted from here:
import numpyro
import numpyro.distributions as dist
from jax import random
from numpyro.infer import MCMC, NUTS

def model(marriage=None, age=None, divorce=None):
    a = numpyro.sample("a", dist.Normal(0.0, 0.2))
    M, A = 0.0, 0.0
    if marriage is not None:
        bM = numpyro.sample("bM", dist.Normal(0.0, 0.5))
        M = bM * marriage
    if age is not None:
        bA = numpyro.sample("bA", dist.Normal(0.0, 0.5))
        A = bA * age
    sigma = numpyro.sample("sigma", dist.Exponential(1.0))
    mu = a + M + A
    # With divorce=None this site is not observed, so it is sampled like a latent.
    numpyro.sample("obs", dist.Normal(mu, sigma), obs=divorce)

# Start from this source of randomness. We will split keys for subsequent operations.
rng_key = random.PRNGKey(0)
rng_key, rng_key_ = random.split(rng_key)

# Run NUTS. dset is the DataFrame loaded in the linked example.
kernel = NUTS(model)
num_samples = 2000
mcmc = MCMC(kernel, num_warmup=1000, num_samples=num_samples)
mcmc.run(
    rng_key_, marriage=dset.MarriageScaled.values, divorce=None
)
mcmc.print_summary()
Here, print_summary() prints a summary row for each of the 50 values of the obs variable. If values are given for divorce instead (as in the example on the page), no summaries are printed for obs.
I understand that with obs=None, the "obs" sample site is just like any other sample site. I suppose my question ends up being: would it make sense to allow an optional argument to print_summary that would be a list of variables to print (or exclude)?
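In the meantime, one possible workaround is to filter the samples dict by site name and pass the result to the standalone numpyro.diagnostics.print_summary function. A rough sketch: select_sites and print_latent_summary are hypothetical helper names of my own, not part of numpyro, and the demo at the bottom uses plain lists as a stand-in for real samples:

```python
def select_sites(samples, include=None, exclude=()):
    """Return a new dict with only the requested site names.

    `samples` maps site name -> draws, e.g. the result of
    mcmc.get_samples(group_by_chain=True).
    """
    names = list(samples) if include is None else list(include)
    missing = [n for n in names if n not in samples]
    if missing:
        raise KeyError(f"unknown sample sites: {missing}")
    excluded = set(exclude)
    return {n: samples[n] for n in names if n not in excluded}


def print_latent_summary(mcmc, include=None, exclude=("obs",)):
    """Hypothetical wrapper: summarize only selected sites of a fitted MCMC."""
    # Imported lazily so select_sites stays usable without numpyro installed.
    from numpyro.diagnostics import print_summary

    samples = mcmc.get_samples(group_by_chain=True)
    print_summary(select_sites(samples, include, exclude), group_by_chain=True)


# Stand-in for mcmc.get_samples(): plain lists instead of arrays.
fake_samples = {
    "a": [0.1, 0.2],
    "bM": [0.3, 0.4],
    "sigma": [1.0, 1.1],
    "obs": [[0.0] * 50, [0.0] * 50],
}
kept = select_sites(fake_samples, exclude=("obs",))
print(sorted(kept))  # ['a', 'bM', 'sigma']
```

After mcmc.run(...), calling print_latent_summary(mcmc) should then print rows for a, bM and sigma only. As far as I can tell, mcmc.print_summary() also reports the number of divergences, which this sketch leaves out.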
Thanks in advance!