 # Handling missing features

I know there are tutorials/examples on imputing entirely missing observations, but are there any examples or hints as to how to handle missing features? For example, if the data looks like:

```python
np.array(
    [[1, .8, 1],
     [1, .5, .8],
     [.8, .8, 1],
     [.2, np.nan, .2],
     ...])
```

and the model is something like:

```python
def model(data):
    mus = ny.sample("mus", dist.Normal(0, 1).expand((3,)))
    chol = ny.sample("chol", dist.LKJCholesky(3, concentration=1))
    ny.sample("ys", dist.MultivariateNormal(mus, scale_tril=chol), obs=data)
```

it would be nice to use the features that are present in the fourth row to improve the estimates of `mus` for the first and third columns, and to improve the estimates of their covariances.

(Note: I don’t particularly care about what the missing values themselves are. So I don’t necessarily need to treat the missing values as latent variables and impute them as the examples/tutorials usually do. I just care about improving my estimates of `mus` and `chol`.)
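(For what it's worth, the property I'm relying on is that the marginal of a multivariate normal over any subset of coordinates is itself multivariate normal, with the corresponding sub-vector of the mean and sub-matrix of the covariance. A quick standalone check in plain JAX, with made-up numbers, nothing numpyro-specific:)

```python
import jax.numpy as jnp
from jax.scipy.stats import multivariate_normal

mu = jnp.array([0.0, 1.0, -1.0])
chol = jnp.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])
cov = chol @ chol.T

# Row [.2, nan, .2]: features 0 and 2 observed, feature 1 missing.
idx = jnp.array([0, 2])
row_obs = jnp.array([0.2, 0.2])

# Marginal log-likelihood of just the observed features: take the
# sub-mean and the sub-covariance and evaluate a 2-d MVN density.
logp = multivariate_normal.logpdf(row_obs, mu[idx], cov[jnp.ix_(idx, idx)])
```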

One approach that seems sort of reasonable to me is to do something like:

```python
ys = ny.sample("ys", dist.MultivariateNormal(mus, scale_tril=chol))
present_idx = jnp.nonzero(~jnp.isnan(data))
ny.deterministic("observed_ys", ys.at[present_idx].get(), obs=data.at[present_idx].get())
```

But `deterministic` doesn’t support `obs`. Is this actually a reasonable approach? Is there some way to emulate this approach in `numpyro` given the absence of `obs` support in `deterministic`?

Actually, I think I’ve figured out something that mostly works.

```python
def model3(data: "np.ndarray[float]") -> None:
    mus = ny.sample("mus", dist.Normal(0, 1).expand((2,)))

    def inner(_, row):
        # A static `size` is required under jit; the fill behavior is the wart here.
        present_idx = jnp.nonzero(~jnp.isnan(row), size=2)
        row_obs = row[present_idx]
        mus_obs = mus[present_idx]
        scale_tril_obs = jnp.squeeze(jnp.eye(2)[:, present_idx][present_idx, :])
        ny.sample(
            "xs",
            dist.MultivariateNormal(loc=mus_obs, scale_tril=scale_tril_obs),
            obs=row_obs,
        )
        return None, None

    # scan from numpyro.contrib.control_flow, so the sample site inside is handled.
    scan(inner, None, data)
```

I know this isn’t quite right, because of the static `size` argument to `nonzero` and its `fill_value` behavior, but it’s the closest I’ve gotten, and it seems to generally have the desired behavior (i.e. it allows `nan` in some features while still learning from the non-`nan` features). It seems like `where` might be a cleaner way to do this?
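Here's a sketch of the `where` idea (my own construction, only lightly checked): instead of gathering the observed entries, replace the missing entries by the mean and overwrite their covariance rows/columns with the identity. The resulting full-dimensional MVN log-density equals the marginal log-density of the observed entries up to a constant (`-0.5 * log(2*pi)` per missing entry), so gradients with respect to `mus` and the covariance are unaffected:

```python
import jax.numpy as jnp
from jax.scipy.stats import multivariate_normal

def masked_mvn_logpdf(row, mu, cov):
    # True where the feature was actually observed.
    observed = ~jnp.isnan(row)
    # Missing entries get the mean, so their residual is exactly zero.
    x = jnp.where(observed, row, mu)
    # Keep cov[i, j] only when both i and j are observed; otherwise fall
    # back to the identity, making missing coords independent N(mu_i, 1)
    # that each contribute only a constant to the log-density.
    both = observed[:, None] & observed[None, :]
    cov_masked = jnp.where(both, cov, jnp.eye(row.shape[0]))
    return multivariate_normal.logpdf(x, mu, cov_masked)
```

Inside the model this could feed `ny.factor` per row (or `vmap`ped over rows) with `cov = chol @ chol.T`; everything is fixed-shape, so it should jit cleanly without `nonzero` at all.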