Handling missing features

I know there are tutorials/examples on imputing entirely missing observations, but are there any examples or hints as to how to handle missing features? For example, if the data looks like:

    np.array(
        [[1, .8, 1],
         [1, .5, .8],
         [.8, .8, 1],
         [.2, np.nan, 0.2],
         …])

and the model is something like:

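    import numpyro as ny
    from numpyro.distributions import LKJCholesky, MultivariateNormal, Normal

    def model(data):
        mus = ny.sample("mus", Normal(0, 1).expand((3,)))
        # Cholesky factor of a 3x3 correlation matrix.
        chol = ny.sample("chol", LKJCholesky(3, concentration=1))
        ny.sample("ys", MultivariateNormal(mus, scale_tril=chol), obs=data)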

it would be nice to be able to use the features that are present in the fourth row to improve the estimates of mus for the first and third columns and to improve the estimate of their covariance.

(Note: I don’t particularly care about what the missing values themselves are. So I don’t necessarily need to treat the missing values as latent variables and impute them as the examples/tutorials usually do. I just care about improving my estimates of mus and chol.)
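One way to make this precise, assuming each row really is multivariate normal as in the model above: dropping coordinates of a Gaussian leaves a Gaussian over the remaining ones, so a row with missing features can enter the likelihood through its marginal, and the missing values never have to appear as latent variables. Here o is an index set introduced only for illustration; it denotes the features observed in a given row, and L stands for chol:

```latex
y \sim \mathcal{N}(\mu, \Sigma), \quad \Sigma = L L^{\top}
\quad\Longrightarrow\quad
y_{o} \sim \mathcal{N}\!\left(\mu_{o}, \Sigma_{oo}\right)

% Fourth row of the example, o = \{1, 3\}:
\mu_{o} = \begin{pmatrix} \mu_{1} \\ \mu_{3} \end{pmatrix},
\qquad
\Sigma_{oo} = \begin{pmatrix} \Sigma_{11} & \Sigma_{13} \\ \Sigma_{31} & \Sigma_{33} \end{pmatrix}
```

That row therefore still informs the first and third means and their covariance, which is the behavior asked for.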

One approach that seems sort of reasonable to me is to do something like:

    ys = ny.sample("ys", MultivariateNormal(mus, scale_tril=chol))
    present_idx = np.nonzero(~np.isnan(data))
    ny.deterministic("observed_ys", ys.at[present_idx].get(), obs=data.at[present_idx].get())

But deterministic doesn’t support obs. Is this actually a reasonable approach? Is there some way to emulate this approach in numpyro given the absence of obs support in deterministic?

Actually, I think I’ve figured out something that mostly works.

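    import jax.numpy as jnp
    import numpyro as ny
    import numpyro.distributions as dist
    from numpyro.contrib.control_flow import scan

    def model3(data: "np.ndarray[float]") -> None:
        mus = ny.sample("mus", dist.Normal(0, 1).expand((2,)))

        def inner(_, row):
            # Indices of the non-nan features; size=2 keeps the shape static under tracing.
            present_idx = jnp.nonzero(jnp.invert(jnp.isnan(row)), size=2)
            row_obs = row[present_idx]
            mus_obs = mus[present_idx]
            # Identity scale restricted to the present features.
            scale_tril_obs = jnp.squeeze(jnp.eye(2)[:, present_idx][present_idx, :])
            ny.sample("xs", dist.MultivariateNormal(loc=mus_obs, scale_tril=scale_tril_obs), obs=row_obs)
            return None, None

        scan(inner, None, data)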

I know this isn’t quite right, because of the static size for nonzero and the fill_value behavior, but it’s the closest I’ve gotten and it generally seems to have the desired behavior (i.e. it allows nan in some features while still learning from the non-nan features). It seems like there might be a cleaner way to do this?