How to (efficiently) fit a correlation matrix with some elements fixed?

entaylor · July 17, 2024, 11:55pm

I am a long-time emcee-er, and only lately coming to the power and glory of what can be done with the NUTS. I need some help to figure out how to model a multivariate normal distribution with some elements set to zero. Sorry in advance if this ends up being simple: this has been a very steep (but rewarding!!) learning curve, and i’m at the limits of what i can figure out on my own.

I give some more background and context at the end (***), but the essence of it is: i have N observations of M=5 variables, and knowing that the observational errors on all M variables are correlated for each observation. My model for these data is multivariate normal in 3 of the 5 parameters, with the other two parameters being simple scalars, and accounting for the observational errors/uncertainties. In other words, i want to fit a multivariate normal where the covariance matrix for the observational errors uncertainties is plain vanilla; i.e.: C_\mathrm{obs} = \left( \begin{array}{ccccc} \sigma_0^2 & \rho_{01} \sigma_0 \sigma_1 & \rho_{02} \sigma_0 \sigma_2 & \rho_{03} \sigma_0 \sigma_4 & \rho_{04} \sigma_0 \sigma_4 \\ \rho_{01} \sigma_0 \sigma_1 & \sigma_1^2 & \rho_{12} \sigma_1 \sigma_2 & \rho_{13} \sigma_0 \sigma_4 & \rho_{14} \sigma_0 \sigma_4 \\ \rho_{02} \sigma_0 \sigma_2 & \rho_{12} \sigma_1 \sigma_2 & \sigma_2^2 & \rho_{23} \sigma_2 \sigma_3 & \rho_{24} \sigma_2 \sigma_4 \\ \rho_{03} \sigma_0 \sigma_3 & \rho_{13} \sigma_1 \sigma_3 & \rho_{23} \sigma_2 \sigma_3 & \sigma_3^2 & \rho_{34} \sigma_3 \sigma_4 \\ \rho_{04} \sigma_0 \sigma_4 & \rho_{14} \sigma_1 \sigma_4 & \rho_{24} \sigma_2 \sigma_4 & \rho_{34} \sigma_3 \sigma_4 & \sigma_4^2 \\ \end{array} \right)
but the covariance matrix for the model is special:
C_\mathrm{mod} = \left( \begin{array}{ccccc} \sigma_0^2 & \rho_{01} \sigma_0 \sigma_1 & \rho_{02} \sigma_0 \sigma_2 & 0 & 0 \\ \rho_{01} \sigma_0 \sigma_1 & \sigma_1^2 & \rho_{12} \sigma_1 \sigma_2 & 0 & 0 \\ \rho_{02} \sigma_0 \sigma_2 & \rho_{12} \sigma_1 \sigma_2 & \sigma_2^2 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 0 \\ \end{array} \right).
The net covariance for the observed data given a particular set of model parameters is the is given by the sum of the two; i.e. C_\mathrm{net} = C_\mathrm{mod} + C_\mathrm{obs}.

Below is a simple model that does most of what i want, which i have based closely on the example use of an LKJ distribution as good priors on a (full/general) covariance matrix. I hope that the process for generating the mock data clearly shows the contours of the problem, and that it is also clear how i am modelling the underlying distribution, and accounting for the covariant uncertainties in the parameter values for each observation or subsample.

Generating mock data:


mu_parent = np.array([ -3., -2., -1., 0., -1. ]) 
sigma_parent = np.array([ 0.15, 0.10, 0.05, 0., 0. ])

# ________________ generate mock observations

num_obs = 90
num_pars = len(mu_parent)

# sample from the parent distribution to get 'true' values
mu_true = mu_parent + sigma_parent*np.random.rand(num_obs,num_pars)
# generate observational errors/uncertainties
sigma_obs = np.exp( -2.5 * np.random.randn(num_obs,num_pars) )
# construct a correlation matrix for observational errors/uncertainties
rho = np.zeros((num_obs, num_pars, num_pars))
for i in range(num_pars):
    for j in range(i+1):
        if i == j : 
            rho[:, i, j] = 1
        else :
            rho[ :, i, j ] = rho[ :, j, i ] = 0.05*np.random.randn(num_obs)
# use these to define covariance matrix for observational errors
cov_obs = rho * sigma_obs[ :, :, np.newaxis] * rho * sigma_obs[ :, np.newaxis, : ]
# NB. cov_obs has shape = ( num_obs, num_pars, num_pars )
# use this covariance to generate observational errors
mu_errors = np.array([ np.random.multivariate_normal(np.zeros(num_pars), C,) for C in cov_obs ])
# and apply these errors to the truth
mu_obs = mu_true + mu_errors

And then modelling these data using ordinary multivariate normal (cribbing closely from the example in the dist.LKJ docstring):

mu_prior = mu_parent
mu_prior_stddev = np.array([ 3., 3., 1., 30., 30. ])

def model(mu_obs, cov_obs):  # y has dimension N x d

    # vector of the parent distribution means for each variable
    mu = numpyro.sample('mu', dist.Normal(mu_prior, mu_prior_stddev))

    # vector of variances for each of the d variables
    sigma = numpyro.sample("sigma", dist.Exponential(jnp.ones(num_pars)))

    # LKJ -> correlation -> covariance matrix for the parent distribution
    concentration = jnp.ones(1)
    # Implies a uniform distribution over correlation matrices
    corr_mat = numpyro.sample("corr_mat", dist.LKJ(num_pars, concentration))
    cov_mat = jnp.outer(sigma, sigma) * corr_mat
    # above is faster formulation of the below 
    #cov_mat = jnp.matmul(jnp.matmul(jnp.diag(sigma), corr_mat), jnp.diag(sigma))

    with numpyro.plate("observations", num_obs):
        obs = numpyro.sample("obs", dist.MultivariateNormal(mu, covariance_matrix=cov_mat + cov_obs), obs=mu_obs)
    return obs

from numpyro.infer import init_to_median
nuts_kernel = NUTS( model, init_strategy=init_to_median )
mcmc = MCMC( nuts_kernel, num_warmup=100, num_samples=100, num_chains=2 )
             
rng_key = random.PRNGKey(0)
mcmc.run( rng_key, mu_obs, cov_obs )

inf_data = az.from_numpyro(mcmc)
print( az.summary(inf_data) )

Compared to the above, what i want to do is to require all of the elements in the 3rd and 4th rows and columns of cov_mat to be zero.

I can imagine a few different strategies to get from where i am to where i want to go; for example:
** I could somehow put a very restrictive additional prior on the values that i want to be zero; e.g. add a normal prior with mu=0 and sigma=1e-6 or something. This seems like a terrible idea.
** I could sample N x 3 latent variables from the covariant normal, plus the 2 ‘global’ parameters, and then condition these on my data based on the observed values and covariance matrices. This is conceptually simple, but i think computationally very expensive, so only a so-so idea, and will probably end up limiting my ability to do what i want to do.
** I could use LKJ to construct a 3x3 covariance matrix, and then somehow reshape it to be 5x5, where the extra elements are all zeros. This seems like the best approach, but i don’t have the jnp skills to work this out.

I can’t guess which might be best/easiest to try first. And i have had no luck finding patterns or examples to help me pursue any of these. Any advice or guidance would be very much appreciated!

*** As possibly-excessive-but-doing-my-best-to-be-concise background/context: this is part of a large hierarchical model, where i am using Expectation Propagation to divide and conquer. I have five parameters to describe the parent distribution from which many samples are drawn. I suspect that three of these parameters are or may be correlated, and so do want to contain the correlations/covariances among these variables. The other two parameters are different: these should be treated as constant and universal. Within an EP framework, my process is to estimate all five parameters for each subsample, to use these estimates (with uncertainties) to constrain the parent distribution, and then to fold the updated information about the parent distribution back into the per-subsample fits in the form of priors. The complication is that i know that the constraints on the five parameters are correlated/covariant. Therefore i need (want) to model the observed distribution of the five parameters using a full covariance matrix, but want to force some elements of the covariance matrix for the true/underlying distribution to be zero.