Imposing constraints on a covariance matrix

I am interested in estimating the covariance matrix of a regression model with a specific prior on the covariance, as outlined in the glasso model below. However, when I try to estimate the model with NUTS and the number of features p is above 3, I get ValueError: MultivariateNormal distribution got invalid covariance_matrix parameter. I believe this is because the correlation matrix does not fulfill the requirements of 1) being positive semi-definite and 2) having off-diagonal elements between -1 and 1. I would like to know how to go about imposing such constraints.
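
For reference, a quick way to check both requirements on a candidate matrix (a small helper of my own, not part of the model below):

import jax.numpy as jnp

def is_valid_corr(rho, tol=1e-8):
    # 1) positive semi-definite: all eigenvalues >= -tol
    psd = bool(jnp.all(jnp.linalg.eigvalsh(rho) >= -tol))
    # 2) unit diagonal and all entries in [-1, 1]
    unit_diag = bool(jnp.allclose(jnp.diag(rho), 1.0))
    bounded = bool(jnp.all(jnp.abs(rho) <= 1.0 + tol))
    return psd and unit_diag and bounded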

This is the model:

import jax.numpy as jnp
import numpyro.distributions as dist
from numpyro import deterministic, plate, sample

def glasso(beta0_m=0., beta0_s=1., mu_m=0., mu_s=1., Y=None, n=None, p=None):
    try:
        n, p = jnp.shape(Y)
    except (TypeError, ValueError):
        # Y is None, so n and p must be passed explicitly
        assert n is not None and p is not None
    
    with plate("features", p):
        mu = sample("mu", dist.Normal(mu_m, mu_s))
        sqrt_diag = sample("sqrt_diag", dist.InverseGamma(1., 0.5))

    beta0 = sample("beta0", dist.Normal(beta0_m, beta0_s))
    off = (p * p - p) // 2  # number of strictly lower-triangular entries
    with plate("off_diag_corr", off):
        rho_off = sample("rho_off", dist.Laplace(0, 1 / jnp.exp(beta0)))

    # assemble the full correlation matrix: unit diagonal, symmetric off-diagonals
    rho = jnp.zeros((p, p))
    tril_idx = jnp.tril_indices(n=p, k=-1, m=p)
    rho = rho.at[tril_idx].set(rho_off)
    rho = rho + rho.T + jnp.diag(jnp.ones(p))
          
    # scale the correlations by the standard deviations to get a covariance matrix
    theta = jnp.outer(sqrt_diag, sqrt_diag) * rho
    theta = deterministic("theta", theta)

    with plate("hist", n):
        Y = sample("obs", dist.MultivariateNormal(mu, theta), obs=Y)
        
    return {'Y':Y, 'theta':theta, 'mu':mu, 'beta0':beta0}

This is the code to simulate the data and run the HMC algorithm:

# assumed imports for this snippet (Key is taken to be jax.random.PRNGKey)
from jax.random import PRNGKey as Key
from numpyro.handlers import block, condition, seed, substitute
from numpyro.infer import MCMC, NUTS

# simulation params
beta0_true = 1.
p = 5
mu_true = jnp.ones(p)
n_obs = 100

# estimation params
n_warmup = 10
n_samples = 20

# fix mu and beta0 at their true values and draw one synthetic dataset
glasso_sub = substitute(glasso, {"mu": mu_true, "beta0": beta0_true})
sim_res = seed(glasso_sub, Key(2*n_obs))(n=n_obs, p=p)

# condition on the true beta0 and hide it from the sampler
glasso_run = block(condition(glasso, {"beta0": beta0_true}), hide=["beta0"])
nuts_kernel = NUTS(glasso_run)

mcmc = MCMC(nuts_kernel, num_warmup=n_warmup, num_samples=n_samples)
mcmc.run(rng_key=Key(3), Y=sim_res['Y'])

Even though I ultimately need to shape the prior as in glasso, I have also experimented with an LKJ prior on the correlation matrix (glasso_lkj, model below), but I get the same error when the number of features p is above 18.

def glasso_lkj(beta0_m=0., beta0_s=1., mu_m=0., mu_s=1., Y=None, n=None, p=None):
    try:
        n, p = jnp.shape(Y)
    except (TypeError, ValueError):
        # Y is None, so n and p must be passed explicitly
        assert n is not None and p is not None
    
    with plate("features", p):
        mu = sample("mu", dist.Normal(mu_m, mu_s))
        sqrt_diag = sample("sqrt_diag", dist.InverseGamma(1., 0.5))

    rho = sample("rho", dist.LKJ(dimension=p, concentration=1))  
    beta0 = sample("beta0", dist.Normal(beta0_m, beta0_s))

    theta = jnp.outer(sqrt_diag,sqrt_diag)*rho
    theta = deterministic("theta", theta)

    with plate("hist", n):
        Y = sample("obs", dist.MultivariateNormal(mu,theta), obs = Y)
        
    return {'Y':Y, 'theta':theta, 'mu':mu, 'beta0':beta0}

When I try to estimate this model, I hit the same ValueError: MultivariateNormal distribution got invalid covariance_matrix parameter as above.


I think

theta = jnp.outer(sqrt_diag, sqrt_diag) * rho

is not guaranteed to be a symmetric positive definite matrix, so it cannot be the covariance matrix of an MVN distribution. Shouldn't it be diag @ rho @ diag?

I think theta = jnp.outer(sqrt_diag, sqrt_diag) * rho is equivalent to jnp.diag(sqrt_diag) @ rho @ jnp.diag(sqrt_diag), but in any case neither of them seems to work. I have also tried the example at this link, but for e.g. d=20 it throws the same error.
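
For reference, a quick numerical check of that equivalence (a throwaway sketch of mine, not part of the model):

import jax.numpy as jnp
from jax.random import PRNGKey, uniform

s = uniform(PRNGKey(0), (4,))        # stands in for sqrt_diag
m = uniform(PRNGKey(1), (4, 4))
rho = (m + m.T) / 2                  # any symmetric matrix works here
a = jnp.outer(s, s) * rho            # elementwise outer-product scaling
b = jnp.diag(s) @ rho @ jnp.diag(s)  # diag @ rho @ diag form
print(jnp.allclose(a, b))            # True, since (s s^T * rho)_ij = s_i s_j rho_ij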

You are right, they are equivalent. I'm not sure what causes the issue. If rho is just a 4x4 matrix, could you print it out to double-check? I guess we have a bug somewhere.

For the glasso_lkj model with an LKJ prior on the correlation matrix, I found that I can work around the issue by manually adding the corr_matrix constraint (see below). This works for the case where I directly sample a correlation matrix.

In fact, I am interested in cases where I define a prior for each element of a matrix and then build the matrix under the constraint that this matrix is a correlation matrix.

For example, consider Gaussian graphical model estimation with the graphical LASSO (GLASSO) (for a reference, see Bayesian Graphical Lasso Models and Efficient Posterior Computation). Here I place Laplace priors on all the off-diagonal elements of a covariance matrix. Of course, sampling these independently does not guarantee that the resulting matrix is positive definite, so the joint prior needs the further constraint that the matrix is positive definite.

Note that the method I consider below extends this to a correlation matrix (see https://arxiv.org/pdf/2104.10099.pdf).
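
In symbols, the joint prior I have in mind is (my notation, with the Laplace scale b = 1/exp(beta0) as in the glasso model above):

p(rho) ∝ [ prod_{i<j} Laplace(rho_ij | 0, b) ] * 1[rho is a valid correlation matrix]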

Is there functionality to do this currently available in Numpyro? Allowing for this facilitates Graphical Modeling and could really enhance NumPyro. Thanks a lot!

from numpyro import param

def glasso_lkj(beta0_m=0., beta0_s=1., mu_m=0., mu_s=1., Y=None, n=None, p=None):
    try:
        n, p = jnp.shape(Y)
    except (TypeError, ValueError):
        # Y is None, so n and p must be passed explicitly
        assert n is not None and p is not None
    
    with plate("features", p):
        mu = sample("mu", dist.Normal(mu_m, mu_s))
        sqrt_diag = sample("sqrt_diag", dist.InverseGamma(1., 0.5))

    # initialize rho at the identity and declare it a correlation matrix
    rho_init = jnp.eye(p)
    rho = param('rho', rho_init, constraint=dist.constraints.corr_matrix)
    sample('rho_obs', dist.LKJ(dimension=p, concentration=1), obs=rho)
    beta0 = sample("beta0", dist.Normal(beta0_m, beta0_s))

    theta = jnp.outer(sqrt_diag,sqrt_diag)*rho
    theta = deterministic("theta", theta)

    with plate("hist", n):
        Y = sample("obs", dist.MultivariateNormal(mu,theta), obs = Y)
        
    return {'Y':Y, 'theta':theta, 'mu':mu, 'beta0':beta0, 'sqrt_diag':sqrt_diag, 'rho':rho}

Currently, LKJ.support = constraints.corr_matrix, so I'm not sure why you got the issue. Just to summarize the error in different situations:

  • Using the LKJ prior raises errors (because its sample might not belong to its support - which is likely a bug in one of our transforms)
  • Using param with the corr_matrix constraint bypasses the issue (but param is just a constant in MCMC, so it does not actually fix it, I guess)
  • And you want to ask whether you can set Laplace priors on the entries of a correlation matrix (for which you can use ImproperUniform(constraints.corr_matrix, (), (3, 3)) together with a Laplace factor):
rho = sample("rho", ImproperUniform(constraints.corr_matrix, (), (3, 3)))
numpyro.factor("rho_factor", dist.Laplace(...).log_prob(rho).sum(...))

Do I understand correctly?
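
For concreteness, a fuller version of that sketch might look like this (my own expansion, assuming a unit-scale Laplace; only the strictly lower-triangular entries get the factor, since the diagonal is fixed at 1 and the upper triangle is determined by symmetry):

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist
from numpyro.distributions import ImproperUniform, constraints

def corr_laplace_prior(p):
    # improper (flat) prior over the manifold of valid correlation matrices
    rho = numpyro.sample(
        "rho", ImproperUniform(constraints.corr_matrix, (), event_shape=(p, p))
    )
    # Laplace factor on the strictly lower-triangular entries only
    tril_idx = jnp.tril_indices(n=p, k=-1)
    numpyro.factor("rho_factor", dist.Laplace(0., 1.).log_prob(rho[tril_idx]).sum())
    return rho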

Thanks a lot. This appears in principle to give me the functionality I want.

I have tried to get a minimal example working along the lines you explained (see below), but already for p >= 20 I see that the smallest eigenvalue becomes negative (although very close to zero) and I hit the same ValueError: MultivariateNormal distribution got invalid covariance_matrix parameter. This happens with both the covariance_matrix and precision_matrix (commented out) parametrisations, with or without placing a Laplace prior within factor (the commented-out line), and with both a corr_matrix constraint and a positive_definite constraint.

Might it be that there are different tolerance thresholds for 1) fulfilling the correlation matrix constraint and 2) being accepted as a covariance matrix for a multivariate normal?
Do you have any suggestions for how to solve this?

# assumed imports for this snippet (Key is taken to be jax.random.PRNGKey)
import jax.numpy as jnp
from jax.random import PRNGKey as Key
import numpyro
from numpyro import plate, sample
import numpyro.distributions as dist
from numpyro.distributions import ImproperUniform, constraints
from numpyro.infer import MCMC, NUTS

# params
p = 20
n = 100

# data simulation: the true covariance is the identity
rho_true = jnp.eye(p)
#Y = dist.MultivariateNormal(jnp.zeros(p), precision_matrix=rho_true).expand((n,)).sample(Key(8))
Y = dist.MultivariateNormal(jnp.zeros(p), covariance_matrix=rho_true).expand((n,)).sample(Key(8))

def model(Y):
    n, p = Y.shape
    mu = jnp.zeros(p)

    # flat prior over correlation matrices; same error with constraints.positive_definite
    rho = sample('rho', ImproperUniform(constraints.corr_matrix, (), event_shape=(p,p)))
    #numpyro.factor("rho_factor", dist.Laplace(0, 1/20).log_prob(rho).sum())
    print('smallest eigenvalue of rho: ', jnp.linalg.eigh(rho)[0][0])  # eigh sorts ascending

    with plate("hist", n):
        #Y = sample("obs", dist.MultivariateNormal(mu, precision_matrix=rho), obs=Y)
        Y = sample("obs", dist.MultivariateNormal(mu, covariance_matrix=rho), obs=Y)

# run model
nuts_kernel = NUTS(model)

mcmc = MCMC(nuts_kernel, num_warmup=10, num_samples=10)
mcmc.run(rng_key=Key(7), Y=Y)
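
For what it's worth, here is how I probe the two checks separately (a diagnostic sketch of my own, reusing p, jnp, dist and constraints from the snippet above):

# an equicorrelation matrix with r just below -1/(p-1) has a unit diagonal but a
# slightly negative smallest eigenvalue, mimicking the failure mode above
r = -1.0 / (p - 1) - 1e-6
rho_bad = (1 - r) * jnp.eye(p) + r * jnp.ones((p, p))
print(jnp.linalg.eigvalsh(rho_bad)[0])   # tiny negative smallest eigenvalue
print(constraints.corr_matrix(rho_bad))  # boolean support check used by the sampler
# with validate_args=True the next line raises the same ValueError as in the model
dist.MultivariateNormal(jnp.zeros(p), covariance_matrix=rho_bad, validate_args=True)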

Are you using 64-bit precision? See Runtime Utilities — NumPyro documentation.

I wasn't, thanks. However, I have now set enable_x64(use_x64=True), but I still hit the same problem for p > 30, while for my application I would be looking at p = 100 at a minimum.

Well, I think you're courting numerical problems by using an improper prior like that. From a practical point of view, the point of a prior isn't only to inject prior information but also to regularize things. In this case some amount of regularization may be important to avoid numerical problems, so if I were you I'd reconsider my prior assumptions.
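
For example, one option (a sketch of my own, dropping into the model above in place of the current rho sample site, with an arbitrary concentration value) is to keep the ImproperUniform support but add an LKJ log-density factor, which for concentration > 1 pulls rho toward the identity and away from the boundary of the positive definite cone:

rho = sample('rho', ImproperUniform(constraints.corr_matrix, (), event_shape=(p, p)))
numpyro.factor("lkj_reg", dist.LKJ(dimension=p, concentration=2.).log_prob(rho))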

How about using

from numpyro.infer import init_to_feasible
nuts_kernel = NUTS(model, init_strategy=init_to_feasible)

I guess our transform between the domain and codomain of corr_matrix is not stable enough (for the default init_to_uniform strategy). Please make a GitHub issue for this.

Thanks so much for the help, it was great. Changing the init strategy to init_to_feasible solved it for me. I have also raised an issue on GitHub.