# Multivariate regression not converging

Dear Pyro Experts,

I am new to Pyro and probabilistic programming in general. I am trying to fit a multivariate regression with 40 covariates and 82 response variables. My simple model is as follows:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoMultivariateNormal
from pyro.optim import Adam

def model(x, y=None):
    weight = pyro.sample("weight", dist.Normal(torch.zeros(40, 82), torch.ones(40, 82)).to_event(2))
    bias = pyro.sample("bias", dist.Normal(torch.zeros(82), torch.ones(82)))
    mu = torch.matmul(x, weight) + bias

    with pyro.plate("data", x.shape[0]):
        obs = pyro.sample("obs", dist.MultivariateNormal(loc=mu, scale_tril=torch.eye(82)), obs=y)

guide = AutoMultivariateNormal(model)
adam = Adam({"lr": 1e-3})
svi = SVI(model, guide, adam, loss=Trace_ELBO())

for j in range(40000):
    loss = svi.step(guideMatrix_trainTensor, expressionMatrix_trainTensor)
```

However my model does not converge; the loss values fluctuate back and forth. When I do a simple regression with the same matrices, one response variable at a time, I get results with no problem. May I ask what may be going wrong here? Also, is there a way to define priors over the multivariate covariance matrix?

Hi @Basak,

I’d recommend @martinjankowiak’s recent tips tutorial. Additionally here are some specific tips in order of priority:

1. First try an `AutoDelta`, then an `AutoNormal`, then an `AutoMultivariateNormal` guide. These progress from cheap-and-robust to accurate-but-expensive-and-numerically-sensitive.
2. Try decreasing the initial scale parameter of your autoguide:
```python
guide = AutoMultivariateNormal(model, init_scale=0.01)  # or even 0.001
```
3. Make sure all your data inputs and outputs have been scaled to have mean on the order of zero and standard deviation on the order of 1. E.g. 10 ± 0.1 is ok, but 1e6 ± 1e7 will probably fail. This is needed because deep learning optimizers and their default hyperparameters are tuned for data and parameters on roughly unit scale.
4. Since your likelihood covariance is diagonal (an identity matrix), a factorized `Normal` with `.to_event(1)` will be cheaper than a `MultivariateNormal`.
5. I’d recommend also learning the observation noise parameter, say `obs_scale`.
6. I’d recommend a higher learning rate (say 5e-3) and fewer steps (say 1000); if that doesn’t work on a shallow model like this, then something else is probably wrong.
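On point 4, a quick sanity check that a diagonal `MultivariateNormal` and a factorized `Normal` assign identical log probabilities. This sketch uses plain `torch.distributions` (which Pyro's distributions wrap); `Independent(..., 1)` plays the role of Pyro's `.to_event(1)`:

```python
import torch
from torch.distributions import Independent, MultivariateNormal, Normal

mu = torch.randn(5, 3)  # e.g. 5 data points, 3 response variables
y = torch.randn(5, 3)

# Identity-covariance MVN likelihood, as in the original model.
mvn = MultivariateNormal(mu, scale_tril=torch.eye(3))
# Equivalent factorized Normal; Independent(..., 1) == Pyro's .to_event(1).
normal = Independent(Normal(mu, 1.0), 1)

# Same log density, but the Normal avoids an O(d^2) triangular solve.
assert torch.allclose(mvn.log_prob(y), normal.log_prob(y), atol=1e-5)
```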

Here are all my suggestions combined:

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta, AutoMultivariateNormal, AutoNormal
from pyro.optim import ClippedAdam

# Standardize data if needed.
x = guideMatrix_trainTensor
y = expressionMatrix_trainTensor
x = (x - x.mean(0)) / x.std(0)
y = (y - y.mean(0)) / y.std(0)

def model(x, y=None):
    weight = pyro.sample("weight", dist.Normal(torch.zeros(40, 82), 1).to_event(2))
    bias = pyro.sample("bias", dist.Normal(torch.zeros(82), 1).to_event(1))
    obs_scale = pyro.sample("obs_scale", dist.LogNormal(0, 1))
    mu = x @ weight + bias
    with pyro.plate("data", x.shape[0]):
        pyro.sample("obs", dist.Normal(mu, obs_scale).to_event(1), obs=y)

# Try a few different guide types.
guide1 = AutoDelta(model)
guide2 = AutoNormal(model, init_scale=0.01)
guide3 = AutoMultivariateNormal(model, init_scale=0.01)
guide = guide1  # Change to guide2 and guide3 once you get this working.

optim = ClippedAdam({"lr": 5e-3})  # higher learning rate, per tip 6
svi = SVI(model, guide, optim, loss=Trace_ELBO())

for step in range(1000):  # Increase as needed.
    loss = svi.step(x, y)
    if step % 100 == 0:
        print(f"step {step} loss = {loss}")
```

Thank you so much @fritzo, I appreciate your help very much. My inputs and outputs are already given as z-score values, but I will try your other suggestions. One last question: which implemented distributions can be used as priors over covariance matrices? I saw an example usage of LKJ in one of the previous posts:

```python
def model(x, y=None):
    β = pyro.sample("β", dist.Normal(torch.zeros(x.shape[1], 2), torch.ones(x.shape[1], 2)).to_event(2))
    B0 = pyro.sample("B0", dist.Normal(torch.zeros(2), torch.ones(2)))
    μ = torch.matmul(x, β) + B0
    θ = pyro.sample("θ", dist.HalfNormal(torch.ones(2)))
    η = pyro.sample("η", dist.HalfNormal(1.))
    L = pyro.sample("L", dist.LKJCorrCholesky(2, η))
    L_Ω = torch.mm(torch.diag(θ.sqrt()), L)
    with pyro.plate("data", x.shape[0]):
        obs = pyro.sample("obs", dist.MultivariateNormal(loc=μ, scale_tril=L_Ω), obs=y)
```

Is there another way to define the prior for the covariance matrix? I am asking this because my outputs actually have a covariance structure and I would like the model to use that information. How can I incorporate that to the model?

Hi @Basak,

Hmm, I’m not sure I understand. If you have a known covariance matrix `cov` for the conditional distribution `y|x`, then you can use it directly in the likelihood:

```python
pyro.sample("obs", dist.MultivariateNormal(mu, cov), obs=y)
```

Note that it is cheaper to precompute the Cholesky factor once, outside of inference:

```python
scale_tril = torch.linalg.cholesky(cov)  # precompute once
...

def model(...):
    ...
    pyro.sample("obs", dist.MultivariateNormal(mu, scale_tril=scale_tril), obs=y)
```
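To see that the two parameterizations agree, here is a small check in plain `torch.distributions` (which Pyro's `MultivariateNormal` wraps); the covariance matrix here is an arbitrary positive-definite example:

```python
import torch
from torch.distributions import MultivariateNormal

torch.manual_seed(0)
A = torch.randn(4, 4)
cov = A @ A.T + 4 * torch.eye(4)  # an arbitrary positive-definite covariance
mu = torch.zeros(4)
y = torch.randn(4)

# Passing cov triggers a Cholesky factorization inside the constructor...
lp_cov = MultivariateNormal(mu, covariance_matrix=cov).log_prob(y)
# ...so precomputing scale_tril once avoids repeating that O(d^3) work per step.
scale_tril = torch.linalg.cholesky(cov)
lp_tril = MultivariateNormal(mu, scale_tril=scale_tril).log_prob(y)

assert torch.allclose(lp_cov, lp_tril, atol=1e-5)
```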