MLE estimation of latent variable variances

Hi,

I am considering this very simple model:
y = i1 * z1 + i2 * z2 + eps.

z1 and z2 are both sampled from a Normal prior, parameterized by mu1, std1, and mu2, std2.
i1 and i2 are the model’s independent indicator variables (each either 0 or 1).
We can assume that sigma_eps is known.

I am interested in estimating (MLE or MAP, for that matter) the parameters mu1, mu2, std1, and std2.
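
For reference, data for this setup can be simulated along these lines (the ground-truth values below are placeholders, not the exact ones I used):

import torch

torch.manual_seed(0)
N = 1000
true_mu1, true_std1 = 2., 3.   # placeholder ground truth
true_mu2, true_std2 = 3., 3.
covariates = torch.randint(0, 2, (N, 2)).float()   # each row drawn from {0,1}^2
z1 = true_mu1 + true_std1 * torch.randn(N)
z2 = true_mu2 + true_std2 * torch.randn(N)
results = covariates[:, 0] * z1 + covariates[:, 1] * z2 + 0.1 * torch.randn(N)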

Here’s the model’s code:

import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(covariates, results=None):
    N = covariates.shape[0]

    # Learnable parameters of the two latent-variable priors.
    first_mu = pyro.param('first_mu', lambda: torch.tensor(0.))
    first_std = pyro.param('first_std', lambda: torch.tensor(2.),
                           constraint=constraints.positive)
    second_mu = pyro.param('second_mu', lambda: torch.tensor(0.))
    second_std = pyro.param('second_std', lambda: torch.tensor(2.),
                            constraint=constraints.positive)

    # Broadcast the scalar parameters across the N data points.
    first_mu = first_mu.repeat(N)
    second_mu = second_mu.repeat(N)
    first_std = first_std.repeat(N)
    second_std = second_std.repeat(N)

    with pyro.plate('Zs', N):
        first_z = pyro.sample('first_z', dist.Normal(first_mu, first_std)).view(1, N)
        second_z = pyro.sample('second_z', dist.Normal(second_mu, second_std)).view(1, N)

    zs = torch.cat((first_z, second_z), dim=0)  # shape (2, N)

    # Row-wise dot product of covariates (N x 2) and zs (2 x N);
    # equivalent to (covariates * zs.t()).sum(-1).
    mean = torch.diag(covariates @ zs)

    with pyro.plate('obs_plate', N):
        obs = pyro.sample('obs', dist.Normal(mean, 0.1), obs=results)  # sigma_eps fixed at 0.1
        # pyro.sample('obs', dist.Delta(mean), obs=results)

    return obs

I am using Trace_ELBO with an “empty” guide, since I am not interested in a full Bayesian treatment of the latent variables at this point, only in estimating the parameters of the latent variables’ distributions.
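
Concretely, the training loop looks roughly like this (the learning rate and number of steps are illustrative, not the exact values I used):

from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def guide(covariates, results=None):
    pass  # "empty" guide: no latent sites

svi = SVI(model, guide, Adam({'lr': 0.01}), loss=Trace_ELBO())
for step in range(5000):
    svi.step(covariates, results)

for name, value in pyro.get_param_store().items():
    print(name, value.detach().numpy())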

The estimates for the mus (I used simulated data) are decent; however, the estimates for the stds are really off. They essentially converge to 0, while the true values I used were ~3:

first_mu 2.0213156
first_std 2.7235487e-08
second_mu 3.0581546
second_std 2.1629699e-08
CPU times: user 36.2 s, sys: 1.24 s, total: 37.4 s
Wall time: 35.5 s

Can anyone think of a reason why the estimation of the latent variables’ distribution parameters fails in my case?

Thanks a lot.
Best,
Eyal.

Hi @EyalItskovit, it looks like your model is a product of N independent models, and each of those N independent models has two latent variables (EDIT: I originally wrote “zero”), four parameters, and a single observation. I would indeed expect such a product of models to lead to variance collapse, so maybe this is a modeling issue. In your application, do you really have only one observation per instance? Is there any sharing?

Unrelated to inference, it would be great if you could merge the two pyro.plates, since I believe they are really a single plate (unless I misunderstand covariates). You should be able to simply omit the second pyro.plate line and indent the zs = and mean = lines, as in the sketch below. This should be safe as long as covariates does not mix values across the N dimension.
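
Something like this untested sketch (reusing your parameter names):

with pyro.plate('Zs', N):
    first_z = pyro.sample('first_z', dist.Normal(first_mu, first_std))
    second_z = pyro.sample('second_z', dist.Normal(second_mu, second_std))
    zs = torch.stack((first_z, second_z), dim=0)   # shape (2, N)
    mean = (covariates * zs.t()).sum(-1)           # row-wise dot product; avoids the N x N matmul
    obs = pyro.sample('obs', dist.Normal(mean, 0.1), obs=results)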

Hi Fritz, thank you for the reply!

Maybe I am getting something wrong, but hopefully you can clarify what you think is wrong with the modeling part. The covariates matrix is of size Nx2, and each row is one of [0,0], [0,1], [1,0], or [1,1].
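
For example, for N = 4 it could look like:

covariates = torch.tensor([[0., 0.],
                           [0., 1.],
                           [1., 0.],
                           [1., 1.]])   # one row per observation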

For each observation (out of the N), I take a sample from first_z and second_z, which are parameterized by those 4 parameters, and either add them together (in the case of [1,1]), take one of them (in the case of [1,0] or [0,1]), or ignore both (in the case of [0,0]).

Why would you claim there are 0 latent variables, if each observation is modeled as a sum of z1 and z2, which are unobserved? The four parameters are shared throughout this entire process.

Thanks for the tip on merging the plates; I will implement it.

Best,
Eyal.

Hi @EyalItskovit, sorry, you’re right: there are two latent variables per N value. Let me think about this a bit more :thinking:

Hi @EyalItskovit, my mistake, there are indeed two latent variables per plate instance (I’ve edited the above comment).

I still think the issue is that there are too few observations per plate instance: you have only one observation per pair of latent variables. Bayesian hierarchical models work best when a few latent variables are shared among many observations. What is the applied problem you’re trying to solve with this model?
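
For instance (a hypothetical variant, just to illustrate sharing), if each (first_z, second_z) pair generated M repeated observations instead of one, the stds would be much better identified:

M = 50  # hypothetical number of repeated observations per instance
with pyro.plate('Zs', N):
    first_z = pyro.sample('first_z', dist.Normal(first_mu, first_std))
    second_z = pyro.sample('second_z', dist.Normal(second_mu, second_std))
    mean = covariates[:, 0] * first_z + covariates[:, 1] * second_z
    with pyro.plate('reps', M):
        pyro.sample('obs', dist.Normal(mean, 0.1), obs=results)  # results has shape (M, N)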