MLE for Normal distribution parameters

Thanks, @martinjankowiak
I was trying not to use SVI for MLE, as it seemed like overkill. I wanted to share a working version of MLE estimation for loc and scale (currently using PyTorch distributions). My complete blog post contains the full code.

Here is the minimal code for learning loc given a fixed scale. In a previous version of the code I had missed the zero_grad() call, so gradients accumulated across iterations instead of being reset after each optimiser step.

Learning loc given fixed scale

import torch

dist = torch.distributions

# Ground truth: draw training data from a standard Normal
uv_normal = dist.Normal(loc=0.0, scale=1.0)
train_data = uv_normal.sample([10000])

# Deliberately bad initial guess; requires_grad=True so Adam can update it
# (torch.autograd.Variable is deprecated; a plain tensor works)
loc = torch.tensor(-10.0, requires_grad=True)
opt = torch.optim.Adam([loc], lr=0.01)

for i in range(3100):
    to_learn = dist.Normal(loc=loc, scale=1.0)
    loss = -to_learn.log_prob(train_data).sum()  # negative log-likelihood
    loss.backward()
    if i % 500 == 0:
        print(f"Iteration: {i}, Loss: {loss.item():0.2f}, Loc: {loc.item():0.2f}")
    opt.step()
    opt.zero_grad()  # reset gradients before the next backward pass

This gives a learnt loc that matches the analytical MLE solution, train_data.mean().
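As a quick sanity check, a minimal sketch reusing loc and train_data from above:

# The closed-form MLE of loc under a Normal with known scale is the sample mean
analytical_loc = train_data.mean()
print(f"Learnt: {loc.item():0.4f}, Analytical: {analytical_loc.item():0.4f}")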

Learning loc and scale

The important difference here is that I introduced softplus to ensure the positivity of scale: the optimiser updates an unconstrained raw parameter, and softplus maps it to a strictly positive value before it is passed to the distribution.

import torch.nn.functional as F

# Both parameters start from deliberately wrong values
loc = torch.tensor(-10.0, requires_grad=True)
scale = torch.tensor(2.0, requires_grad=True)  # unconstrained raw parameter

opt = torch.optim.Adam([loc, scale], lr=0.01)
for i in range(5100):
    scale_softplus = F.softplus(scale)  # guarantees scale > 0

    to_learn = dist.Normal(loc=loc, scale=scale_softplus)
    loss = -to_learn.log_prob(train_data).sum()
    loss.backward()
    if i % 500 == 0:
        print(
            f"Iteration: {i}, Loss: {loss.item():0.2f}, "
            f"Loc: {loc.item():0.2f}, Scale: {scale_softplus.item():0.2f}"
        )
    opt.step()
    opt.zero_grad()

With this, I'm able to recover the correct loc and scale.
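Again, as a sanity check (a sketch reusing the variables above; note that the MLE of scale is the biased standard deviation, normalised by N rather than N - 1):

analytical_loc = train_data.mean()
analytical_scale = train_data.std(unbiased=False)  # MLE divides by N, not N - 1
learnt_scale = F.softplus(scale)
print(f"Learnt loc: {loc.item():0.4f}, analytical: {analytical_loc.item():0.4f}")
print(f"Learnt scale: {learnt_scale.item():0.4f}, analytical: {analytical_scale.item():0.4f}")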

  1. Would using a uniform prior with no guide, and then running SVI, give the MLE?
  2. Following up on the previous: would setting a non-uniform prior, again with no guide, and running SVI be equivalent to the MAP solution?

If the answer to the above is yes for both, would it be helpful to include sample code such as mine for MLE and MAP, instead of running SVI? To make question 1 concrete, I've sketched below what I think the SVI version would look like.
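This is only my understanding, not a confirmed recipe: with no pyro.sample latent sites in the model (only pyro.param sites) and an empty guide, the ELBO reduces to the log-likelihood of the data, so SVI should perform MLE. A minimal sketch, reusing the train_data tensor from above and replacing the explicit uniform prior with plain pyro.param sites (which I believe is equivalent for MLE):

import pyro
import pyro.distributions as pdist
from pyro.infer import SVI, Trace_ELBO
from torch.distributions import constraints

def model(data):
    # Point parameters instead of latent variables: no prior at all,
    # which I take to match the "uniform prior" intuition for MLE
    loc = pyro.param("loc", torch.tensor(0.0))
    scale = pyro.param("scale", torch.tensor(1.0),
                       constraint=constraints.positive)  # Pyro handles positivity
    with pyro.plate("data", len(data)):
        pyro.sample("obs", pdist.Normal(loc, scale), obs=data)

def guide(data):
    pass  # no latent variables, so the guide is empty

pyro.clear_param_store()
svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.01}), loss=Trace_ELBO())
for i in range(3000):
    svi.step(train_data)

print(pyro.param("loc").item(), pyro.param("scale").item())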