Thanks, @martinjankowiak
I was trying to avoid using SVI for MLE as it seemed like overkill. I wanted to share a working version of the MLE estimate for `loc` and `scale` (currently using PyTorch distributions). The complete blog post contains the full code.
Here is the minimal code for learning `loc` given a fixed `scale`. In a previous version of the code, I missed the `zero_grad()` call.
**Learning `loc` given fixed `scale`**
```python
import torch

dist = torch.distributions

uv_normal = dist.Normal(loc=0.0, scale=1.0)
train_data = uv_normal.sample([10000])

loc = torch.autograd.Variable(torch.tensor(-10.0), requires_grad=True)
opt = torch.optim.Adam([loc], lr=0.01)

for i in range(3100):
    to_learn = torch.distributions.Normal(loc=loc, scale=1.0)
    loss = -torch.sum(to_learn.log_prob(train_data))
    loss.backward()
    if i % 500 == 0:
        print(f"Iteration: {i}, Loss: {loss.item():0.2f}, Loc: {loc.item():0.2f}")
    opt.step()
    opt.zero_grad()
```
This gives a learnt `loc` that matches the analytical MLE solution (`train_data.mean()`).
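For reference, a quick check of that claim (a minimal sketch, reusing `loc` and `train_data` from the snippet above):

```python
# Compare the learnt loc against the closed-form MLE (the sample mean).
print(f"Learnt loc:     {loc.item():0.4f}")
print(f"Analytical MLE: {train_data.mean().item():0.4f}")
```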
**Learning `loc` and `scale`**
The important difference here is that I introduced `softplus` to ensure the positivity of `scale`.
```python
loc = torch.autograd.Variable(torch.tensor(-10.0), requires_grad=True)
scale = torch.autograd.Variable(torch.tensor(2.0), requires_grad=True)
opt = torch.optim.Adam([loc, scale], lr=0.01)

for i in range(5100):
    scale_softplus = torch.nn.functional.softplus(scale)
    to_learn = torch.distributions.Normal(loc=loc, scale=scale_softplus)
    loss = -torch.sum(to_learn.log_prob(train_data))
    loss.backward()
    if i % 500 == 0:
        print(
            f"Iteration: {i}, Loss: {loss.item():0.2f}, Loc: {loc.item():0.2f}, Scale: {scale_softplus.item():0.2f}"
        )
    opt.step()
    opt.zero_grad()
```
I’m able to get the correct `loc` and `scale`.
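For `scale`, the corresponding closed-form MLE is the biased (1/N) sample standard deviation, so a quick check (a small sketch reusing the variables above; `unbiased=False` is how PyTorch exposes the 1/N estimator) would be:

```python
# MLE for the Gaussian scale is the biased (1/N) sample standard deviation.
mle_scale = train_data.std(unbiased=False)
print(f"Learnt scale:   {torch.nn.functional.softplus(scale).item():0.4f}")
print(f"Analytical MLE: {mle_scale.item():0.4f}")
```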
- Would using a uniform prior and no guide, and then running SVI, give the MLE? (A rough sketch of the setup I have in mind is at the end of this post.)
- Following up on the previous: would setting a non-uniform prior and no guide, and then running SVI, be equivalent to the MAP solution?
If the answer to the above is yes for both, would it be helpful to include sample code such as mine for MLE and MAP instead of running SVI?
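For concreteness, this is roughly the SVI setup I have in mind for the MLE case (an unverified sketch on my part, reusing `train_data` from above: `loc` as a `pyro.param` in the model with an empty guide; for MAP, `loc` would instead be a `pyro.sample` site with a prior and a point-estimate guide such as `AutoDelta`):

```python
import pyro
import pyro.distributions as pdist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def model_mle(data):
    # loc as a plain learnable parameter (no prior), fixed scale = 1.0
    loc = pyro.param("loc", torch.tensor(-10.0))
    with pyro.plate("data", len(data)):
        pyro.sample("obs", pdist.Normal(loc, 1.0), obs=data)

def guide_mle(data):
    pass  # empty guide: no latent sample sites to approximate

svi = SVI(model_mle, guide_mle, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(3000):
    svi.step(train_data)

print(pyro.param("loc").item(), train_data.mean().item())
```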