Why does SVI optimize the guide to mimic the prior instead of minimizing the ELBO loss?

I have a very simple weather model. The dataset is 250 temperature samples from N(55., 1.), and the latent variable is the probability of cloudy.
After training, the guide distribution always ends up similar to the prior, Beta(10, 20). Whenever I change the prior, the guide converges to the prior's values; it does not take the conditional distribution (the observations) into account.
It seems clear that with 250 observations of temperature 55 it should be cloudy. Below I show that manually setting different guide parameters gives a much better ELBO loss.
Can you please suggest where the bug is?

import numpy as np
import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

def weather():
    # Prior
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(20.0)
    prob_cloudy = pyro.sample("prob_cloudy", dist.Beta(alpha0, beta0))

    cloudy = pyro.distributions.Bernoulli(prob_cloudy).sample()
    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]

    with pyro.plate('observe_data'):
      pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp))


def weather_guide():
    alpha0 = pyro.param("alpha0", torch.tensor(10.0), constraint=constraints.positive)
    beta0 = pyro.param("beta0", torch.tensor(1.0), constraint=constraints.positive)
    prob_cloudy = pyro.sample("prob_cloudy", dist.Beta(alpha0, beta0))

pyro.clear_param_store()

# prepare data
obs = pyro.distributions.Normal(55., 1.).sample([250])
conditioned_weather = pyro.condition(weather, data={"temp": obs}) 

adam = Adam({"lr": 0.0005, "betas": (0.90, 0.999)})
svi = SVI(conditioned_weather, weather_guide, adam, loss=Trace_ELBO())

for _ in range(5000):
    svi.step()

It is easy to verify that the ELBO loss is lower with different guide parameters:

pyro.get_param_store()['alpha0'] = torch.tensor(10.)
pyro.get_param_store()['beta0'] = torch.tensor(30.)
loss = []
elbo = Trace_ELBO()
for i in range(1000):
  loss.append(elbo.loss(conditioned_weather, weather_guide))
print(np.mean(loss)) # 1041

pyro.get_param_store()['alpha0'] = torch.tensor(10.)
pyro.get_param_store()['beta0'] = torch.tensor(1.)
loss = []
elbo = Trace_ELBO()
for i in range(1000):
  loss.append(elbo.loss(conditioned_weather, weather_guide))
print(np.mean(loss)) # 875

Pyro only "sees" random variables that you register with a pyro.sample statement. Here you are using distribution.sample(), which is not the same thing, so Pyro never registers your Bernoulli random variable as part of the model.


I thought that Pyro would see the connection through pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp)).

I have updated my model and guide and it works now. Did I understand you correctly?

def weather(obs):
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(20.0)
    prob_cloudy = pyro.sample("prob_cloudy", dist.Beta(alpha0, beta0))

    # fixed
    cloudy = pyro.sample("cloudy", pyro.distributions.Bernoulli(prob_cloudy))

    cloudy = 'cloudy' if cloudy.item() == 1.0 else 'sunny'
    mean_temp = {'cloudy': 55.0, 'sunny': 75.0}[cloudy]
    scale_temp = {'cloudy': 10.0, 'sunny': 15.0}[cloudy]

    with pyro.plate('observe_data'):
      pyro.sample('temp', pyro.distributions.Normal(mean_temp, scale_temp), obs=obs)


def weather_guide(obs):
    alpha0 = pyro.param("alpha0", torch.tensor(10.0), constraint=constraints.positive)
    beta0 = pyro.param("beta0", torch.tensor(1.0), constraint=constraints.positive)
    prob_cloudy = pyro.sample("prob_cloudy", dist.Beta(alpha0, beta0))
    
    # fixed
    cloudy = pyro.sample("cloudy", pyro.distributions.Bernoulli(prob_cloudy))

yes that looks right!
