SVI on hierarchical model

sean00002 · June 22, 2021, 8:55am

Hi all, I am trying to replicate the tutorial example "SVI Part I: An Introduction to Stochastic Variational Inference in Pyro" with my own small test model. I define my model as

import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints

def model(data):
    c1 = pyro.sample("c1", dist.Gamma(torch.tensor(1.1),
                                      torch.tensor(0.005)))
    c2 = pyro.sample("c2", dist.Gamma(torch.tensor(1.1),
                                      torch.tensor(0.005)))
    s  = pyro.sample("s", dist.Gamma(torch.tensor(1.618),
                                     torch.tensor(2.618)))

    theta = pyro.sample("theta", dist.LogNormal(torch.tensor(0.0), s))
    p = pyro.sample("p", dist.Beta(torch.tensor(0.1)*(c1-2) + 1,
                                   (torch.tensor(1)-torch.tensor(0.1))*(c1-1) + 1))
    q = (theta*p)/(1-p+theta*p)

    pyro.sample("DNA", dist.Binomial(data[0]+data[1], p), obs=data[0])

    # five RNA sites: site k uses the counts data[2k] and data[2k+1]
    for k in range(1, 6):
        i = 2*k
        qi = pyro.sample(f"qi_{k}", dist.Beta(q*(c2-2)+1, (1-q)*(c2-1)+1))
        pyro.sample(f"RNA_{k}", dist.Binomial(data[i] + data[i+1], qi), obs=data[i])

and I set the surrogate trainable posterior guide() as

def guide(data):
    # register the variational parameters with Pyro
    # - because we invoke constraints.positive, the optimizer
    # will take gradients on the unconstrained parameters
    # (which are related to the constrained parameters by a log)
    c1_x = pyro.param("c1_x", torch.tensor(100.0), constraint=constraints.positive)
    c2_x = pyro.param("c2_x", torch.tensor(100.0), constraint=constraints.positive)
    s_x  = pyro.param("s_x", torch.tensor(0.5), constraint=constraints.positive)

    theta_x = pyro.param("theta_x", torch.tensor(1.0), constraint=constraints.positive)
    p_x = pyro.param("p_x", torch.tensor(0.5), constraint=constraints.positive)
    q_x = (theta_x*p_x)/(1-p_x+theta_x*p_x)
    # one parameter per RNA site, named qi_x_1 ... qi_x_5
    for k in range(1, 6):
        pyro.param(f"qi_x_{k}", torch.tensor(0.5), constraint=constraints.positive)

No matter how I change the simulated data, the learned variational parameters always come out equal to the initial values I set in guide(). I am sure there is something wrong with guide(), but I don't know how to write it correctly. Should I work out the posterior distribution myself and train a simple distribution in its place? Or does the guide need to use exactly the same distributions as the model, including all parameters?

I have also tried AutoDiagonalNormal(), which I believe uses mean-field VI. It works, but it becomes extremely slow as I add more parameters to the model. Would it be much faster to specify my own guide?

Any advice would be really helpful! Thank you all in advance!

eb8680_2 · June 24, 2021, 12:58am

Hi, a guide needs to have one correspondingly named pyro.sample site for each pyro.sample site in the model - the guide in your snippet has no sample sites at all, so it has no effect on the model and the parameter values are never updated.

See this section of part 1 of the SVI tutorial and part 2 of the intro tutorial for more background.

sean00002 · June 24, 2021, 3:02am

I got it! Thank you! One more question: will a hand-written guide run through SVI faster than an AutoGuide?

eb8680_2 · June 24, 2021, 2:17pm

That depends on the details of the particular model and guides. AutoMultivariateNormal can be a bit slow for large models, but otherwise autoguides are generally comparable in terms of speed to manual guides.