 # SVI on hierarchical model

Hi all, I am trying to replicate the tutorial example from "SVI Part I: An Introduction to Stochastic Variational Inference in Pyro" (Pyro Tutorials 1.6.0 documentation) with my own small test model. I define my model as

```python
import pyro
import pyro.distributions as dist
import torch


def model(data):
    # Gamma hyperpriors
    c1 = pyro.sample("c1", dist.Gamma(torch.tensor(1.1), torch.tensor(0.005)))
    c2 = pyro.sample("c2", dist.Gamma(torch.tensor(1.1), torch.tensor(0.005)))
    s = pyro.sample("s", dist.Gamma(torch.tensor(1.618), torch.tensor(2.618)))

    theta = pyro.sample("theta", dist.LogNormal(torch.tensor(0.0), s))
    p = pyro.sample("p", dist.Beta(torch.tensor(0.1) * (c1 - 2) + 1,
                                   (torch.tensor(1) - torch.tensor(0.1)) * (c1 - 1) + 1))
    q = (theta * p) / (1 - p + theta * p)

    pyro.sample("DNA", dist.Binomial(data + data, p), obs=data)

    # One latent qi_k and one observed RNA_k per pair of data entries.
    for i in range(2, 11, 2):  # i = 2, 4, 6, 8, 10
        k = i // 2             # k = 1, ..., 5
        qi = pyro.sample(f"qi_{k}", dist.Beta(q * (c2 - 2) + 1, (1 - q) * (c2 - 1) + 1))
        pyro.sample(f"RNA_{k}", dist.Binomial(data[i] + data[i + 1], qi), obs=data[i])
```

and I set the surrogate trainable posterior `guide()` as

```python
import pyro
import torch
from torch.distributions import constraints


def guide(data):
    # Register the variational parameters with Pyro.
    # Because we invoke constraints.positive, the optimizer
    # takes gradients on the unconstrained parameters
    # (which are related to the constrained parameters by a log).
    c1_x = pyro.param("c1_x", torch.tensor(100.0), constraint=constraints.positive)
    c2_x = pyro.param("c2_x", torch.tensor(100.0), constraint=constraints.positive)
    s_x = pyro.param("s_x", torch.tensor(0.5), constraint=constraints.positive)

    theta_x = pyro.param("theta_x", torch.tensor(1.0), constraint=constraints.positive)
    p_x = pyro.param("p_x", torch.tensor(0.5), constraint=constraints.positive)
    q_x = (theta_x * p_x) / (1 - p_x + theta_x * p_x)
    for k in range(1, 6):  # qi_x_1, ..., qi_x_5
        pyro.param(f"qi_x_{k}", torch.tensor(0.5), constraint=constraints.positive)
```

No matter how I change the simulated data, the learned variational parameters always equal the initial values I set in `guide()`. I am sure something is wrong with `guide()`, but I don't know how to write it correctly. Should I derive the posterior distribution myself and just train a simple distribution? Or do I need to train on exactly the same distributions, including all parameters?

I have also tried `AutoDiagonalNormal()`, which I guess uses mean-field VI. It works, but it becomes extremely slow when I add more parameters to the model. Would it be much faster to write my own guide?

Hi, a guide needs one `pyro.sample` site for each latent (unobserved) `pyro.sample` site in the model, with the same name. The guide in your snippet has no sample sites at all, so it has no effect on the model and the parameter values are never updated.
As for speed, that depends on the details of the particular model and guide. `AutoMultivariateNormal` can be a bit slow for large models, but otherwise autoguides are generally comparable in speed to manual guides.