Hi all, I am trying to replicate the tutorial example "SVI Part I: An Introduction to Stochastic Variational Inference in Pyro" (Pyro Tutorials 1.8.4 documentation) with my own small test model. I define my model as
import torch
import pyro
import pyro.distributions as dist

def model(data):
    # Gamma priors on c1 and c2, which enter the Beta distributions below
    c1 = pyro.sample("c1", dist.Gamma(torch.tensor(1.1), torch.tensor(0.005)))
    c2 = pyro.sample("c2", dist.Gamma(torch.tensor(1.1), torch.tensor(0.005)))
    # s is the scale of the log-normal prior on theta
    s = pyro.sample("s", dist.Gamma(torch.tensor(1.618), torch.tensor(2.618)))
    theta = pyro.sample("theta", dist.LogNormal(torch.tensor(0.0), s))
    p = pyro.sample("p", dist.Beta(torch.tensor(0.1) * (c1 - 2) + 1,
                                   (torch.tensor(1) - torch.tensor(0.1)) * (c1 - 1) + 1))
    q = (theta * p) / (1 - p + theta * p)
    pyro.sample("DNA", dist.Binomial(data[0] + data[1], p), obs=data[0])
    # RNA sites k = 1..5: data[2k] is observed, data[2k] + data[2k+1] is the total count
    for k in range(1, 6):
        qi = pyro.sample(f"qi_{k}", dist.Beta(q * (c2 - 2) + 1, (1 - q) * (c2 - 1) + 1))
        pyro.sample(f"RNA_{k}", dist.Binomial(data[2 * k] + data[2 * k + 1], qi), obs=data[2 * k])
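To make the indexing in the loop above explicit: site k (for k = 1..5) uses data[2k] as the observed count and data[2k] + data[2k+1] as the total, so data needs at least 12 entries. Below is a minimal plain-Python sketch of that mapping, with no Pyro calls. The helper name `rna_index_pairs` is hypothetical and only there for illustration.

```python
def rna_index_pairs(n_sites=5):
    """Return (site, obs_index, other_index) for each RNA site.

    Hypothetical helper: it only mirrors the indexing of the loop above,
    where site k reads data[2k] and data[2k + 1].
    """
    return [(k, 2 * k, 2 * k + 1) for k in range(1, n_sites + 1)]


for site, i, j in rna_index_pairs():
    # Each line shows which entries of `data` feed the Binomial for that site
    print(f"RNA_{site}: obs=data[{i}], total_count=data[{i}] + data[{j}]")
```

Writing the mapping out this way makes it easy to check that the simulated data has the layout the model expects.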
and I define the trainable surrogate posterior guide() as
from torch.distributions import constraints

def guide(data):
    # Register the variational parameters with Pyro.
    # - Each parameter starts at the initial value passed to pyro.param.
    # - Because we invoke constraints.positive, the optimizer
    #   will take gradients on the unconstrained parameters
    #   (which are related to the constrained parameters by a log).
    c1_x = pyro.param("c1_x", torch.tensor(100.0), constraint=constraints.positive)
    c2_x = pyro.param("c2_x", torch.tensor(100.0), constraint=constraints.positive)
    s_x = pyro.param("s_x", torch.tensor(0.5), constraint=constraints.positive)
    theta_x = pyro.param("theta_x", torch.tensor(1.0), constraint=constraints.positive)
    p_x = pyro.param("p_x", torch.tensor(0.5), constraint=constraints.positive)
    q_x = (theta_x * p_x) / (1 - p_x + theta_x * p_x)
    for k in range(1, 6):
        pyro.param(f"qi_x_{k}", torch.tensor(0.5), constraint=constraints.positive)
No matter how I manipulate the simulated data, the learned variational parameters always come out equal to the initial values I set in guide(). I am sure there is something wrong with guide(), but I don’t know how to write it correctly. Should I compute the posterior distribution myself and just set up a simple distribution to train? Or do I need to train on exactly the same distributions as in the model, including all parameters?
I have also tried AutoDiagonalNormal(), which I guess uses mean-field VI. It works, but it becomes extremely slow as I add more parameters to the model. Would specifying my own guide be much faster?
Any advice would be really helpful! Thank you all in advance!