Hey Team
I’m attempting to perform a simple logistic regression using the NUTS sampler following PyMC3’s tutorial for a comparison but am struggling to get similar results.
For ease of simplicity, I’ve attached a notebook to this post with runnable code. I think where I’m going wrong is when I define the model, in PyMC3 it’s defined as:
with pm.Model() as logistic_model:
pm.glm.GLM.from_formula('income ~ age + age2 + educ + hours', data, family=pm.glm.families.Binomial())
trace_logistic_model = pm.sample(2000, chains=1, tune=1000, init='adapt_diag')
When using Pyro, I define it as the following:
def model(income, hours, educ, age2, age):
# cauchy distribution used to constrain intercept (a) to be positive.
a = pyro.sample("a", dist.Cauchy(0., 1000.))
b_h = pyro.sample("b_hours", dist.Normal(0., 1.))
b_e = pyro.sample("b_educ", dist.Normal(0., 1.))
b_a2 = pyro.sample("b_age2", dist.Normal(0., 1.))
b_a = pyro.sample("b_age", dist.Normal(0., 1.))
sigma = pyro.sample("sigma", dist.Uniform(0., 1.))
mean = a + b_h * hours + b_e * educ + b_a2 * age2 + b_a * age
with pyro.iarange("data", len(hours)):
pyro.sample("obs", dist.Bernoulli(logits=mean.squeeze()), obs=income)
I’m unsure as to whether I need to squeeze the mean
or if I need to pass it through a Sigmoid function. Looking at PyTorch’s source, I don’t believe I need to.
Any help and advice would be appreciated! I’ve looked through this post, but couldn’t solve it.
Thanks in advance.