How to convert A/B testing model in pymc3 to pyro?

divyanshu · July 6, 2018, 9:38am

Hey,
I am just starting out in probabilistic programming and am going through GitHub - CamDavidsonPilon/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers: aka "Bayesian Methods for Hackers": An introduction to Bayesian methods + probabilistic programming with a computation/understanding-first, mathematics-second point of view. All in pure Python ;), in the second chapter, the author describes how to do A/B testing using MCMC using pymc3:

def analyse(As, Bs):
    totalA = 0.55
    totalB = 0.45
    with pm.Model() as model:
        p_A = pm.Beta('p_A', totalA, 1 - totalA)
        p_B = pm.Beta('p_B', totalB, 1 - totalB)
        delta = pm.Deterministic("delta", p_A - p_B)
        ADist = pm.Bernoulli("obsA", p_A, observed=As)
        BDist = pm.Bernoulli("obsB", p_B, observed=Bs)
        step = pm.Metropolis()
        trace = pm.sample(20000, step=step, cores=1)
        burned_trace = trace[1000:]
    delta_samples = burned_trace["delta"]
    return delta_samples, burned_trace["p_A"], burned_trace["p_B"]

so I call analyse and send it two lists [1,0,0,1,1] and [0,0,0,1], which basically means 5 people came to A, 3 converted, and 4 people came to B and 1 converted. I am wondering how to convert this model to pyro, any help will be appreciated, thanks

fehiepsi · July 10, 2018, 3:41am

I think that you can start to build some graphical model in Pyro using tutorials in Getting Started With Pyro: Tutorials, How-to Guides and Examples — Pyro Tutorials 1.8.4 documentation or pyro/examples/eight_schools at dev · pyro-ppl/pyro · GitHub example to be familiar with Pyro.

Then I believe you can build that pymc3 model. pymc3 uses MCMC to train their model. You can find a similar solution in the example pyro/baseball.py at dev · pyro-ppl/pyro · GitHub.

abie · January 2, 2022, 12:09am

I am building my experience with NumPyro, and I thought I’d give this a shot. Here is the way I did it, and I would be very interested in any feedback from @fehiepsi or other experts about what could make this more a idiomatic numpyro approach:

# NumPyro version of the model
def model(As, Bs, totalA, totalB):
    p_A = numpyro.sample('p_A', dist.Beta(totalA, 1-totalA))
    p_B = numpyro.sample('p_B', dist.Beta(totalB, 1-totalB))
    delta = numpyro.deterministic('delta', p_A - p_B)
    ADist = numpyro.sample('obsA', dist.BernoulliProbs(p_A), obs=As)
    BDist = numpyro.sample('obsB', dist.BernoulliProbs(p_B), obs=Bs)

# NumPyro version of the inference
rng_key = random.PRNGKey(12345)

kernel = NUTS(model)
mcmc = MCMC(kernel, num_warmup=1_000, num_samples=2_000, thinning=1)
mcmc.run(
    rng_key,
    As=jnp.array([1,0,0,1,1]),
    Bs=jnp.array([0,0,0,1]),
    totalA=0.55,
    totalB=0.45
)
posterior_samples = mcmc.get_samples()
mcmc.print_summary()

Here is a notebook that puts this in context: 2022_01_01a_numpyro_ab_testing_model.ipynb · GitHub

fehiepsi · January 2, 2022, 2:31am

The modeling code looks great! It might be a bit clearer if we put obs... variables into a numpyro.plate context.

abie · January 2, 2022, 5:44am

Ooh, right, thank you for explaining about plates in this ticket. Here is my attempt with plates

# NumPyro version of the model with plates
def model(As, Bs, totalA, totalB):
    p_A = numpyro.sample('p_A', dist.Beta(totalA, 1-totalA))
    p_B = numpyro.sample('p_B', dist.Beta(totalB, 1-totalB))
    delta = numpyro.deterministic('delta', p_A - p_B)
    
    with numpyro.plate("num_A", len(As)):
        numpyro.sample("obsA", dist.BernoulliProbs(p_A), obs=As)

    with numpyro.plate("num_B", len(Bs)):
        numpyro.sample("obsB", dist.BernoulliProbs(p_B), obs=Bs)

I am definitely still a little confused about this affordance. Why do I have to tell the first plate that the size is len(As)? Isn’t that implicit when I say obs=As on the next line?

fehiepsi · January 2, 2022, 3:55pm

When data is available and dimension is specified (which is the right most batch dimension by default), we can interpret the size but when data is not available, we need to know the size of that plate. This is useful for e.g. getting prior/posterior predictive distribution or triggering shape errors if things are mismatched,…