Hello,
I would like to know whether it is possible to do inference when the observed data is a function of two (or more) random variables.
So e.g. I have the following data:
true_prob = 0.3
true_dist = dist.Bernoulli(true_prob)
observed = dist.Normal(true_dist.sample((2000,)), 1.0).sample()  # sample_n is deprecated
observed # tensor([ 1.2886, -1.2316, -1.1521, ..., -0.6966, 1.0491, 2.1959])
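As a quick sanity check (a sketch in plain NumPy rather than Pyro, just to illustrate the generative process), the observed data is a two-component Gaussian mixture whose mean equals true_prob, since E[x] = E[a] + E[noise] = true_prob:

```python
import numpy as np

# Reproduce the generative process with NumPy: a Bernoulli(0.3) draw
# plus unit Gaussian noise. The sample mean should land near true_prob.
rng = np.random.default_rng(0)
true_prob = 0.3
a = rng.random(200_000) < true_prob              # Bernoulli(true_prob) draws
observed = a + rng.normal(0.0, 1.0, size=a.shape)
print(observed.mean())                           # close to 0.3
```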
Now I want to estimate true_prob from the observed data. The following works:
def model(data):
    probability = pyro.sample("probability", dist.Uniform(0.0, 1.0))
    with pyro.plate("observed_data"):
        a = pyro.sample("a", dist.Bernoulli(probs=probability))
        pyro.sample("obs", dist.Normal(a, 1.0), obs=data)
nuts_kernel = NUTS(model, adapt_step_size=True)
mcmc = MCMC(nuts_kernel, num_samples=4000, warmup_steps=300)
mcmc.run(observed)
mcmc.get_samples()["probability"].mean() # tensor(0.3305)
Now, instead of merging the dependent variables into a single likelihood as above, suppose my observed data is their sum (which is really similar, but this is just an example):
def underlying_model():
    probability = pyro.sample("probability", dist.Uniform(0.0, 1.0))
    with pyro.plate("observe_data"):
        a = pyro.sample("a", dist.Bernoulli(probs=probability))
        b = pyro.sample("b", dist.Normal(0.0, 1.0))
        pyro.deterministic("x", a + b)
conditioned_model = poutine.do(underlying_model, data={"x": observed})
nuts_kernel = NUTS(conditioned_model, adapt_step_size=True)
mcmc = MCMC(nuts_kernel, num_samples=4000, warmup_steps=300)
mcmc.run()
mcmc.get_samples()["probability"].mean() # tensor(0.4929)
This doesn’t work, however. Is there a way to make this work? In this toy example I can actually merge the variables as in the first part, but in my real case it becomes really hard (maybe impossible) to merge them like that.