Leveraging MCMC for Active Learning with a Bayesian Neural Network

puchazhong · September 1, 2023, 9:01am

Hello everyone,

I’m getting started with active learning using MCMC (following the NumPyro docs example “Example: Bayesian Neural Network”), and I’ve hit a roadblock. I’m hoping someone with more experience can offer some insight.

Objective:
I’m trying to implement an active learning demo using MCMC. The task is binary classification: a two-layer neural network (NN) maps each input to the parameter p of a Bernoulli output distribution (the "prob" site in my code).

Model:
Here’s the Bayesian neural network I’ve set up:

```python
import jax.numpy as jnp
from jax import nn
import numpyro
import numpyro.distributions as dist


# a two-layer Bayesian neural network: one ReLU hidden layer, Bernoulli output
def model(X, Y, D_H, D_Y=1):
    N, D_X = X.shape

    # sample first layer (we put unit normal priors on all weights)
    w1 = numpyro.sample("w1", dist.Normal(jnp.zeros((D_X, D_H)), jnp.ones((D_X, D_H))))
    b1 = numpyro.sample("b1", dist.Normal(jnp.zeros(D_H), jnp.ones(D_H)))
    assert w1.shape == (D_X, D_H)
    z1 = nn.relu(jnp.matmul(X, w1) + b1)
    assert z1.shape == (N, D_H)

    # sample final layer of weights and neural network output
    w2 = numpyro.sample("w2", dist.Normal(jnp.zeros((D_H, D_Y)), jnp.ones((D_H, D_Y))))
    b2 = numpyro.sample("b2", dist.Normal(jnp.zeros(D_Y), jnp.ones(D_Y)))
    assert w2.shape == (D_H, D_Y)
    z2 = jnp.matmul(z1, w2) + b2  # <= output of the neural network (logits)
    assert z2.shape == (N, D_Y)

    logits = z2.squeeze(-1)
    # record p = sigmoid(logits) so Predictive can return it as "prob"
    numpyro.deterministic("prob", nn.sigmoid(logits))

    # observe data; with Y=None (prediction) the labels are sampled instead.
    # Passing logits is numerically more stable than passing sigmoid(z2) as probs.
    with numpyro.plate("data", N):
        numpyro.sample("Y", dist.Bernoulli(logits=logits), obs=Y)
```

Sampling:
I’m using the NUTS algorithm to sample from the posterior:

```python
import os
import time

from numpyro.infer import MCMC, NUTS


# helper function for HMC (NUTS) inference
def run_inference(model, rng_key, X, Y, D_H):
    start = time.time()
    kernel = NUTS(model)
    mcmc = MCMC(
        kernel,
        num_warmup=200,
        num_samples=100,
        num_chains=1,
        progress_bar="NUMPYRO_SPHINXBUILD" not in os.environ,
    )
    mcmc.run(rng_key, X, Y, D_H)
    # mcmc.print_summary(exclude_deterministic=False)
    print("\nMCMC elapsed time:", time.time() - start)
    return mcmc.get_samples()
```

Posterior Predictive:
Finally, I draw from the posterior predictive of p (the "prob" site) at the test inputs, which gives 100 sampled probabilities per test point, and compute an entropy over them as a measure of uncertainty. Since this is an active learning setup, I plan to use this uncertainty to decide which point to add to the training set next:

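```python
from jax import random
from numpyro.infer import Predictive

rng_key, rng_key_predict = random.split(random.PRNGKey(0))
samples = run_inference(model, rng_key, train_x, train_y, D_H)
# posterior predictive of the Bernoulli probability p: one draw per posterior sample
predictive_probs = Predictive(model, samples, return_sites=["prob"])
probs_posterior = predictive_probs(rng_key_predict, test_x, None, D_H)["prob"]  # (100, len(test_x))
```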
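To spell out what “entropy over the draws” can mean: for each test point there are 100 draws p_s. Two common scores are the predictive entropy H[mean_s p_s] (total uncertainty) and the BALD score H[mean_s p_s] - mean_s H[p_s], the mutual information between the label and the weights (the part of the uncertainty due to not knowing the weights). A NumPy sketch, assuming probs_posterior has shape (num_draws, num_test); the helper names are my own and BALD is just one illustrative choice:

```python
import numpy as np


def _binary_entropy(p, eps=1e-12):
    """Entropy (in nats) of Bernoulli(p), applied elementwise."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def uncertainty_scores(probs_posterior):
    """probs_posterior: (num_draws, num_test) sampled Bernoulli probabilities.

    Returns (predictive_entropy, bald), each of shape (num_test,):
      predictive_entropy = H[mean_s p_s]                  (total uncertainty)
      bald               = H[mean_s p_s] - mean_s H[p_s]  (epistemic part)
    """
    p = np.asarray(probs_posterior)
    predictive_entropy = _binary_entropy(p.mean(axis=0))
    expected_entropy = _binary_entropy(p).mean(axis=0)
    return predictive_entropy, predictive_entropy - expected_entropy
```

The next point to query would then be, for example, int(np.argmax(bald)): the test point where the posterior draws disagree most, rather than simply the one whose mean p is closest to 0.5.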

Concern:
In a typical active learning setup with a standard NN in PyTorch, the weights are updated incrementally: after training in round 1, I add more data in round 2 and continue training from the round-1 model. With run_inference(...), however, every round starts again from the unit normal priors on all weights and reruns warmup and sampling from scratch.

Question:
Is it theoretically sound to save the posterior over the weights and biases from the previous round and use it as the prior for the next round? It feels like it could be a workaround, but I’m not sure it’s the right approach.
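My reasoning for why it might be sound: with exact inference, and assuming the labels are conditionally independent given the weights θ, using the round-1 posterior as the prior for the round-2 data gives the same posterior as refitting on everything:

$$
p(\theta \mid \mathcal{D}_1, \mathcal{D}_2) \propto p(\mathcal{D}_2 \mid \theta)\, p(\theta \mid \mathcal{D}_1) \propto p(\mathcal{D}_2 \mid \theta)\, p(\mathcal{D}_1 \mid \theta)\, p(\theta)
$$

The catch is that MCMC only returns draws from p(θ | D1), not a density I can plug in as a prior, so it has to be approximated. A minimal sketch of the simplest option I can think of, a moment-matched independent Normal per weight (diag_gaussian_prior and the min_scale floor are my own hypothetical choices):

```python
import numpy as np


def diag_gaussian_prior(samples, exclude=("prob",), min_scale=1e-3):
    """Moment-match each latent site's posterior draws with an independent Normal.

    samples: dict of arrays shaped (num_draws, *site_shape), e.g. mcmc.get_samples().
    Returns {site: (loc, scale)}, with loc and scale shaped like a single draw.
    """
    prior = {}
    for name, draws in samples.items():
        if name in exclude:
            continue  # deterministic sites are not parameters of the model
        draws = np.asarray(draws)
        loc = draws.mean(axis=0)
        # floor the scale so a site that barely moved does not get a zero-width prior
        scale = np.maximum(draws.std(axis=0), min_scale)
        prior[name] = (loc, scale)
    return prior
```

In the next round each unit normal prior would be replaced, e.g. dist.Normal(*prior["w1"]) instead of dist.Normal(jnp.zeros((D_X, D_H)), jnp.ones((D_X, D_H))), and the likelihood would cover only the newly added points, since the earlier points are already accounted for in the prior. A diagonal Gaussian drops correlations between weights and cannot represent a multimodal posterior (for example, modes related by permuting hidden units), so this is only an approximation of refitting on all the data.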

I’m relatively new to this domain and the framework, so any guidance or feedback would be greatly appreciated. If anyone can take a moment to review my approach and let me know if I’m on the right track or if there are glaring issues, I’d be very thankful!

fsosa · March 7, 2024, 5:13pm

Hi @puchazhong, I’m facing a similar question now, except with a non-parametric model. Did you ever figure this out?

Thanks!