MAP Estimates of Parameters Converge to Mode of Prior Distributions when using a 'Black-Box' Model

Hi there.

I’m a new Pyro user who has been trying to use Pyro to find the MAP estimates of some parameters in a physics-based model. More specifically, I have:

  1. A (deterministic) simulator of the physical system of interest, which takes in some of the parameters I’m interested in estimating as inputs. This simulator is an external function which can be called from within the Pyro model. When it is called, it creates a text file which contains the response of the system for that set of parameter values – the values in this text file can then be loaded back into Python. This simulator is a ‘black-box’ in the sense that gradients cannot be back-propagated through it – it’s simply an external function which Pyro has no access to beyond being able to call it.
  2. A set of noisy observations from this system
  3. A prior for each of the parameters I’m interested in estimating

Although previous forum posts seem to indicate that Pyro is capable of performing parameter inference for these kinds of ‘black-box’ models, I have, unfortunately, been running into some difficulties with this task. In particular, the MAP estimates I’m computing all seem to be converging to the modes of the priors I place on those parameters, causing the ELBO loss to increase – in other words, the data I’m providing is basically being ignored. I suspect that the cause of this issue is related to the fact that the physics simulator saves its predictions to a text file before loading them back into Python.
To illustrate the problem I’m encountering (as well as to show why I suspect the problem is to do with the physics simulator outputting its results as a text file), let’s swap out the physics simulator with a simple linear regression model of the form:

y = b_0 + b_1*x

Suppose we wanted to find the MAP estimates of b_0 and b_1 – we could do this with some artificially generated data by using the following Pyro code:

import pyro 
import torch

def generate_data(x_lb, x_ub, x_step, noise_std, b_0_true, b_1_true):
  x = torch.arange(x_lb, x_ub, x_step)
  b_0 = torch.tensor(b_0_true)
  b_1 = torch.tensor(b_1_true)
  y = b_0 + b_1*x
  with pyro.plate('data_gen', len(y)):
    data = pyro.sample("obs_gen", pyro.distributions.Normal(y, noise_std))
  return (x, data)

# Model from which correct MAP estimates of b_0 and b_1 can be found:
def model_w(data, x, b_0_prior_mode, b_1_prior_mode):
    b_0 = pyro.sample('b_0_w', pyro.distributions.Normal(torch.tensor([b_0_prior_mode]), torch.tensor([1.0])))
    b_1 = pyro.sample('b_1_w', pyro.distributions.Normal(torch.tensor([b_1_prior_mode]), torch.tensor([1.0])))
    y = b_0 + b_1*x
    with pyro.plate('data_w', len(y)):
      data = pyro.sample("obs_w", pyro.distributions.Normal(y, 0.01), obs=data)

if __name__ == "__main__":
  # Specify true b_0 and b_1 values:
  b_0_true = 1.0
  b_1_true = 1.0
  # Std of noise to corrupt our artificially generated data:
  noise_std = 0.01
  # Generate y = b_0 + b_1*x data between x = 0 and x = 3 with step = 0.01:
  x, data = generate_data(0, 3, 0.1, noise_std, b_0_true, b_1_true)

  # Specify the mean (i.e. the mode) of the normal distribution priors on b_0
  # and b_1 (the std of this prior is assumed to be 1.0) - both must be floats:
  b_0_prior_mode = 50.0
  b_1_prior_mode = 3.0

  # Use AutoDelta to compute MAP estimates of b_0 and b_1:
  autoguide_w = pyro.infer.autoguide.AutoDelta(model_w)

  svi = pyro.infer.SVI(model_w,
            autoguide_w,
            pyro.optim.Adam({"lr": .05}),
            loss=pyro.infer.Trace_ELBO())
  
  pyro.clear_param_store()

  # Compute MAP estimates:
  num_iters =  5000
  for i in range(num_iters):
    elbo = svi.step(data, x, b_0_prior_mode, b_1_prior_mode)
    if i%100 == 0:
      print(f"Iteration {i}: Elbo loss = {elbo}")
  print()
  
  # Print MAP estimates along with modes of priors:
  b_0_MAP = pyro.param('AutoDelta.b_0_w').item()
  b_1_MAP = pyro.param('AutoDelta.b_1_w').item()
  print(f"MAP estimate of b_0 = {b_0_MAP}, MAP estimate of b_1 = {b_1_MAP}")
  print(f"Mode of b_0 prior = {b_0_prior_mode}, Mode of b_1 prior = {b_1_prior_mode}")
  print(f"True value of b_0 = {b_0_true}, True value of b_1 = {b_1_true}")

As one might except, this finds the correct MAP estimates for b_0 and b_1:

Iteration 0: Elbo loss = 409240641.9094078
Iteration 100: Elbo loss = 247032695.9439354
Iteration 200: Elbo loss = 145081182.74504852
...
Iteration 4800: Elbo loss = 1124.96164894104
Iteration 4900: Elbo loss = 1115.458746433258

MAP estimate of b_0 = 1.0111202001571655, MAP estimate of b_1 = 0.9927428960800171
Mode of b_0 prior = 50.0, Mode of b_1 prior = 3.0
True value of b_0 = 1.0, True value of b_1 = 1.0

Now suppose that this linear model is, instead, defined inside an external function which saves its output to a text file which we need to read back into Python (i.e. a ‘black-box’ model). In that case, we’d have something like:

import pyro 
import torch
import re

# Generates 
def generate_data(x_lb, x_ub, x_step, noise_std, b_0_true, b_1_true):
  x = torch.arange(x_lb, x_ub, x_step)
  b_0 = torch.tensor(b_0_true)
  b_1 = torch.tensor(b_1_true)
  y = b_0 + b_1*x
  with pyro.plate('data_gen', len(y)):
    data = pyro.sample("obs_gen", pyro.distributions.Normal(y, noise_std))
  return (x, data)

# Model from which correct MAP estimates of b_0 and b_1 can NOT be found:
def model_nw(data, x, b_0_prior_mode, b_1_prior_mode):
    b_0 = pyro.sample('b_0_nw', pyro.distributions.Normal(torch.tensor([b_0_prior_mode]), torch.tensor([1.0])))
    b_1 = pyro.sample('b_1_nw', pyro.distributions.Normal(torch.tensor([b_1_prior_mode]), torch.tensor([1.0])))
    linear_model(x, b_0, b_1)
    y = load_results("out.txt")
    with pyro.plate('data_nw', len(data)):
      data = pyro.sample("obs_nw", pyro.distributions.Normal(y, 0.01), obs=data)

# Computes y = b_0 + b_1*x then writes values to text file called "out.txt":
def linear_model(x, b_0, b_1):
  y = b_0 + b_1*x
  with open('out.txt', 'w') as output:
    for i, item in enumerate(y):
      output.write(str(item.item()))
      output.write("\n")

# Loads values in "out.txt" into a tensor object:
def load_results(results_dir):  
  with open(results_dir, 'r') as input:
    text = input.read()
  pattern = re.compile(r".*\n")
  matches = pattern.finditer(text)
  y = torch.tensor([])
  for i, match in enumerate(matches):
    y = torch.cat((y, torch.tensor([float(match[0])])))
  return y

if __name__ == "__main__":
  # Specify true b_0 and b_1 values:
  b_0_true = 1.0
  b_1_true = 1.0
  # Std of noise to corrupt our artificially generated data:
  noise_std = 0.01
  # Generate y = b_0 + b_1*x data between x = 0 and x = 3 with step = 0.01:
  x, data = generate_data(0, 3, 0.1, noise_std, b_0_true, b_1_true)

  # Specify the mean (i.e. the mode) of the normal distribution priors on b_0
  # and b_1 (the std of this prior is assumed to be 1.0). In this case, the MAP
  # estimates of b_0 and b_1 will tend towards these values. Both values must be 
  # floats:
  b_0_prior_mode = 32.0
  b_1_prior_mode = 3.0

  autoguide_nw = pyro.infer.autoguide.AutoDelta(model_nw)
  
  svi = pyro.infer.SVI(model_nw,
            autoguide_nw,
            pyro.optim.Adam({"lr": .05}),
            loss=pyro.infer.Trace_ELBO())
  
  pyro.clear_param_store()
  
  # Compute MAP estimates:
  num_iters =  5000
  for i in range(num_iters):
    elbo = svi.step(data, x, b_0_prior_mode, b_1_prior_mode)
    if i%100 == 0:
      print(f"Iteration {i}: Elbo loss = {elbo}")
  print()
  
  # Print MAP estimates, modes of priors, and true values:
  b_0_MAP = pyro.param('AutoDelta.b_0_nw').item()
  b_1_MAP = pyro.param('AutoDelta.b_1_nw').item()
  print(f"MAP estimate of b_0 = {b_0_MAP}, MAP estimate of b_1 = {b_1_MAP}")
  print(f"Mode of b_0 prior = {b_0_prior_mode}, Mode of b_1 prior = {b_1_prior_mode}")
  print(f"True value of b_0 = {b_0_true}, True value of b_1 = {b_1_true}")

When this code is executed, however, the MAP estimates of b_0 and b_1 both go towards the modes of their respective priors:

Iteration 0: Elbo loss = 381755714.32246584
Iteration 100: Elbo loss = 404540129.8378781
Iteration 200: Elbo loss = 404521345.83787704
…
Iteration 4700: Elbo loss = 404521377.83787704
Iteration 4800: Elbo loss = 404521377.83787704
Iteration 4900: Elbo loss = 404521377.83787704

MAP estimate of b_0 = 50.0, MAP estimate of b_1 = 3.0
Mode of b_0 prior = 50.0, Mode of b_1 prior = 3.0
True value of b_0 = 1.0, True value of b_1 = 1.0

Based on everything I’ve written here, I have two questions:

  1. Is Pyro capable of performing parameter inference for the kind of ‘black-box’ models I’ve described here? If so, how would one go about this?
  2. Supposing the ‘black-box’ model was able to provide Pyro with access to gradient information (i.e. of the kind required to perform MCMC or SVI), how would I go about providing that information to Pyro?

Apologies for the long post and thanks in advance for everyone’s help.