Hi there,

I have tried to get started with bayesian optimal experimental design. I have used the working memory example as initially, my setup can be made as a multilinear regression model. I am not super experienced with statistical models / probabilistic programming, so I am hoping one of you can look over the code and tell me if I have made some dumb mistakes. Lastly as you’ll see, i have a parameter constraint problem when I try to estimate marginal_eig.

So far, I have been able to condition a model and it looks like productive results, so at least its not too terrible. I have made a simple 2d synthetic toy dataset as follows:

The practical idea for if/when this works, is to create a statistical experimental design in the design space, then compute an initial multilinear regression model the classical way, which serves as initial coefficients for the probabilistic model. Then do adaptive experiments using EIG from there.

import torch

import pyro

import pyro.distributions as dist

import matplotlib

import matplotlib.pyplot as plt

from torch.distributions.constraints import positive

from pyro.infer import SVI, Trace_ELBO

from pyro.optim import Adam

from pyro.contrib.oed.eig import marginal_eig

import torch.nn.functional as F

import numpy as np

from sklearn.metrics import mean_squared_error

from skimage.filters import gaussian

from sklearn.linear_model import LinearRegression

Initial design of the corners and centerpoint of the above dataset and an MLR model on that yielded some coefficients whichi is my starting point. The model is as follows:

c1_mean = torch.tensor(0.0026)

c1_sd = torch.tensor(c1_mean0.1)0.1)

c2_mean = torch.tensor(0.0189)

c2_sd = torch.tensor(c2_meandef model(l):

with pyro.plate_stack(“plate”, l.shape[:-1]):

coeff1 = pyro.sample(“coeff1”, dist.Normal(c1_mean, c1_sd))

coeff2 = pyro.sample(“coeff2”, dist.Normal(c2_mean, c2_sd))`coeff1 = coeff1.unsqueeze(-1) coeff2 = coeff2.unsqueeze(-1) #levels of design level1,level2 = get_levels_eig(l) #MLR. I added a small bias to force non-zero values. seemed to solve one error pred = level1*coeff1+level2*coeff2+0.00001 y = pyro.sample("y", dist.Normal(pred,pred*0.1).to_event(1)) return y`

The idea is to take in a design (value between 0 and 81). i call them well numbers in the code below because the actual experiments would run in a small plastic plate with wells in them (containing liquid). From this design, retrieve the actual levels of each variable (function for that is in the next code block below), calculate y by multiplying the levels of the variables with the coefficients. Without much knowledge, I suppose the coefficients and y would be normally distributed. As you can see I have initially set the standard deviations as 10% of the mean.

**is there a better way to initialize the standard deviation?
Does the model look sane?**

def get_levels(well_num):

factor1_levels = [0,20,40,60,80,100,120,140,160]

factor2_levels = [4,6,8,10,12,14,16,18,20]`levels = [] for f1 in factor1_levels: for f2 in factor2_levels: levels.append([f1,f2]) level1 = torch.zeros_like(well_num) level2 = torch.zeros_like(well_num) if len(well_num.shape)==1: for w,well in enumerate(well_num): level1[w] = levels[int(well)][0] level2[w] = levels[int(well)][1] else: for s in range(well_num.shape[0]): for w in range(well_num.shape[1]): well_n = well_num[s,w,0] level1[s,w,0]=levels[int(well_n.item())][0] level2[s,w,0]=levels[int(well_n.item())][1] return level1,level2`

A guide for SVI

def guide(l):

# The guide is initialised at the prior

c1_pmean = pyro.param(“c1_pmean”, c1_mean.clone(), constraint=positive)

c1_psd = pyro.param(“c1_psd”, c1_sd.clone(), constraint=positive)

c2_pmean = pyro.param(“c2_pmean”, c2_mean.clone(), constraint=positive)

c2_psd = pyro.param(“c2_psd”, c2_sd.clone(), constraint=positive)`pyro.sample("coeff1", dist.Normal(c1_pmean, c1_psd)) pyro.sample("coeff2", dist.Normal(c2_pmean, c2_psd))`

Here again I have initialized the standard deviation for the parameters as 10% of the mean and everything as normally distributed.

Conditioning a model with the initial design (corners and centerpoint) yields a much lower MSE of the MLR model on the full synthetic dataset. so it seems productive.

design = torch.tensor([0, 8, 65, 37, 80])

measured = torch.from_numpy(data_vector[design])conditioned_model = pyro.condition(model, {“y”: measured})

svi = SVI(conditioned_model,

guide,

Adam({“lr”: .001}),

loss=Trace_ELBO(),

num_samples=100)pyro.clear_param_store()

num_iters = 2000

for i in range(num_iters):

elbo = svi.step(design)

if i % 500 == 0:

print(“Neg ELBO:”, elbo)

Defining a marginal guide I was quite unsure how to do that. But taking the example from working memory, I stuck to the definition of the parameter as torch.zeros based on length of design and changed the “y” dist to be the same as within the model. Does this make sense?

def marginal_guide(design, observation_labels, target_labels):

# This shape allows us to learn a different parameter for each candidate design l

pred = pyro.param(“pred”, torch.ones(design.shape[-2:]), constraint=positive)

pyro.sample(“y”, dist.Normal(pred,pred*0.1).to_event(1))

I have estimated the marginal_eig with the code below:

#set coefficient means to initial MLR coefficients and SD to 10%

c1_mean = torch.tensor(0.0026)

c1_sd = torch.tensor(c1_mean0.1)0.1)

c2_mean = torch.tensor(0.0189)

c2_sd = torch.tensor(c2_mean#test all possible options

candidate_designs = torch.arange(0, 81, dtype=torch.float).unsqueeze(-1)pyro.clear_param_store()

num_steps, start_lr, end_lr = 1000, 0.1, 0.001

optimizer = pyro.optim.ExponentialLR({‘optimizer’: torch.optim.Adam,

‘optim_args’: {‘lr’: start_lr},

‘gamma’: (end_lr / start_lr) ** (1 / num_steps)})eig = marginal_eig(model,

candidate_designs, # design, or in this case, tensor of possible designs

“y”, # site label of observations, could be a list

[“coeff1”,“coeff2”], # site label of ‘targets’ (latent variables), could also be list

num_samples=100, # number of samples to draw per step in the expectation

num_steps=num_steps, # number of gradient steps

guide=marginal_guide, # guide q(y)

optim=optimizer, # optimizer with learning rate decay

final_num_samples=10000 # at the last step, we draw more samples

# for a more accurate EIG estimate

)

The EIGs are as follows:

Attached them here in case it indicates a suboptimal part somewhere in my code… I feel like its promising that most of the high EIGs are distributed out in the space, but the initial 9 high values seem abnormal?