Help with multilinear regression experimental design script

Hi there,

I have tried to get started with bayesian optimal experimental design. I have used the working memory example as initially, my setup can be made as a multilinear regression model. I am not super experienced with statistical models / probabilistic programming, so I am hoping one of you can look over the code and tell me if I have made some dumb mistakes. Lastly as you’ll see, i have a parameter constraint problem when I try to estimate marginal_eig.

So far, I have been able to condition a model and it looks like productive results, so at least its not too terrible. I have made a simple 2d synthetic toy dataset as follows:

The practical idea for if/when this works, is to create a statistical experimental design in the design space, then compute an initial multilinear regression model the classical way, which serves as initial coefficients for the probabilistic model. Then do adaptive experiments using EIG from there.

import torch
import pyro
import pyro.distributions as dist
import matplotlib
import matplotlib.pyplot as plt
from torch.distributions.constraints import positive
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam
from pyro.contrib.oed.eig import marginal_eig
import torch.nn.functional as F
import numpy as np
from sklearn.metrics import mean_squared_error
from skimage.filters import gaussian
from sklearn.linear_model import LinearRegression

Initial design of the corners and centerpoint of the above dataset and an MLR model on that yielded some coefficients whichi is my starting point. The model is as follows:

c1_mean = torch.tensor(0.0026)
c1_sd = torch.tensor(c1_mean0.1)
c2_mean = torch.tensor(0.0189)
c2_sd = torch.tensor(c2_mean

def model(l):
with pyro.plate_stack(“plate”, l.shape[:-1]):
coeff1 = pyro.sample(“coeff1”, dist.Normal(c1_mean, c1_sd))
coeff2 = pyro.sample(“coeff2”, dist.Normal(c2_mean, c2_sd))

    coeff1 = coeff1.unsqueeze(-1)
    coeff2 = coeff2.unsqueeze(-1)
    #levels of design
    level1,level2 = get_levels_eig(l)

    #MLR. I added a small bias to force non-zero values. seemed to solve one error
    pred = level1*coeff1+level2*coeff2+0.00001

    y = pyro.sample("y", dist.Normal(pred,pred*0.1).to_event(1))
    return y

The idea is to take in a design (value between 0 and 81). i call them well numbers in the code below because the actual experiments would run in a small plastic plate with wells in them (containing liquid). From this design, retrieve the actual levels of each variable (function for that is in the next code block below), calculate y by multiplying the levels of the variables with the coefficients. Without much knowledge, I suppose the coefficients and y would be normally distributed. As you can see I have initially set the standard deviations as 10% of the mean.

is there a better way to initialize the standard deviation?
Does the model look sane?

def get_levels(well_num):
factor1_levels = [0,20,40,60,80,100,120,140,160]
factor2_levels = [4,6,8,10,12,14,16,18,20]

levels = []
for f1 in factor1_levels:
    for f2 in factor2_levels:

level1 = torch.zeros_like(well_num)
level2 = torch.zeros_like(well_num)

if len(well_num.shape)==1:
    for w,well in enumerate(well_num):
        level1[w] = levels[int(well)][0]
        level2[w] = levels[int(well)][1]
    for s in range(well_num.shape[0]):
        for w in range(well_num.shape[1]):
            well_n = well_num[s,w,0]
return level1,level2

A guide for SVI

def guide(l):
# The guide is initialised at the prior
c1_pmean = pyro.param(“c1_pmean”, c1_mean.clone(), constraint=positive)
c1_psd = pyro.param(“c1_psd”, c1_sd.clone(), constraint=positive)
c2_pmean = pyro.param(“c2_pmean”, c2_mean.clone(), constraint=positive)
c2_psd = pyro.param(“c2_psd”, c2_sd.clone(), constraint=positive)

pyro.sample("coeff1", dist.Normal(c1_pmean, c1_psd))
pyro.sample("coeff2", dist.Normal(c2_pmean, c2_psd)) 

Here again I have initialized the standard deviation for the parameters as 10% of the mean and everything as normally distributed.

Conditioning a model with the initial design (corners and centerpoint) yields a much lower MSE of the MLR model on the full synthetic dataset. so it seems productive.

design = torch.tensor([0, 8, 65, 37, 80])
measured = torch.from_numpy(data_vector[design])

conditioned_model = pyro.condition(model, {“y”: measured})
svi = SVI(conditioned_model,
Adam({“lr”: .001}),


num_iters = 2000
for i in range(num_iters):
elbo = svi.step(design)
if i % 500 == 0:
print(“Neg ELBO:”, elbo)

Defining a marginal guide I was quite unsure how to do that. But taking the example from working memory, I stuck to the definition of the parameter as torch.zeros based on length of design and changed the “y” dist to be the same as within the model. Does this make sense?

def marginal_guide(design, observation_labels, target_labels):
# This shape allows us to learn a different parameter for each candidate design l
pred = pyro.param(“pred”, torch.ones(design.shape[-2:]), constraint=positive)
pyro.sample(“y”, dist.Normal(pred,pred*0.1).to_event(1))

I have estimated the marginal_eig with the code below:

#set coefficient means to initial MLR coefficients and SD to 10%
c1_mean = torch.tensor(0.0026)
c1_sd = torch.tensor(c1_mean0.1)
c2_mean = torch.tensor(0.0189)
c2_sd = torch.tensor(c2_mean

#test all possible options
candidate_designs = torch.arange(0, 81, dtype=torch.float).unsqueeze(-1)

num_steps, start_lr, end_lr = 1000, 0.1, 0.001
optimizer = pyro.optim.ExponentialLR({‘optimizer’: torch.optim.Adam,
‘optim_args’: {‘lr’: start_lr},
‘gamma’: (end_lr / start_lr) ** (1 / num_steps)})

eig = marginal_eig(model,
candidate_designs, # design, or in this case, tensor of possible designs
“y”, # site label of observations, could be a list
[“coeff1”,“coeff2”], # site label of ‘targets’ (latent variables), could also be list
num_samples=100, # number of samples to draw per step in the expectation
num_steps=num_steps, # number of gradient steps
guide=marginal_guide, # guide q(y)
optim=optimizer, # optimizer with learning rate decay
final_num_samples=10000 # at the last step, we draw more samples
# for a more accurate EIG estimate

The EIGs are as follows:

Attached them here in case it indicates a suboptimal part somewhere in my code… I feel like its promising that most of the high EIGs are distributed out in the space, but the initial 9 high values seem abnormal?

before doing anything as fancy as experimental design i suggest you make sure your model first makes sense.

this can’t make sense as written because nothing guarantees that pred is positive and you’re using it as the scale parameter of a Normal distribution:

Thank you for the reply Martin.

I was certainly also unsure about the marginal guide. Looking back and reading further I understand its not correct, but I am still unsure what is the correct solution.

In the case of working memory, the “y” parameter of the marginal estimator is directly computed from the value of the design parameter, so i can see how it can directly be taken as a scale of the “y” distribution. In my case the design is really two numbers (levels) from which y is computed with two latent variables, so its less obvious to me how it should be formulated. I also havent been able to find many examples of marginal guides for inspiration/perspective.

For now i have followed my only intution, which is to formulate it like the model, in that the design parameter is used to fetch levels and y is computed using the coefficients, which form the final parameter of the “y” distribution, as below.

def marginal_guide(design, observation_labels, target_labels):

coeff1 = pyro.sample("coeff1", dist.Normal(c1_mean, c1_sd))
coeff2 = pyro.sample("coeff2", dist.Normal(c2_mean, c2_sd))

coeff1 = coeff1.unsqueeze(-1)
coeff2 = coeff2.unsqueeze(-1)

#levels of designs
level1,level2 = get_levels_eig(design)

#MLR. I added a small bias to force non-zero values. seemed to solve one error
pred = level1*coeff1+level2*coeff2+0.00001

pred = pyro.param("pred", pred,
pyro.sample("y", dist.Normal(pred,pred*0.1).to_event(1))

of course i am just making limitedly educated guesses here, has it has been hard to gauge the right solution from the paper and examples. if this is complete trash, could you point me in the right direction? The above runs and it produces EIGs focused on the area of maximum of “y” in the dataset after conditioning the model on the initial design. that does seem productive, but it still may well be wrong.


i see this in your model code:
y = pyro.sample("y", dist.Normal(pred,pred*0.1).to_event(1))

Of course… I was getting the non-negativity issue occasionally in the marginal guide, so thats why I thought it referred to it, I was just lucky with the model. Thanks for the nudge I will put more energy into the regression model before going further.

Thank you for the help Martin, after some work on understanding bayesian regression basics and Pyro fundamentals I got a working solution.