How to get started on higher dimensional experimental design

sebbecht · December 21, 2020, 2:44pm

Hi there,
While I have plenty of experience with both Python and PyTorch, many concepts in Pyro are quite new to me. I have read through the adaptive experimental design examples a couple of times and searched quite a bit for related questions or more examples, without finding answers.

I am specifically looking for ways to design experiments with more than 2 variables, e.g. 3 variables, each with 3 or more levels. Would that be possible using Pyro workflows? The psychology example had 1 variable (length of sequence) and the election example had two (state to poll and number of people to poll). In application, I would carry out e.g. seeding experiments and, based on the results, design the next experiment so that it gives the most information to continue or leads to the best outcomes.

A nudge in the right direction would be greatly appreciated. Thank you.

martinjankowiak · December 21, 2020, 8:13pm

@sebbecht can you please give some more details about the problem you’re interested in? the algorithms implemented in contrib.oed are quite general and are thus in principle applicable in a wide variety of situations. the degree to which they are applicable in practice will depend on a number of problem specific factors. among the most important factors are the following:

what is the dimensionality of the latent variables in the model? are the latent variables discrete, continuous or both?
what is the dimensionality of the design space? is the design space discrete, continuous or both?

sebbecht · December 22, 2020, 6:21am

Hi @martinjankowiak, thank you for taking your time to dig into this, I appreciate it.

I work in bioprocess engineering and I am interested in improving the experimental design process which includes biological contents in the process and process settings. An example of how I would apply it is:
a discrete design space of 3 variables, each having 4 levels of magnitude, e.g.
Temperature: 25, 30, 35, 40
pH: 3, 4, 5, 6
Biological component 1: 5, 10, 15, 20
Biological component 2: 2, 4, 6, 8

There are many other designs we use of different dimensionalities but this would be an example and it would always be discrete. Some results coming from the processes are continuous, but to keep it simple I would do some feature extraction from curves (e.g. a timeseries profile of something) and extract single discrete variables.

As for the model, I don’t know, I am afraid. From the election example I imagine you are referring to the guide, which would be a 3-layer NN? I realised yesterday that in the election example a 51-element list is the input to the system. I guess I could essentially make a list of every combination from the design space and the results from their experiments, since a NN doesn’t care what the logic behind the numbers is? I.e. 3 variables with 4 levels each would be a list of length 64?
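
The count in that last question can be checked by enumerating the grid directly. A minimal sketch using only the standard library and the levels listed above (the dictionary keys and the helper `enumerate_designs` are made-up names for illustration). Note that the list above actually contains four variables, which gives 4^4 = 256 combinations; three variables with four levels each give the 64 mentioned.

```python
from itertools import product

# Levels copied from the post above; the key names are illustrative only.
levels = {
    "temperature": [25, 30, 35, 40],
    "pH": [3, 4, 5, 6],
    "component_1": [5, 10, 15, 20],
    "component_2": [2, 4, 6, 8],
}


def enumerate_designs(levels):
    """Return every combination of levels as a list of dicts (full factorial grid)."""
    names = list(levels)
    return [dict(zip(names, combo)) for combo in product(*levels.values())]


# All four variables: 4 ** 4 candidate designs.
designs = enumerate_designs(levels)
print(len(designs))

# Only the first three variables: 4 ** 3 candidate designs.
three_vars = {name: levels[name] for name in ["temperature", "pH", "component_1"]}
print(len(enumerate_designs(three_vars)))
```

Each entry in `designs` is one candidate experiment, so a batch of 15-30 runs would be a subset of this list.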

The desired outcome, as mentioned, would be to use historical data and a smaller-than-usual experimental design/seeding experiment to quickly home in on the combination in the space that gives the highest outcome, ultimately reducing the number of experimental runs.

Edit: We can’t run 1 experimental combination at a time, which is the workflow I often see in solutions; given how long the experiments take, that would not be viable. More often we do 15-30 at a time. But from the examples I’ve seen it looks like this wouldn’t be a problem.

Thanks again, I hope that answers your questions.

martinjankowiak · December 22, 2020, 3:28pm

a discrete design space is certainly doable, especially if it’s rather small.

in order to use contrib.oed you need to specify the following three components in the form of a generative model:

1. the design space (e.g. a set of temperatures)
2. the latent parameters of interest, i.e. quantities that aren’t directly measured in the experiment. (e.g. a binding affinity)
3. the observable outcome (e.g. molecular counts)

you’ve mentioned 1 but what about 2 and 3?
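
To make the three components concrete, here is a toy sketch in plain Python. It does not use the contrib.oed API, and every modelling choice in it is an assumption made up for illustration: the design is a temperature, the latent parameter is a hypothetical "optimal temperature" with a uniform prior over four values, and the outcome is a binary high/low growth reading with an invented likelihood `p_high_growth`. Because everything is discrete and tiny, the expected information gain (the mutual information between latent and outcome for a given design) can be computed exactly; contrib.oed provides estimators for the same kind of quantity when exact enumeration is not feasible.

```python
import math

# 1. Design space: candidate temperatures.
designs = [25, 30, 35, 40]

# 2. Latent parameter: a hypothetical unknown "optimal temperature",
#    with a uniform prior over four candidate values (assumption).
latent_values = [25, 30, 35, 40]
prior = {theta: 1.0 / len(latent_values) for theta in latent_values}


# 3. Observable outcome: y = 1 ("high growth") or y = 0 ("low growth").
def p_high_growth(design, theta, width=5.0):
    """Invented likelihood: high growth is more likely when design is near theta."""
    return 0.05 + 0.9 * math.exp(-0.5 * ((design - theta) / width) ** 2)


def entropy(probs):
    """Shannon entropy in nats of a discrete distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def expected_information_gain(design):
    """EIG(d) = H[p(y | d)] - E_prior(theta) H[p(y | theta, d)], computed exactly."""
    marginal = sum(prior[t] * p_high_growth(design, t) for t in latent_values)
    h_marginal = entropy([marginal, 1.0 - marginal])
    h_conditional = 0.0
    for t in latent_values:
        p = p_high_growth(design, t)
        h_conditional += prior[t] * entropy([p, 1.0 - p])
    return h_marginal - h_conditional


# Score every candidate design and pick the most informative one.
eig = {d: expected_information_gain(d) for d in designs}
best_design = max(eig, key=eig.get)
print(eig, best_design)
```

The design with the largest value is the one whose outcome is expected to reduce uncertainty about the latent parameter the most. After running it, the prior would be replaced by the posterior and the scores recomputed for the next round.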

sebbecht · December 22, 2020, 4:21pm

Okay thanks. The aim is most often to maximize the growth of microbes, which would be the observable outcomes, i.e. concentration of cells. As for the latent parameters of interest, I am not quite sure what they’d be in my case. I suspect that they are used in an expression to model the outcomes based on the design space? As I understand it, the priors are used to infer the parameters which allows the prediction of the outcomes?

Edit: I am making guesses, but a parameter could be the optimal temperature for the cells’ enzymes, or the cells’ ability to take up or metabolize biological components? I.e. biological interactions in the system on which the outcome depends?

martinjankowiak · December 22, 2020, 4:57pm

you might find it helpful to read this blog post