# MCMC for biology

Dear all,

I am very much a beginner in probabilistic programming. My problem comes from genetics and I hope it might have a nice solution in pyro, as it can not, to my knowledge, be solved analytically (for larger data).

A formulation:
I have a sequence of n genetic sites, each site having the value one. This sequence now undergoes multiple changes. At every iteration, each sequence site can change by adding or subtracting a 1 or leaving the site unchanged. However, changes are not independent between sites, but correlate with changes at neighboring sites (which I model with a Markov Chain). Now, given an input sequence and an output sequence, I would like to know, how many iterations have happened.

Here is my attempt:

``````import torch as tc
import warnings; warnings.simplefilter("ignore", FutureWarning)
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import MCMC, NUTS

pyro.set_rng_seed(1)
``````

I first build a function from which I can sample a markov chain. The output is the change of my sequence in each iteration:

First defining the transition probabilities

``````transition_prob = tc.tensor([[0.7,0.3,0.0],[0.025,0.95,0.025],[0.0,0.3,0.7]])
``````

Defining the MC and sampling an example of length 20:

``````def mc(transition_prob, length):
final_markov_chain = []
for t in range(length+1):
if t > 0:
category = pyro.sample("category_{}".format(t), dist.Categorical(transition_prob[category]))
final_markov_chain.append(category)

return ((tc.stack(final_markov_chain))-1).float()

mc(transition_prob, 20)
Out:
tensor([ 0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,  0.,
0.,  0.,  0.,  0., -1., -1.])

``````

So the last two sites would be changed with this mc output and at this iteration. Now i model a system that receives an input and conducts these changes ‘max_iter’ times. max_iter is sampled from a Poisson distribution

``````def model_system(inp):
poisson_prior = tc.tensor(4.0) #pyro.sample("poisson_prior", dist.HalfNormal(6.0,3.0))

max_iter = pyro.sample("poisson", dist.Poisson(poisson_prior))

iteration = 0
output_vector = inp.clone()
while iteration <= max_iter:
iteration +=1
change = mc(transition_prob, length = inp.shape[0])

#as an additional tweak, sites that have the value zero are stuck at zero
output_vector = tc.where(output_vector == 0, tc.tensor(0.0), output_vector + change)

return output_vector

model_system(tc.ones(100))

Out:
tensor([1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3.,
3., 2., 2., 1., 2., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2.,
1., 0., 1., 2., 2., 2., 2., 2., 1., 2., 0., 0., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 2., 1.,
1., 0., 1., 1., 1., 1., 1., 1., 2., 1.])
``````

This returns nicely what I expect as output.

My problem is the following:
Given The output vector, input vector and transition probabilities, how can I find the parameter max_iter (i.e. how often did the input sequence undergo change)? How would I encode the observed output vector in pyro, since the output of model_system is not sampled directly from a distribution?

Thanks a lot!