Dear all,
I am very much a beginner in probabilistic programming. My problem comes from genetics and I hope it might have a nice solution in pyro, as it can not, to my knowledge, be solved analytically (for larger data).
A formulation:
I have a sequence of n genetic sites, each site having the value one. This sequence now undergoes multiple changes. At every iteration, each sequence site can change by adding or subtracting a 1 or leaving the site unchanged. However, changes are not independent between sites, but correlate with changes at neighboring sites (which I model with a Markov Chain). Now, given an input sequence and an output sequence, I would like to know, how many iterations have happened.
Here is my attempt:
import torch as tc
import warnings; warnings.simplefilter("ignore", FutureWarning)
import pyro
import pyro.distributions as dist
from pyro.infer.mcmc import MCMC, NUTS
pyro.set_rng_seed(1)
I first build a function from which I can sample a markov chain. The output is the change of my sequence in each iteration:
First defining the transition probabilities
transition_prob = tc.tensor([[0.7,0.3,0.0],[0.025,0.95,0.025],[0.0,0.3,0.7]])
Defining the MC and sampling an example of length 20:
def mc(transition_prob, length):
category = tc.tensor(1) # start with one category
final_markov_chain = []
for t in range(length+1):
if t > 0:
category = pyro.sample("category_{}".format(t), dist.Categorical(transition_prob[category]))
final_markov_chain.append(category)
return ((tc.stack(final_markov_chain))-1).float()
mc(transition_prob, 20)
Out:
tensor([ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,
0., 0., 0., 0., -1., -1.])
So the last two sites would be changed with this mc output and at this iteration. Now i model a system that receives an input and conducts these changes ‘max_iter’ times. max_iter is sampled from a Poisson distribution
def model_system(inp):
poisson_prior = tc.tensor(4.0) #pyro.sample("poisson_prior", dist.HalfNormal(6.0,3.0))
max_iter = pyro.sample("poisson", dist.Poisson(poisson_prior))
iteration = 0
output_vector = inp.clone()
while iteration <= max_iter:
iteration +=1
change = mc(transition_prob, length = inp.shape[0])
#as an additional tweak, sites that have the value zero are stuck at zero
output_vector = tc.where(output_vector == 0, tc.tensor(0.0), output_vector + change)
return output_vector
model_system(tc.ones(100))
Out:
tensor([1., 0., 0., 0., 0., 0., 0., 0., 1., 1., 1., 1., 1., 1., 1., 1., 1., 3.,
3., 2., 2., 1., 2., 2., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 1., 2., 2., 2., 2., 2., 2.,
1., 0., 1., 2., 2., 2., 2., 2., 1., 2., 0., 0., 1., 1., 1., 1., 1., 1.,
1., 1., 1., 0., 0., 0., 1., 1., 0., 1., 1., 1., 1., 1., 1., 0., 2., 1.,
1., 0., 1., 1., 1., 1., 1., 1., 2., 1.])
This returns nicely what I expect as output.
My problem is the following:
Given The output vector, input vector and transition probabilities, how can I find the parameter max_iter (i.e. how often did the input sequence undergo change)? How would I encode the observed output vector in pyro, since the output of model_system is not sampled directly from a distribution?
Thanks a lot!