 # Text generation Markov Model

Hi everyone,
I am trying to implement a basic Markov model for next-character generation over English letters, where the next character is predicted from the previous one (a pairwise model).
The data is an array of ordinal values (0-25), one per character, and I need a probabilistic way to generate the next character:
`P(c2|c1)`, where `c1` is the current character and `c2` is the following one.
I understand this can be done by simple counting and frequency estimation, but I am learning Pyro, so I am trying to understand how to set up this model in a way I can build on to create more complex models.
Question:

1. Is it necessary for me to convert the data to a tensor of shape (26, 26) holding counts to get things set up, or can the model be designed to learn from one row at a time?
2. Assuming the count matrix is set up, does the code below make sense?
```
num_characters = 26

def model(counts):
    next_ch_probs = pyro.sample(
        'next_ch_probs',
        dist.Dirichlet(torch.ones(num_characters, num_characters) / num_characters),
    )
    pyro.sample('counts', dist.Multinomial(26 * 26, next_ch_probs), obs=counts)
```
3. If you have a better way of framing the problem or an example to share, please do. I am trying to learn.
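For reference on step 1, the count matrix could be built from the raw ordinal sequence before any Pyro code is involved. This is just a sketch; `sequence` is hypothetical toy data standing in for your encoded text:

```python
num_characters = 26

def count_transitions(sequence):
    """Build a 26x26 matrix where counts[c1][c2] is the number of c1 -> c2 transitions."""
    counts = [[0] * num_characters for _ in range(num_characters)]
    for c1, c2 in zip(sequence, sequence[1:]):
        counts[c1][c2] += 1
    return counts

sequence = [7, 4, 11, 11, 14]  # "hello" encoded as ordinals
counts = count_transitions(sequence)
```

The nested list can then be handed to the model as `torch.tensor(counts, dtype=torch.float)`.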

Hmm, I think you’ll want to fit a plate full of multinomials:

```
num_characters = 26

def model(counts):
    next_ch_probs = pyro.sample(
        'next_ch_probs',
        dist.Dirichlet(torch.ones(num_characters, num_characters) / num_characters),
    )
    with pyro.plate("characters", num_characters):
        pyro.sample(
            'counts',
            dist.Multinomial(probs=next_ch_probs, validate_args=False),
            obs=counts,
        )
```

where the `validate_args=False` works around `Multinomial`'s lack of support for heterogeneous counts.