Marginal probabilities from HMM Viterbi

I am currently using a viterbi_decoder function to infer the discrete states of my HMM. My code is similar to the function given in the infer_discrete documentation.

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import infer_discrete

@infer_discrete(first_available_dim=-1, temperature=0)
def viterbi_decoder(data, hidden_dim=10):
    transition = 0.3 / hidden_dim + 0.7 * torch.eye(hidden_dim)
    means = torch.arange(float(hidden_dim))
    states = [0]
    for t in pyro.markov(range(len(data))):
        states.append(pyro.sample("states_{}".format(t),
                                  dist.Categorical(transition[states[-1]])))
        pyro.sample("obs_{}".format(t),
                    dist.Normal(means[states[-1]], 1.),
                    obs=data[t])
    return states  # returns maximum likelihood states

Is there any way to change viterbi_decoder() so that I can extract the posterior marginal probabilities for each t in the data? My current ideas are to use the Marginals class, or to edit the _sample_posterior() function so that it returns log_probs after performing forward-backward or Viterbi-like MAP; however, writing the Viterbi/forward-backward algorithm from scratch myself seems more straightforward than either of these two hacks. Any advice for keeping this all in Pyro?


Hi @aweiner, I would recommend either Monte Carlo estimating the marginals by drawing samples, or using TraceEnum_ELBO.compute_marginals().
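For reference, the quantity compute_marginals() returns for a chain-structured model is exactly what the classic forward-backward recursion computes. A minimal sketch in plain NumPy (this is an illustration of the algorithm, not Pyro's implementation; the function name and argument layout are my own):

```python
import numpy as np

def forward_backward(pi, A, B):
    """Posterior marginals p(z_t | x_{1:T}) for a discrete HMM.

    pi : (K,)   initial state distribution
    A  : (K, K) transition matrix, A[i, j] = p(z_t = j | z_{t-1} = i)
    B  : (T, K) observation likelihoods, B[t, k] = p(x_t | z_t = k)
    """
    T, K = B.shape
    alpha = np.zeros((T, K))  # forward messages (normalized for stability)
    beta = np.zeros((T, K))   # backward messages (normalized for stability)

    alpha[0] = pi * B[0]
    alpha[0] /= alpha[0].sum()
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ A) * B[t]
        alpha[t] /= alpha[t].sum()

    beta[-1] = 1.0
    for t in range(T - 2, -1, -1):
        beta[t] = A @ (B[t + 1] * beta[t + 1])
        beta[t] /= beta[t].sum()

    # Per-step normalization only rescales each row by a constant,
    # so renormalizing alpha * beta recovers the exact marginals.
    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)
```

Each row of the returned array is the marginal distribution over hidden states at that time step, which you can compare directly against the output of compute_marginals().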

Thanks @fritzo! I was able to use compute_marginals() with the following one-liner:

marginals = elbo.compute_marginals(model, guide, sequences, lengths, args)

When calculating the states (using the argmax of the marginals), I noticed there was ~97% concordance with the states assigned by my viterbi_decoder() function above. I know these are different inference methods, but any chance you can explain why this discrepancy exists? Similarly, which set of assigned states would be most accurate for a hidden Markov model? I've always assumed Viterbi would be best, but I can't see what would be wrong with the compute_marginals() approach if my model+guide form an HMM.

Hi @aweiner, we should expect a difference between .compute_marginals() and Viterbi (via @infer_discrete(temperature=0)) because the former generates marginal distributions whereas the latter produces hard maximum a posteriori values of the discrete latent variables.

which set of assigned states would be most accurate for a hidden Markov model?

.compute_marginals() will generate more accurate distributions of individual latent variables. On the other hand, if you care about the joint distribution over latent variables, Viterbi or other decoders perform better. You might also try infer_discrete(temperature=1), which generates posterior samples.
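To make the distinction concrete, here is a tiny worked example with hypothetical numbers: a 2-state, 2-step chain with uniform emissions, chosen so that the joint MAP (Viterbi) path and the per-step argmax of the marginals disagree:

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-step chain with uniform emissions, so the
# joint over paths is p(z1, z2) = pi[z1] * A[z1, z2].
pi = np.array([0.4, 0.6])       # initial state distribution
A = np.array([[0.0, 1.0],       # state 0 always transitions to state 1
              [0.5, 0.5]])

# Enumerate the joint probability of all 4 paths.
joint = {(i, j): pi[i] * A[i, j]
         for i, j in itertools.product(range(2), repeat=2)}

# Viterbi: the single most probable path.
viterbi_path = max(joint, key=joint.get)          # (0, 1) with prob 0.4

# Marginals: sum the joint over the other time step, then argmax per step.
marg1 = np.array([sum(p for (i, _), p in joint.items() if i == k)
                  for k in range(2)])             # [0.4, 0.6]
marg2 = np.array([sum(p for (_, j), p in joint.items() if j == k)
                  for k in range(2)])             # [0.3, 0.7]
marginal_argmax = (int(marg1.argmax()), int(marg2.argmax()))  # (1, 1)

print(viterbi_path, marginal_argmax)  # (0, 1) (1, 1)
```

Here the argmax-of-marginals sequence (1, 1) has joint probability 0.3, less than the Viterbi path's 0.4, even though each of its states is individually most probable. This is the same effect behind the ~97% concordance: each method is optimal for a different criterion (per-step accuracy vs. whole-path probability).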