Supervised HMM with informative priors

Hi, I’m new to Pyro and Bayesian models. I’m trying to experiment with it, starting from model_0 in https://pyro.ai/examples/hmm.html.

I’m working with music data in a format similar to the one in the example. The main difference is that I want a “supervised” HMM, meaning that during training the hidden states are also known. To do that I changed the code, using “obs=…” instead of “infer=…” in
x = pyro.sample("x_{}_{}".format(i, t), dist.Categorical(probs_x[x]), infer={"enumerate": "parallel"})

Now, if the premise is correct, here is my question: I would like to use some very specific priors, i.e. to assign to each note produced from each hidden state a Beta prior with its own parameters.
How can I do that with Pyro? The tutorial uses the function expand() in
probs_y = pyro.sample("probs_y", dist.Beta(0.1, 0.9).expand([args.hidden_dim, data_dim]).to_event(2))
but in this case all the Betas have the same parameters.

Thank you

You just need to instantiate a Beta distribution of the right shape; something like

import torch
import pyro
import pyro.distributions as dist

alpha = 0.5 + torch.rand(args.hidden_dim, data_dim)  # one (alpha, beta) pair per (state, note) entry
beta = 0.5 + torch.rand(args.hidden_dim, data_dim)
probs_y = pyro.sample("probs_y", dist.Beta(alpha, beta).to_event(2))
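If you want fully specific priors rather than random ones, you can build the parameter tensors entry by entry, e.g. from musical knowledge. A minimal sketch; the sizes, the note index 60, and the override values are made-up examples:

```python
import torch

hidden_dim, data_dim = 4, 88  # hypothetical sizes (e.g. 88 piano keys)

# Start from a weak default prior for every (state, note) pair...
alpha = torch.full((hidden_dim, data_dim), 0.1)
beta = torch.full((hidden_dim, data_dim), 0.9)

# ...then override specific entries with informative values.
alpha[0, 60] = 5.0  # hypothetical: state 0 often emits note 60
beta[0, 60] = 1.0
```

These tensors can then be passed to dist.Beta(alpha, beta).to_event(2) exactly as above.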

Thank you very much.

Another question that is still in line with the title of the post: the comments in the example say:

we’ll set a rough weak prior of 10% of the notes being active at any one time.

And they use alpha = 0.1, beta = 0.9 as the parameters of the Beta distribution.

I’m curious to understand whether there is a strong motivation for this choice, instead of what I see more commonly used in examples: parameters > 1, e.g. alpha = 2 and beta = 20?

There is no super strong motivation for this. The mean of the Beta distribution is alpha / (alpha + beta) [here = 0.1], and the variance is alpha * beta / ((alpha + beta)^2 * (alpha + beta + 1)), which in this case evaluates to about 0.047: moderately small, but reasonable given the mean. I believe your choice would lead to an even smaller variance (and thus be more informative).
