How to handle unknown/missing/extra labels in HMM Emitter

For example, in an HMM emitter I have two labels, classA and classB. Some labels in my data are missing, and I use the label MIS to mark them. The indices of these three labels are 0, 1, and 2, respectively.

When I build the emitter (from hidden state to observations of these labels), I only need a softmax that outputs a probability distribution over the two labels, i.e. [classA, classB]. I don’t need a distribution over [classA, classB, MIS].

The problem arises when I use the sample function, e.g.

sample(dist.Categorical(probs from the softmax over classA and classB).mask(mask that hides the missing (MIS) labels), obs=observation)

The function does not seem to handle the extra class (MIS) and simply raises an error: my observations contain missing entries, e.g. [0, 1, 2], and the value 2 is not a valid outcome for a two-class distribution. As a workaround I have to make the softmax output a distribution over three classes, even though I do not need the last one (MIS).

Is there a better way to deal with this situation? Maybe for now I could just assign an arbitrary valid value (0 or 1) to the missing entries, since they will be masked out by the corresponding mask anyway.
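
To check that the placeholder idea is safe, here is a minimal sketch in plain PyTorch. It assumes the mask simply zeroes the log-probability at masked positions; the helper name `masked_log_prob` and the example probabilities are made up for illustration and are not part of any library.

```python
import torch
from torch.distributions import Categorical

MIS = 2  # index used in the data for a missing label (as in the question)

def masked_log_prob(probs, labels, placeholder=0):
    """Per-step log-likelihood; positions labelled MIS contribute exactly 0."""
    observed = labels != MIS  # True where a real label exists
    # Swap MIS for any valid class index so the two-class Categorical accepts it.
    safe_labels = labels.masked_fill(~observed, placeholder)
    log_p = Categorical(probs=probs).log_prob(safe_labels)
    # Zero out the placeholder terms (assumed to be what the mask does).
    return torch.where(observed, log_p, torch.zeros_like(log_p))

# Hypothetical emission probabilities over [classA, classB] for three time steps.
probs = torch.tensor([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])
labels = torch.tensor([0, 1, MIS])

lp0 = masked_log_prob(probs, labels, placeholder=0)
lp1 = masked_log_prob(probs, labels, placeholder=1)
```

Both placeholders give the same result, so the choice of 0 or 1 does not affect the likelihood. If the model is written in Pyro, the same `observed` mask and `safe_labels` tensor could presumably be passed as `pyro.sample("obs", dist.Categorical(probs).mask(observed), obs=safe_labels)`, where the site name "obs" is only an example.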

Many thanks for the nice reply to the similar question from fritzo in

Thanks for linking to the GitHub issue!