For example, in an HMM emitter I have two labels, classA and classB. Some labels in my data are missing, and I use the label MIS to mark them. The indices of these three labels are 0, 1, and 2, respectively.
When I build the emitter (from hidden state to observations of these labels), I only need a softmax that outputs a probability distribution over the 2 real labels, i.e. [classA, classB]. I don’t need a distribution over [classA, classB, MIS].
The problem arises when I use the sample function, e.g.
sample(..., dist.Categorical(<softmax distribution over classA and classB>).mask(<mask that excludes the missing (MIS) labels>), obs=observation)
The function does not seem to handle the extra class (MIS); it just reports an error, because my observations contain missing data, e.g. [0, 1, 2], and the value 2 is not a valid outcome for a distribution over two classes. As a result, I have to make the softmax output a distribution over 3 classes, even though I do not need the last one (MIS).
Is there a better way to deal with this situation? For now, maybe I could just assign an arbitrary valid value (0 or 1) to the missing entries, since they will be masked out by the corresponding mask anyway.
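To make the workaround concrete, here is a minimal pure-Python sketch of what I have in mind. It is not library code: it assumes that masking simply zeroes the log-probability contribution of the masked entries, and the names `MIS`, `PLACEHOLDER`, `softmax`, `masked_log_likelihood`, and the example logits are all made up for illustration.

```python
import math

MIS = 2          # index used in the data for a missing label
PLACEHOLDER = 0  # any valid class index (0 or 1); it is masked out below


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def masked_log_likelihood(logits_per_step, labels, placeholder=PLACEHOLDER):
    """Sum of log p(label) over observed steps; missing steps contribute 0."""
    total = 0.0
    for logits, label in zip(logits_per_step, labels):
        observed = label != MIS                            # mask: True where a label exists
        safe_label = label if observed else placeholder    # keep the index inside {0, 1}
        log_p = math.log(softmax(logits)[safe_label])      # 2-class emitter only
        total += log_p if observed else 0.0                # masked entries add nothing
    return total


# Hypothetical emitter outputs (2 logits per time step) and labels with one MIS.
logits_per_step = [[2.0, 0.0], [0.5, 1.5], [0.3, -0.3]]
labels = [0, 1, MIS]

ll_with_0 = masked_log_likelihood(logits_per_step, labels, placeholder=0)
ll_with_1 = masked_log_likelihood(logits_per_step, labels, placeholder=1)
print(ll_with_0 == ll_with_1)  # True: the placeholder value does not matter
```

Under that assumption, the emitter can stay a 2-class softmax: the MIS index is swapped for a valid placeholder before it reaches the distribution, and the mask keeps the placeholder from affecting the likelihood. In the real model the same two steps would be applied to the observation tensor before it is passed as `obs`.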