I’m working on a problem involving two large datasets that map strings to numbers. In other words, it’s a supervised learning task where I want to learn a function from strings to numbers, and I have a few hundred thousand examples to learn from.
In dataset 1, the label is a bit (0/1), as in typical binary classification problems. In dataset 2, it’s a real number between 0 and 1. The bit in dataset 1 actually reflects experimental uncertainty: the measurement technique couldn’t recover the exact real value, which for bit 1 is distributed in the interval 0.9–1 (and can be modeled), and for bit 0 follows a wider distribution with a much smaller mean.
I already have a deep network that works well for dataset 2.
I plan to extend this network into a deep probabilistic one with two outputs, mu and sigma, the parameters of the observation, now modeled as a normal distribution. For training data from dataset 2, I would supply a small sigma. An immediate advantage is that I get uncertainty estimates for my predictions.
This would also let me incorporate dataset 1, where I would map a 1 to a normal whose mean is close to 1 but with a much larger sigma than for dataset 2.
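Concretely, the loss I have in mind looks something like this (a NumPy sketch of my own; I’m assuming the label noise is itself Gaussian and independent of the model’s predictive noise, so the two variances simply add when the observation normal is convolved with the predictive normal):

```python
import numpy as np

def gaussian_nll(mu_pred, sigma_pred, y_obs, sigma_obs):
    """Negative log-likelihood of a noisy observation y_obs under the
    predicted normal N(mu_pred, sigma_pred^2).

    sigma_obs is the per-example label noise I would supply: small for
    dataset 2 (nearly exact reals), larger for dataset 1 (bits mapped
    to wide normals). Assuming the label noise is Gaussian and
    independent of the predictive noise, the variances add.
    """
    var = sigma_pred**2 + sigma_obs**2
    return 0.5 * (np.log(2 * np.pi * var) + (y_obs - mu_pred) ** 2 / var)

# Dataset 2 example: label treated as (nearly) exact -> tiny sigma_obs.
loss_exact = gaussian_nll(mu_pred=0.8, sigma_pred=0.1, y_obs=0.82, sigma_obs=0.01)

# Dataset 1 example: bit 1 mapped to a normal with mean near 1, wide sigma_obs.
loss_noisy = gaussian_nll(mu_pred=0.8, sigma_pred=0.1, y_obs=0.95, sigma_obs=0.05)
```

The network would output mu_pred and sigma_pred per string; sigma_obs is fixed per training example from which dataset it came from.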
Is this approach sound? I couldn’t find much literature on handling uncertainty in the observations themselves.