Incorporating uncertainties on observations (x_i's)

I'd like feedback on an idea, and I want to find out whether others have already thought about this problem. I am pretty new to Pyro and am still working on getting my first variational Bayesian logistic regression model to give reasonable results, so this post looks a bit ahead to something I would like to do in Pyro.

I have a gig-economy use case in which different feature observations have wildly different statistical uncertainties. My features are the fraction of a contractor's jobs in which a given condition occurred, for example showing up late. If the contractor has done 100 jobs, that feature is well measured. If they have done only one job, it is very poorly measured. Yet if that first job had a bad outcome, it is still valuable information and I would like to include it.
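To put rough numbers on "well measured" versus "poorly measured", here is a minimal sketch. It assumes a uniform Beta(1, 1) prior on each contractor's true rate and treats jobs as independent trials. `beta_posterior` is a hypothetical helper, not a Pyro function.

```python
import math


def beta_posterior(bad_jobs, total_jobs, prior_a=1.0, prior_b=1.0):
    """Posterior mean and standard deviation of a contractor's true rate.

    Assumes a Beta(prior_a, prior_b) prior on the rate and treats each job
    as an independent Bernoulli trial, so the posterior is
    Beta(prior_a + bad_jobs, prior_b + total_jobs - bad_jobs).
    """
    a = prior_a + bad_jobs
    b = prior_b + total_jobs - bad_jobs
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# One late arrival out of one job vs. ten late arrivals out of a hundred.
for bad, total in [(1, 1), (10, 100)]:
    mean, sd = beta_posterior(bad, total)
    print(f"{bad}/{total}: mean={mean:.3f}, sd={sd:.3f}")
```

Under these assumptions, 1 late arrival in 1 job gives a posterior standard deviation of about 0.236, while 10 in 100 gives about 0.031.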

My proposal is that not only the model parameters but also the observations should be treated as distributions. I believe this is needed both when training (inferring the model parameters) and when making predictions.

In the inference phase, I believe this is similar to weighting samples by their uncertainty, but it would be accomplished by drawing multiple samples from each observation.
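One way to read "multiple samples from each observation" is as a multiple-imputation-style data expansion. The sketch below assumes the same uniform Beta prior on each contractor's rate and uses hypothetical names (`expand_observations`, the `rows` records). It is plain Python rather than Pyro.

```python
import random


def expand_observations(rows, draws_per_row, seed=0):
    """Replace each (bad_jobs, total_jobs, label) row with several sampled rows.

    Each copy gets a feature value drawn from the Beta(1 + bad, 1 + total - bad)
    posterior of that contractor's rate (uniform prior assumed). Poorly
    measured contractors therefore contribute a wide spread of feature values,
    while well measured ones contribute nearly identical copies.
    """
    rng = random.Random(seed)
    expanded = []
    for bad, total, label in rows:
        for _ in range(draws_per_row):
            x = rng.betavariate(1 + bad, 1 + total - bad)
            expanded.append((x, label))
    return expanded


# Hypothetical (bad_jobs, total_jobs, label) records.
rows = [(1, 1, 1), (10, 100, 0)]
training_set = expand_observations(rows, draws_per_row=50)
```

Because each contractor now appears draws_per_row times, each expanded row should probably count with weight 1/draws_per_row in the likelihood. Otherwise the fitted model will be overconfident. An alternative is to make the true rate a latent variable inside the model itself.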

When making predictions, one would sample from the feature distributions (x_i's) as well as from the parameter distributions. The resulting mean and standard deviation would then include both the uncertainty in the features of that specific observation and the uncertainty in the model parameters.
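Here is a Monte Carlo sketch of that prediction step. It assumes the posterior over the weight and bias of a one-feature logistic regression has been summarized by independent Normals, as a mean-field guide would give. The numbers in `params` are made up for illustration, and `predictive_summary` is a hypothetical helper.

```python
import math
import random
import statistics


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predictive_summary(bad, total, w_mean, w_sd, b_mean, b_sd,
                       num_samples=20000, seed=0):
    """Monte Carlo mean and sd of P(bad outcome) for one contractor.

    Each sample draws the contractor's feature from its Beta posterior
    (uniform prior assumed) AND the weight and bias from independent
    Normal approximations of their posteriors. The spread therefore
    reflects both feature uncertainty and parameter uncertainty.
    """
    rng = random.Random(seed)
    probs = []
    for _ in range(num_samples):
        x = rng.betavariate(1 + bad, 1 + total - bad)
        w = rng.gauss(w_mean, w_sd)
        b = rng.gauss(b_mean, b_sd)
        probs.append(sigmoid(w * x + b))
    return statistics.mean(probs), statistics.stdev(probs)


# Hypothetical posterior summaries for a one-feature model.
params = dict(w_mean=2.0, w_sd=0.1, b_mean=-1.0, b_sd=0.1)
new_contractor = predictive_summary(1, 1, **params)
veteran = predictive_summary(100, 100, **params)
```

With the same parameters, the contractor with a single job gets a noticeably larger predictive standard deviation than the one with 100 jobs, which is the effect described above.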

What are your thoughts? Can anyone point me to previous work? I have not found anything on this topic on the web.

Consider modeling your observations as Binomials, rather than fractions. Binomials should naturally account for increased certainty as total_count increases.
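A quick way to see why the Binomial helps: for the same observed fraction, the Binomial log-likelihood penalizes a wrong rate more strongly as total_count grows. In Pyro this likelihood would come from `pyro.distributions.Binomial(total_count=..., probs=...)`. The sketch below computes it with the standard library only, and the helper names are hypothetical.

```python
import math


def binomial_log_pmf(k, n, p):
    """log P(k bad outcomes in n jobs | rate p) under a Binomial(n, p)."""
    return math.log(math.comb(n, k)) + k * math.log(p) + (n - k) * math.log1p(-p)


def log_likelihood_drop(k, n, p_ref, p_alt):
    """How much worse (in nats) p_alt explains the data than p_ref."""
    return binomial_log_pmf(k, n, p_ref) - binomial_log_pmf(k, n, p_alt)


# Same observed fraction (20% late), different numbers of jobs.
few_jobs = log_likelihood_drop(1, 5, 0.2, 0.4)      # about 0.46 nats
many_jobs = log_likelihood_drop(20, 100, 0.2, 0.4)  # about 9.15 nats
```

The drop for 100 jobs is 20 times the drop for 5 jobs, because the log-likelihood scales with the counts.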

Thanks, fritzo. I would then use the number of observations (i.e., the number of jobs the contractor has done) as the number of Bernoulli trials, and the fraction of bad outcomes the contractor had as a proxy for the probability of a bad outcome.

Hi thedudeabides,
I am also very interested in this kind of problem and am trying to code a similar model in Pyro, with random variables as inputs. In my case, they are normal variables.
Did you manage to do it? Do you have any relevant observations, or obstacles you encountered, that could help?
Thank you!

Hi pradogusto,

I got busy with other stuff and set this aside, but I plan to revisit it. Where I left off, I was playing around with using a binomial for the input features, as Fritz suggested.

I’m sorry I don’t have more insight to offer yet, but I will let you know what I find when I revisit this. Let me know what you find in the meantime.