I want to get feedback on an idea I have and see if others have already thought about this and have ideas. I am pretty new to Pyro and working on getting my first variational Bayesian logistic regression model to give reasonable results. This post is looking ahead a bit to something I would like to do in Pyro.
I have a gig economy use case in which different feature observations have wildly different statistical uncertainties. My features are the fraction of a contractor's jobs in which a given condition occurred, such as showing up late. If the contractor has done 100 jobs, that feature is well measured. If they have done only one job, it is very poorly measured, yet if they had a bad outcome on that first job, it is still valuable information and I would like to include it.
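To make the difference in uncertainty concrete, here is a minimal sketch. It assumes a uniform Beta(1, 1) prior on each contractor's late rate, so the posterior after observing the jobs is also a Beta distribution. The prior, the function name `beta_posterior`, and the job counts are illustrative assumptions, not part of the use case above.

```python
import math


def beta_posterior(n_late, n_jobs, prior_a=1.0, prior_b=1.0):
    """Return (mean, std) of the Beta posterior over a contractor's late rate.

    Assumes a Beta(prior_a, prior_b) prior and a binomial likelihood
    (an illustrative modeling choice).
    """
    a = prior_a + n_late
    b = prior_b + (n_jobs - n_late)
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, math.sqrt(var)


# Hypothetical contractors: one with 100 jobs, one with a single (late) job.
veteran_mean, veteran_std = beta_posterior(n_late=10, n_jobs=100)
rookie_mean, rookie_std = beta_posterior(n_late=1, n_jobs=1)

print(f"veteran: {veteran_mean:.3f} +/- {veteran_std:.3f}")
print(f"rookie:  {rookie_mean:.3f} +/- {rookie_std:.3f}")
```

Under these assumptions the single-job contractor's rate is several times more uncertain than the 100-job contractor's, which is the gap a point-valued feature would hide.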
I think I need to treat not only my model parameters but also my observations as distributions. I believe this is needed both when training (inferring the model parameters) and when making predictions.
In the inference phase, I believe this is similar to weighting my samples by their uncertainty, but it would be accomplished by drawing multiple samples from each observation's feature distribution.
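One way to picture "multiple samples from each observation" is the sketch below. It assumes each feature is a late rate with a Beta posterior under a Beta(1, 1) prior; that choice, the helper name `augment`, and the toy data are assumptions for illustration only. Each original row is expanded into several rows with sampled feature values and the same label, which could then be fed to the regression.

```python
import random


def augment(observations, n_draws=20, seed=0):
    """Expand each (n_late, n_jobs, label) row into n_draws (feature, label) rows.

    The feature value in each new row is one draw from the Beta posterior
    over the contractor's late rate (Beta(1, 1) prior: an assumption).
    """
    rng = random.Random(seed)
    rows = []
    for n_late, n_jobs, label in observations:
        a = 1.0 + n_late
        b = 1.0 + (n_jobs - n_late)
        for _ in range(n_draws):
            rows.append((rng.betavariate(a, b), label))
    return rows


# Hypothetical data: (late jobs, total jobs, outcome label).
observations = [(10, 100, 0), (1, 1, 1)]
augmented = augment(observations)

print(len(augmented))   # 2 observations x 20 draws each
print(augmented[:3])    # tightly clustered draws for the 100-job contractor
```

The draws for the well-measured contractor stay close together, while the draws for the single-job contractor spread widely, so the poorly measured row pulls on the fit less sharply.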
When making predictions, one would want to sample from the feature distributions (the x_i’s) as well as the parameter distributions, with the goal of getting a predictive mean and standard deviation that include the uncertainty from the features of the specific observation as well as from the model parameters.
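The prediction step described above can be sketched as plain Monte Carlo. This is a minimal sketch under stated assumptions: a single feature with a Beta posterior (Beta(1, 1) prior), and independent Normal distributions over the weight and bias standing in for a fitted variational posterior. The function name `predict_with_uncertainty` and all numeric values are hypothetical.

```python
import math
import random
import statistics


def predict_with_uncertainty(n_late, n_jobs, w_loc, w_scale, b_loc, b_scale,
                             n_samples=5000, seed=0):
    """Return (mean, std) of the predicted probability.

    Each Monte Carlo draw samples the feature AND the parameters, so the
    spread reflects both sources of uncertainty.
    """
    rng = random.Random(seed)
    a = 1.0 + n_late                 # Beta(1, 1) prior: an assumption
    b = 1.0 + (n_jobs - n_late)
    probs = []
    for _ in range(n_samples):
        x = rng.betavariate(a, b)            # uncertain feature value
        w = rng.gauss(w_loc, w_scale)        # uncertain weight
        bias = rng.gauss(b_loc, b_scale)     # uncertain intercept
        probs.append(1.0 / (1.0 + math.exp(-(w * x + bias))))
    return statistics.mean(probs), statistics.stdev(probs)


# Hypothetical parameter posterior (made-up values, not fitted to anything).
params = dict(w_loc=3.0, w_scale=0.5, b_loc=-2.0, b_scale=0.3)

veteran_pred = predict_with_uncertainty(n_late=10, n_jobs=100, **params)
rookie_pred = predict_with_uncertainty(n_late=1, n_jobs=1, **params)

print("veteran (mean, std):", veteran_pred)
print("rookie  (mean, std):", rookie_pred)
```

With the same parameter distributions, the single-job contractor ends up with a wider predictive spread, because the feature uncertainty is propagated rather than collapsed to a point.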
Your thoughts on this? Can anyone point me to previous work on this? I have not found anything on the web on this topic.