Does it make sense to use Pyro for my problem?

aravantv · May 9, 2019, 7:04am

Dear Pyro gurus,

I am not much of a statistician, so pardon my question if it is naive.

I have the following setting (abstracted away):

a Bernoulli variable z
whose value typically depends on a variable x (with very many dimensions)
I have lots of observations (x,z)
I would like to estimate the parameter p of z

From what I recently discovered, this could be a case for a “binomial proportion confidence interval”, which however normally does not leverage knowledge about an “x” as in my case. In fact, x turns out to have a very complex structure (very many dimensions) with lots of dependencies between the various dimensions, which I could nicely express using probabilistic programming. Hence my thought: “hey, maybe I could use this cool stuff called Pyro for it?” Now I don’t know whether that would be relevant, useful, or overkill.
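For reference, the plain interval mentioned above ignores x entirely. A minimal sketch of one common variant, the Wilson score interval, using only the standard library; the counts (62 heads out of 100) are made up for illustration:

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is roughly 95%)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1.0 + z**2 / trials
    centre = (p_hat + z**2 / (2 * trials)) / denom
    half_width = (z / denom) * math.sqrt(
        p_hat * (1 - p_hat) / trials + z**2 / (4 * trials**2)
    )
    return centre - half_width, centre + half_width


# Hypothetical data: 62 observations with z = 1 out of 100.
low, high = wilson_interval(successes=62, trials=100)
print(f"95% interval for p: {low:.3f} to {high:.3f}")
```

Nothing in this computation looks at x, which is exactly the limitation described above.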

I would love to hear your opinions about this!

Jameson · May 11, 2019, 3:45pm

Sounds as if you should first try logistic regression, with or without a LASSO penalty. If you care about predictive accuracy rather than interpretability, maybe use ridge or elastic net rather than LASSO.

Pyro seems like overkill in this case, unless there’s something beyond what you’re telling us. You could use Pyro to do any of the above things, but unless you will later need flexibility to expand the model, there are probably other tools that would be simpler.
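To make the suggestion concrete, here is a rough sketch of ridge-penalized logistic regression written in plain Python, so that it has no dependencies. The data are synthetic, and the penalty strength, learning rate, and step count are arbitrary assumptions; in practice one would more likely use an existing library than hand-rolled gradient descent.

```python
import math
import random


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_ridge_logistic(xs, zs, l2=0.1, lr=0.5, steps=500):
    """Gradient descent on mean log-loss + (l2 / 2) * ||w||^2 (bias unpenalized)."""
    n, d = len(xs), len(xs[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, z in zip(xs, zs):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) - z
            for j in range(d):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Synthetic data (an assumption for illustration): only the first of
# three features actually influences z.
rng = random.Random(0)
xs = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]
zs = [1 if rng.random() < sigmoid(2.0 * x[0] - 0.5) else 0 for x in xs]

w, b = fit_ridge_logistic(xs, zs)
print("weights:", [round(wj, 2) for wj in w], "bias:", round(b, 2))
```

The fitted model gives P(z = 1 | x) for a particular x. The ridge penalty shrinks all weights toward zero; a LASSO penalty would instead push the irrelevant ones to exactly zero, which is why it is the more interpretable choice.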

aravantv · May 13, 2019, 7:07am

Hi Jameson, thanks for your answer! I probably described my problem badly, which indeed makes it look like I want to do a logistic regression… Let me describe it differently.

Imagine I have a coin for which I want to estimate the probability p of landing on heads. I am particularly interested in the confidence of this estimate. So far nothing special: I can do this using either a frequentist approach with confidence intervals or a Bayesian approach.

Now assume additionally that I can model the process of throwing the coin, e.g., depending on the angle of my joints when I flip the coin (and possibly even more: the grip of my finger, the wind in the room, the humidity of my skin, etc.). I expect this additional knowledge to influence my posterior distribution (or my confidence interval): if my samples were all thrown with the same hand position, then I should have higher uncertainty than if I had thrown them with lots of different hand positions.

A logistic regression would let me estimate, given the position of my hand, the probability of the coin landing on heads. But that is not actually what interests me: I would still like to know the “general” probability of landing on heads (i.e., over all hand positions), and I expect knowledge of the hand position to tell me how representative my sample is, and thus to potentially improve my confidence in the overall estimate.
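Put as a formula, the “general” probability is the conditional one averaged over hand positions: p = sum over k of P(x = k) * P(z = 1 | x = k). A small sketch of why representativeness matters, assuming hand position is reduced to three discrete categories; all counts and shares below are invented for illustration:

```python
# Hypothetical counts per hand position: (tosses, heads).
sample = {"low": (80, 56), "mid": (15, 6), "high": (5, 1)}

# Assumed share of each hand position in "general" use (sums to 1).
target_share = {"low": 0.4, "mid": 0.4, "high": 0.2}


def naive_estimate(sample):
    # Pools all tosses, implicitly trusting the sample's mix of positions.
    tosses = sum(n for n, _ in sample.values())
    heads = sum(h for _, h in sample.values())
    return heads / tosses


def reweighted_estimate(sample, target_share):
    # p = sum_k P(x = k) * P(z = 1 | x = k), with P(x = k) taken from
    # target_share rather than from the sample.
    return sum(target_share[k] * h / n for k, (n, h) in sample.items())


print("naive:     ", round(naive_estimate(sample), 3))
print("reweighted:", round(reweighted_estimate(sample, target_share), 3))
```

The two numbers differ because the sample over-represents the "low" position relative to the assumed general mix.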

Is that explained a bit better? Sorry for being unclear…

aravantv · May 13, 2019, 7:11am

And I was thinking of using Pyro because I want to model the process of throwing the coin with a high level of detail, which made me feel that probabilistic programming was the ideal modelling tool.

Jameson · May 13, 2019, 3:58pm

It sounds as if you want to model the distribution of the covariates, rather than just condition on it? In that case, Pyro could definitely be a good tool for you.

But I’m still not sure if that’s what you’re saying. It doesn’t ultimately matter whether I’m sure, but for your own sake, you will have to be very clear about why you’re using Pyro.
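As a toy illustration of what modelling the covariates can buy, here is a plain-Python sketch (not Pyro code) under strong simplifying assumptions: x is one of three discrete hand positions, the shares pi of those positions get a Dirichlet prior whose pseudo-counts stand in for knowledge of the throwing process, and each position has its own heads probability theta_k with a Beta(1, 1) prior. All names and numbers are hypothetical. In Pyro the same structure would be written with `pyro.sample` statements and fitted with its inference tools.

```python
import random


def dirichlet(alphas, rng):
    # A Dirichlet draw is a vector of independent Gamma draws, normalized.
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [g / total for g in draws]


def marginal_p_draws(tosses, x_pseudo_counts, n_draws=4000, seed=0):
    """Posterior draws of p = sum_k pi_k * theta_k.

    tosses: list of (n, heads) per hand position.
    x_pseudo_counts: Dirichlet parameters describing how common each
    position is in general (assumed known apart from the toss data).
    """
    rng = random.Random(seed)
    out = []
    for _ in range(n_draws):
        pi = dirichlet(x_pseudo_counts, rng)
        # Beta(1, 1) prior updated with the tosses seen at each position.
        theta = [rng.betavariate(1 + h, 1 + n - h) for n, h in tosses]
        out.append(sum(w * t for w, t in zip(pi, theta)))
    return sorted(out)


def central_interval(sorted_draws, mass=0.95):
    n = len(sorted_draws)
    lo = sorted_draws[int(n * (1 - mass) / 2)]
    hi = sorted_draws[int(n * (1 + mass) / 2) - 1]
    return lo, hi


x_pseudo_counts = [20, 20, 20]              # positions roughly equally common
one_position = [(90, 54), (0, 0), (0, 0)]   # all 90 tosses from one position
spread_out = [(30, 18), (30, 18), (30, 18)] # same totals, spread evenly

for name, tosses in [("one position", one_position), ("spread out", spread_out)]:
    lo, hi = central_interval(marginal_p_draws(tosses, x_pseudo_counts))
    print(f"{name}: 95% interval width = {hi - lo:.2f}")
```

Both datasets have 54 heads in 90 tosses, yet the interval for the overall p is wider when every toss came from a single hand position, because the unobserved positions keep their prior uncertainty. A plain binomial interval cannot express that difference.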

aravantv · May 13, 2019, 8:39pm

Exactly, I’d like to model the distribution of the covariates as well.

I agree that I need to be clear about why I’d use Pyro. That’s why I’m asking the question, because things are still a bit blurry to me at the moment.

I think I’ll need to do more research then. Thanks a lot for your answers, they helped me see in which directions to look!