Stick Breaking & General Pyro Question

Hello! I’m attempting to write an open-source Python implementation of [1502.07257] Breaking Sticks and Ambiguities with Adaptive Skip-gram, and I’m trying to figure out whether Pyro is the right tool for the job. The paper uses the stick-breaking process and Stochastic Variational Inference to learn word-sense-disambiguated word vectors, and from my searching it seems that Pyro offers a toolset that would make my implementation cleaner and easier to understand.
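For anyone unfamiliar, the stick-breaking construction itself is easy to sketch in plain NumPy. This is just a generic truncated stick-breaking draw, not anything specific to the paper; the function name, the `alpha` concentration parameter, and the truncation level are all illustrative:

```python
import numpy as np

def stick_breaking_weights(alpha: float, truncation: int, rng=None) -> np.ndarray:
    """Draw mixture weights from a truncated stick-breaking process.

    Each Beta(1, alpha) draw takes a fraction of the stick that remains;
    the last weight absorbs whatever stick is left, so the weights sum
    to exactly 1.
    """
    rng = np.random.default_rng() if rng is None else rng
    # T-1 stick fractions
    betas = rng.beta(1.0, alpha, size=truncation - 1)
    # length of the stick remaining before each break: 1, (1-b1), (1-b1)(1-b2), ...
    remaining = np.concatenate([[1.0], np.cumprod(1.0 - betas)])
    # weight k = fraction_k * remaining stick (final weight takes all that is left)
    return np.append(betas, 1.0) * remaining
```

Smaller `alpha` concentrates mass on the first few sticks (fewer senses per word), larger `alpha` spreads it out, which is the knob the adaptive skip-gram model turns.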

However, I know relatively little about probabilistic programming, and I’m having a hard time seeing the connection between Pyro’s model and guide concepts and the learning process in the paper’s Julia implementation: AdaGram.jl/gradient.jl at master · sbos/AdaGram.jl · GitHub

Is Pyro a good tool for this task, or am I misunderstanding the goals of the project?

Thanks for any help and insights you may provide :slight_smile:

i’m not familiar with this paper, but from a very quick skim it looks like the variational family is chosen so that all the latent variables can be integrated out exactly. so there is data subsampling but no sampling of latent variables; in other words, they are not doing variational inference of the more black-box type. currently pyro is best suited for black-box variational inference, since it doesn’t have the functionality to automatically determine which latent variables can be integrated out. a black-box approach could still work in principle, but it would likely converge more slowly than the algorithm in the paper. so a pyro implementation would almost certainly be “cleaner and easier to understand”, but it’s hard to say a priori how well it would work in practice for this particular model.


Thanks so much for the insights @martinjankowiak.

I’ll continue down the pure NumPy approach I’ve been using rather than trying to force Pyro’s nice interfaces into a use case they aren’t necessarily intended for!