I am currently a Computer Science student at Columbia University, and I lead the university’s chapter of Codeprentice, an organization of students who want to contribute to open-source projects. This semester a couple of us are taking a class on probabilistic programming that uses Pyro, and we thought it would be really cool to help develop the library further with some students from the club. We could certainly help with what’s on the issues page on GitHub, but I wanted to ask first whether you had any project ideas in mind that a team of 3-4 students could tackle over the semester. We would be more than willing to help with whatever you think is relevant and wherever we could be useful.
Thanks for offering to help! Codeprentice looks like a great organization.
Here are a few project ideas whose suitability may vary depending on your team’s interests, background, free time and the degree to which the work must be independent (e.g. because you need to hand something in for a grade). I’m happy to talk more about any of these as necessary.
Examples and tutorials
The most valuable type of contribution to almost any open-source project is documentation, whether in the form of a general how-to guide, an example script or Jupyter notebook, or source code documentation. It’s also the easiest way to get started contributing to Pyro and familiarize yourself with the language.
My recommendation here would be to pick a paper you enjoyed or read for a class, try to reproduce the important tools and results, and turn it into a tutorial. For example, I believe both this tutorial on Dirichlet process mixture models and this one on boosting black box variational inference were the output of university course projects. You can ask your instructor for ideas if you’re having trouble narrowing it down, or open a feature request issue if you want any advice from us before starting.
Another more ambitious project idea along these lines would be one or more tutorials on causal inference from observational data in Pyro, particularly if this is something you’ll end up covering in your class.
We also welcome new features, such as new distributions, inference algorithms or diagnostic tools. It can be difficult to evaluate the viability of inference algorithms for implementation as general-purpose tools, so I suggest opening a feature request issue on GitHub before spending time on coding.
One direction I would personally like to see explored is the use of higher-order optimizers for stochastic variational inference. This paper is a great starting point - if the impressive empirical results there held up for typical model/autoguide pairs in Pyro, it could mean a large speedup across much of Pyro’s VI machinery. Concretely, this might take the form of some new optimizer implementations and a tutorial.
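To make the motivation concrete, here is a toy sketch in plain Python (not Pyro, and not the method from the paper). It assumes a conjugate model, z ~ Normal(0, 1) and x_i ~ Normal(z, 1), with a Normal guide whose scale is held fixed, so the negative ELBO is an exact quadratic in the guide mean `mu` and no sampling is needed. All function names and the data are hypothetical, chosen only for illustration. On this idealized problem a second-order (Newton) update reaches the optimum in one step, while plain gradient descent needs hundreds; real SVI objectives are noisy and non-quadratic, so the gap there is an open empirical question.

```python
# Toy comparison of first- vs second-order optimization of a variational objective.
# Assumed model: z ~ Normal(0, 1), x_i ~ Normal(z, 1); guide q(z) = Normal(mu, s^2)
# with s fixed. Up to terms that do not depend on mu:
#   -ELBO(mu) = 0.5 * mu^2 + 0.5 * sum_i (x_i - mu)^2


def neg_elbo_grad(mu, data):
    """First derivative of the negative ELBO with respect to mu."""
    return (len(data) + 1) * mu - sum(data)


def neg_elbo_hess(data):
    """Second derivative of the negative ELBO with respect to mu (constant here)."""
    return float(len(data) + 1)


def gradient_descent(data, lr=0.01, tol=1e-8, max_steps=10_000):
    """First-order baseline: step along the negative gradient with a fixed rate."""
    mu, steps = 0.0, 0
    while abs(neg_elbo_grad(mu, data)) > tol and steps < max_steps:
        mu -= lr * neg_elbo_grad(mu, data)
        steps += 1
    return mu, steps


def newton(data, tol=1e-8, max_steps=100):
    """Second-order update: rescale the gradient by the inverse curvature."""
    mu, steps = 0.0, 0
    while abs(neg_elbo_grad(mu, data)) > tol and steps < max_steps:
        mu -= neg_elbo_grad(mu, data) / neg_elbo_hess(data)
        steps += 1
    return mu, steps


if __name__ == "__main__":
    observations = [1.2, 0.7, 1.9, 1.4]  # made-up data
    print("gradient descent:", gradient_descent(observations))
    print("newton:          ", newton(observations))
```

Both routines converge to the exact posterior mean, `sum(data) / (len(data) + 1)`; the difference is only in the number of steps. In Pyro itself, the analogous experiment would swap the optimizer passed to `SVI` while keeping the model and autoguide fixed.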
There are a number of ongoing projects within Pyro and the community that are bigger than a single feature request and could benefit from some extra attention. Some examples are NumPyro (a JAX-backed Pyro implementation maintained by @fehiepsi and others), @stefanwebb’s normalizing flow library in Pyro (invertible neural networks for representing probability distributions), brmp (Bayesian generalized linear models), Funsor (an intermediate language for probabilistic programming maintained by @fritzo, myself and others) and TyXe (easier Bayesian neural network construction, by @karalets).
I’m sure the leaders of any of these projects would be more than happy to help you start contributing. I’m personally most excited about NumPyro, while my sense is that the community would really appreciate a fully functional version of TyXe (which is very well-designed but needs tests, documentation and examples).
Nuts and bolts
Finally, there are always bugfixes or performance improvements to be made, generally found under the “help wanted” label on GitHub issues of Pyro and other projects.
Thank you for your detailed answer, @eb8680_2. I really appreciate it.
The paper you mentioned seems like a great starting point, though somewhat of a longer project. Given your suggestions, I think our roadmap will be for each of us to first contribute an example or tutorial like the ones you described, and then to start working on the paper once those are done. Although we are taking the class, this is really an independent project, so we are not constrained by grades or anything like that. Again, thank you for your help and support. Hopefully, we will have some updates soon!