Automated model and guide generation (e.g. for multivariate Bernoulli distribution)

neurosurg · October 4, 2020, 1:47pm

Hi

Please correct me if I am wrong. From a high-level perspective, the two primary things that happen behind the scenes in Pyro are:

Judgments about the joint distribution of the observable (x) and the unobservable latent (z) random variables. These judgments are expressed as model functions.
Judgments about the conditional distributions p(x|z) and p(z|x). For the posterior p(z|x), these judgments are expressed as guide functions. It looks like Pyro introduces no special entity for p(x|z), though.

It looks like both the model and the guide are currently expected to be written by hand.
Q: Is there any code for generating models and guides in an automated way?

For example, I often get one-hot encoded (OHE) data, or categorical data that is then one-hot encoded. As a result, our observable data follows a multivariate Bernoulli distribution, with the samples represented as an N*M boolean matrix, where N is the number of observations and M is the number of boolean variables.

So why not try to construct the model through some automated process rather than by hand?

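Also, the Bernoulli distribution has a great property: if the prior on the success probability is expressed as Beta(a, b), then the posterior is also a Beta distribution whose parameters follow directly from the outcome of the new observation: Beta(a + 1, b) after a success and Beta(a, b + 1) after a failure.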
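To make the conjugacy concrete, here is a minimal sketch in plain Python. It assumes that each of the M columns is treated as an independent Bernoulli variable with its own Beta(a, b) prior, which ignores any dependence between columns (for example, between the columns of one OHE group). The function name beta_bernoulli_posterior and the toy matrix are made up for illustration.

```python
def beta_bernoulli_posterior(data, a=1.0, b=1.0):
    """Per-column Beta posterior for an N x M boolean matrix.

    Assumes independent columns, each with a Beta(a, b) prior on its
    success probability. Returns a list of (a_post, b_post) tuples.
    """
    n = len(data)
    m = len(data[0]) if n else 0
    posteriors = []
    for j in range(m):
        ones = sum(1 for row in data if row[j])  # number of successes in column j
        zeros = n - ones                         # number of failures in column j
        posteriors.append((a + ones, b + zeros))
    return posteriors


# Toy data: 4 observations of 3 boolean variables (illustrative only).
data = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 1],
]

posterior = beta_bernoulli_posterior(data)          # uniform Beta(1, 1) prior
means = [a_post / (a_post + b_post) for a_post, b_post in posterior]
print(posterior)  # [(4.0, 2.0), (2.0, 4.0), (4.0, 2.0)]
```

Under these assumptions no inference algorithm is needed at all: the posterior is just the prior counts plus the observed counts in each column.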

So, if we’ve got a boolean OHE data matrix with essentially no additional prior knowledge, why can’t we just start with some boilerplate models and guides that are not written manually but are generated from the data provided, in a way that is natural for multivariate Bernoulli variables?

I’d love to see something like model_builder(strategy) and guide_builder(strategy) in pyro.distributions.bernoulli to pre-build some reasonable models and guides.

kind regards,
Valery

eb8680_2 · October 5, 2020, 3:59am

Hi @neurosurg, I’m not sure I fully understand your point about generating models automatically. Can you clarify what exactly the inputs and outputs of model_builder are and what their types are?

In the meantime, you might be interested in brmp, a port to Pyro/NumPyro of (a subset of) the brms specification language of R-style formulas for Bayesian generalized linear mixed models.

I think that’s probably closest to your model_builder idea, in that complete Pyro models are generated from a high-level specification, although individual variables still need to have distribution types specified. Unfortunately we haven’t had the developer bandwidth to add features to it lately, but the code that’s there is fairly stable and contributions are welcome.

As for automatically generating guides, it’s important to understand first that only a very special subset of Bayesian models have posterior distributions that can be computed exactly and represented concisely in closed form as generative Pyro programs.

Outside of this subset the choice of approximate posterior distribution for variational inference can be quite subjective, and a classical mean-field approximation may not be the best choice even when one is fully determined by conjugacy relations in the model.

As a consequence of this subjectivity, there are a number of ways in Pyro to automatically generate and represent guides. pyro.infer.autoguide provides a few different strategies for automatically generating variational distributions for arbitrary fixed-structure models (perhaps the closest thing to your guide_builder idea), and pyro.contrib.easyguide is a handy tool for composing and mixing them with handwritten guides. pyro.infer.infer_discrete draws exact samples from joint posterior distributions over discrete latent variables using a message-passing algorithm. We are also working on an intermediate language Funsor that will, among other things, make static analysis of Pyro models easier.

Is any of that helpful for what you had in mind?