# Automated model and guide generation (e.g. for multivariate Bernoulli distribution)

Hi

Please correct me if I am wrong. From a high-level perspective, the two primary things that happen behind the scenes in Pyro are:

1. Judgments about the joint distribution of the observable (`x`) and the unobservable latent (`z`) random variables. These judgments are expressed as model functions.

2. Judgments about the conditional distributions `p(x|z)` and `p(z|x)`. These judgments are expressed as guide functions for the posterior `p(z|x)`. It looks like Pyro introduces no special entity for `p(x|z)`, though.

It looks like both the model and the guide are currently expected to be written by hand.
Q: Is there any code for generating models and guides in some automated way?

For example, I often get one-hot-encoded (OHE) data, or categorical data that I one-hot encode. As a result, the observable data follow a multivariate Bernoulli distribution, with the samples represented as an N × M boolean matrix, where N is the number of observations and M is the number of boolean variables.

So why not try to build the model through some automated process rather than by hand?

Also, the Bernoulli distribution has a great property: if the prior on its success probability is a Beta(a, b), then the posterior is also a Beta, with a and b known from the outcome of each new observation (a success adds 1 to a, a failure adds 1 to b).
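To make the conjugacy property concrete, here is a minimal sketch in plain Python. It assumes each boolean column is treated as an independent Bernoulli variable with its own Beta(a, b) prior; the function name `beta_bernoulli_update` and the toy matrix are made up for illustration and are not part of Pyro.

```python
def beta_bernoulli_update(data, a=1.0, b=1.0):
    """Per-column Beta posterior parameters for a 0/1 matrix.

    Assumes every column is an independent Bernoulli variable with a
    Beta(a, b) prior (an illustrative assumption, not a Pyro API).
    """
    n_rows = len(data)
    n_cols = len(data[0])
    # Count the successes (ones) in each column.
    successes = [sum(row[j] for row in data) for j in range(n_cols)]
    # Conjugate update: successes are added to a, failures to b.
    return [(a + s, b + (n_rows - s)) for s in successes]


# Toy N x M boolean matrix (N = 4 observations, M = 3 variables).
data = [
    [1, 0, 1],
    [1, 0, 0],
    [0, 0, 1],
    [1, 1, 1],
]

posterior = beta_bernoulli_update(data)
# Posterior mean of each success probability: a / (a + b).
posterior_means = [a_post / (a_post + b_post) for a_post, b_post in posterior]
```

No inference algorithm is needed here because the posterior is available in closed form; that is exactly what stops being true once the model moves beyond such conjugate pairs.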

So, if we’ve got a boolean OHE data matrix, basically without any additional valuable prior knowledge, why can’t we just start with some “boilerplate” models and guides that are not written manually but are built from the provided data in a way that is natural for multivariate Bernoulli variables?

I’d love to see something like `model_builder(strategy)` and `guide_builder(strategy)` in `pyro.distributions.bernoulli` to pre-build some reasonable models and guides.

kind regards,
Valery

Hi @neurosurg, I’m not sure I fully understand your point about generating models automatically. Can you clarify what exactly the inputs and outputs of `model_builder` are and what their types are?

In the meantime, you might be interested in brmp, a port to Pyro/NumPyro of (a subset of) the `brms` specification language of R-style formulas for Bayesian generalized linear mixed models.

I think that’s probably closest to your `model_builder` idea, in that complete Pyro models are generated from a high-level specification, although individual variables still need to have distribution types specified. Unfortunately we haven’t had the developer bandwidth to add features to it lately, but the code that’s there is fairly stable and contributions are welcome.

As for automatically generating guides, it’s important to understand first that only a very special subset of Bayesian models have posterior distributions that can be computed exactly and represented concisely in closed form as generative Pyro programs.

Outside of this subset the choice of approximate posterior distribution for variational inference can be quite subjective, and a classical mean-field approximation may not be the best choice even when one is fully determined by conjugacy relations in the model.

As a consequence of this subjectivity, there are a number of ways in Pyro to automatically generate and represent guides. `pyro.infer.autoguide` provides a few different strategies for automatically generating variational distributions for arbitrary fixed-structure models (perhaps the closest thing to your `guide_builder` idea), and `pyro.contrib.easyguide` is a handy tool for composing and mixing them with handwritten guides. `pyro.infer.infer_discrete` draws exact samples from joint posterior distributions over discrete latent variables using a message-passing algorithm. We are also working on an intermediate language Funsor that will, among other things, make static analysis of Pyro models easier.