Here are three rules you can apply to simplify reasoning about independence annotations in Pyro:
- Every random variable is assumed to depend on all previously sampled random variables unless Pyro is informed otherwise.
- The joint distributions of all random variables in each slice of a plate context (i.e. a loop iteration or a slice along the plate dimension) are assumed to be conditionally independent given all previous random variables in all enclosing plate slices.
- plates used as context managers and not given a dim value, as in plate(..., dim=...), allocate new batch dimensions on the left when they are entered.
It might also be helpful to do a bit of background reading on plate notation in graphical models, from which the semantics of pyro.plate are derived.
Can a[0] and b[0] be treated as independent during inference? My guess is no.
No, a[0] and b[0] are in the same slice (0) and by the first rule above b[0] is assumed to depend on a[0].
Can a[0] and a[1] be treated as independent during inference? My guess is yes.
Yes, by the second rule above a[0] and a[1] are in different slices and are therefore independent.
Can a[0] and c[0] be treated as independent during inference? My guess is yes.
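No. By the third rule, the leftmost dimension of c corresponds to my_plate2, so c[0] still ranges over every slice of the plate that contains a, including slice 0. Applying the second and third rules above, however, a[1] and c[:, 0] can be treated as independent, and so can a[0] and c[:, 1].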
If I want to declare to Pyro that 2 Bernoulli random variables can be treated as independent during inference, are the following 3 scenarios equivalent:
I’m not sure what you mean by equivalent, but these will all behave differently. The first version has two sample statements that are marked as independent of one another. The second is a vectorized version of the first that has a single sample statement that is marked as independent along the leftmost dimension. The third has two sample statements where the second depends on the first just as if there were no plates used, because they have no batch_shapes and the plates have no sizes.
From an underlying implementation perspective, is there any truth to what I was saying about Pyro using the batch dimensions across sample statements as the way it tracks what it can treat as independent?
Sort of - you can think of a vectorized plate context as associating itself with one batch dimension of every sample statement that appears within it.
Is it true that the call to .independent() is a bit of a different construct than plate?
.independent() is a somewhat unfortunate name originally drawn from TensorFlow distributions that we’re looking to change. What it does is declare dimensions dependent, i.e. move dimensions from the batch_shape of a distribution to its event_shape. See the tensor shape tutorial for more details.