What is the difference between these two pieces of code? And what is "fn": Independent()?


#1

Issue Description

  1. What is the difference between these two pieces of code (model1 and model2)? I feel that the second piece of code defaults to treating the batch_shape as independent and does not require a plate to declare it.

  2. I also found that inside the model function, if a variable calls the to_event() function, then after the trace function converts it, the value of fn is Independent() instead of the corresponding distribution, e.g. Normal(). When I do not use the to_event() function, the value of 'fn' is the corresponding distribution.

Code Snippet

(For the first question)
```python
def model1():
    with pyro.plate("d_plate", 3):
        dd = pyro.sample("dd", Normal(torch.zeros(3, 4), 1).to_event(1))

def model2():
    dd = pyro.sample("dd", Normal(torch.zeros(3, 4), 1).to_event(1))
```

(For the second question)
```python
pyro.clear_param_store()
trace2 = poutine.trace(model2).get_trace()
trace2.nodes["dd"]
```

which prints:

```python
{'type': 'sample',
 'name': 'dd',
 'fn': Independent(),  # Why, and what is Independent()? Why is it not the Normal distribution?
 'is_observed': False,
 'args': (),
 'kwargs': {},
 'value': tensor([[ 0.3348,  0.5870, -0.6854, -0.6906],
                  [-0.3144, -0.2633, -0.0274,  1.6625],
                  [-1.4609,  0.0729, -0.4075, -2.3067]]),
 'infer': {},
 'scale': 1.0,
 'mask': None,
 'cond_indep_stack': (),
 'done': True,
 'stop': False,
 'continuation': None}
```


#2

I would refer you to this tutorial, which discusses distribution shapes in much more detail. Without going into too much detail here:

  1. What is the difference between these two pieces of code (model1 and model2)? I feel that the second piece of code defaults to treating the batch_shape as independent and does not require a plate to declare it.

In the first case, you have declared your leftmost dim independent. Inference algorithms expect all your batch dims to be accountedted for by either a plate or a .to_event. So you will get an error or a warning when you run your second model, since it only accounts for the rightmost batch dim (you should set pyro.enable_validation(True) to catch such shape bugs). plate is a much more general construct for denoting conditional independence, and it can also do subsampling and implicit broadcasting.
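You can see the difference purely in terms of shapes. Pyro's .to_event(n) wraps the distribution in torch.distributions.Independent, so the bookkeeping can be sketched with plain torch (a minimal illustration, not the models above verbatim):

```python
import torch
from torch.distributions import Normal, Independent

base = Normal(torch.zeros(3, 4), 1.0)
print(base.batch_shape, base.event_shape)  # torch.Size([3, 4]) torch.Size([])

# .to_event(1) corresponds to Independent(base, 1):
d1 = Independent(base, 1)
print(d1.batch_shape, d1.event_shape)      # torch.Size([3]) torch.Size([4])
# model2 leaves the leftmost dim of size 3 as a batch dim, so it must be
# covered by a plate (as in model1) or absorbed with .to_event(2).

d2 = Independent(base, 2)
print(d2.batch_shape, d2.event_shape)      # torch.Size([]) torch.Size([3, 4])
```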

  2. I also found that inside the model function, if a variable calls the to_event() function, then after the trace function converts it, the value of fn is Independent() instead of the corresponding distribution, e.g. Normal(). When I do not use the to_event() function, the value of 'fn' is the corresponding distribution.

The distribution is wrapped by a torch.distributions.Independent instance which interprets some of the batch dims of a distribution from the right as event dims.
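The wrapper is easy to inspect: the original distribution is still available as base_dist. A small sketch using torch.distributions directly:

```python
import torch
from torch.distributions import Normal, Independent

d = Independent(Normal(torch.zeros(3, 4), 1.0), 1)
print(type(d).__name__)            # Independent
print(type(d.base_dist).__name__)  # Normal -- the original distribution
# log_prob sums over the rightmost dim that was declared an event dim:
print(d.log_prob(torch.zeros(3, 4)).shape)  # torch.Size([3])
```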


#3

Thank you for your reply! I have doubts about this sentence, because you are talking about "a plate or a .to_event". My second model has a call to .to_event, so it satisfies the requirements of the inference algorithm. So why do you say my second model will get an error or a warning?

P.s. I have read the documentation in detail, but I still can't solve the above problem.


#4

By looking at the documentation, I can summarize the use of plate.
Under the plate, the distribution should be univariate (event_shape = (), batch_shape = ()),
e.g.

```python
with pyro.plate("components", num_components):
```

or completely multivariate (event_shape = (...), batch_shape = (), i.e. batch_shape is empty),
e.g.

```python
def model1():
    with pyro.plate("d_plate", 3):
        dd = pyro.sample("dd", Normal(torch.zeros(3, 4), 1).to_event(2))
```

In this case, the role of the plate is equivalent to the expand function.
But if the distribution is a batch of distributions (event_shape and batch_shape are both non-empty, and the length of batch_shape equals 1, e.g. batch_shape = [7]), then the size argument of the plate must equal the batch_shape (e.g. with pyro.plate("d_plate", 7):), otherwise it will report an error.
e.g.

```python
def model1():
    with pyro.plate("d_plate", 3):
        dd = pyro.sample("dd", Normal(torch.zeros(3, 4), 1).to_event(1))
```

The reason they must be equal, I think, is that the distribution's parameter shape already has a non-empty batch_shape, so there is no need to add another batch_shape to it via the plate. In other words, the role of the plate is to add a non-empty batch_shape to the distribution. (Of course I know that plate has other functions; I will not mention them here.)
So even without a plate, I think the following model is completely feasible:

```python
def model1():
    dd = pyro.sample("dd", Normal(torch.zeros(3, 4), 1).to_event(1))
```
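The shape bookkeeping in my summary can be checked with torch.distributions.Independent (which is what .to_event wraps); this is only a sketch of the shapes involved, not Pyro's actual validation logic:

```python
import torch
from torch.distributions import Normal, Independent

# .to_event(1) on a (3, 4)-batch Normal leaves batch_shape (3,),
# so an enclosing plate must have size 3 to match:
d = Independent(Normal(torch.zeros(3, 4), 1.0), 1)
assert d.batch_shape == torch.Size([3])
assert d.event_shape == torch.Size([4])

# .to_event(2) absorbs both dims, so no plate is needed at all:
d_full = Independent(Normal(torch.zeros(3, 4), 1.0), 2)
assert d_full.batch_shape == torch.Size([])
assert d_full.event_shape == torch.Size([3, 4])
```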

#5

For the second model, you are using .to_event(1), which only accounts for the rightmost batch dim of size 4, not the left dim. If you changed it to .to_event(2), then you shouldn't see any errors/warnings when you run inference. It is not a question of "feasibility": this constraint makes it possible (amongst other things) for Pyro to make certain simplifying assumptions in the backend without compromising expressivity. You could think of it as model specification syntax. You will also be able to run model2 using NUTS without any warnings if you do not enable validation, so it is certainly "feasible" in that regard. It is not recommended, however, since with more complicated models it is easy to get the batch shapes wrong and end up with an incorrect result.
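To make the point concrete: the leftover batch dim shows up in log_prob, and that is what inference has to account for. A torch-only sketch mirroring .to_event:

```python
import torch
from torch.distributions import Normal, Independent

x = torch.zeros(3, 4)
base = Normal(torch.zeros(3, 4), 1.0)

# .to_event(1): one log-density per row -- the size-3 dim is still a
# batch dim that inference needs to see declared (e.g. via a plate).
print(Independent(base, 1).log_prob(x).shape)  # torch.Size([3])

# .to_event(2): a single joint log-density, nothing left to declare.
print(Independent(base, 2).log_prob(x).shape)  # torch.Size([])
```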

PS: Please consider putting your code inside code blocks to make your post more readable.


#6

OK, I think I understand what you mean, thank you, you are a patient person.