Consider the difference between the Delta and Dirichlet distributions in Pyro:
import torch
import pyro.distributions as dist
dist1 = dist.Delta(0.5 * torch.ones(2))
dist2 = dist.Dirichlet(0.5 * torch.ones(2))
The Delta distribution has an empty event_shape, torch.Size([]), and a batch_shape of torch.Size([2]).
The Dirichlet distribution has an event_shape of torch.Size([2]) and an empty batch_shape, torch.Size([]).
Why does the behavior differ between the two distributions? I thought that dependence between variables was assumed unless stated otherwise.
How would I construct the dist.Delta above so that its event_shape is 2? I cannot figure this out, yet it is needed if I have a model with the Dirichlet distribution above and a guide with the Delta distribution. AutoDelta does this automatically, and I am trying to understand what really happens under the hood. Thanks.