Consider the difference between Delta and Dirichlet distributions:

```
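import torch
import pyro.distributions as dist
dist1 = dist.Delta(0.5*torch.ones([2]))      # two independent point masses
dist2 = dist.Dirichlet(0.5*torch.ones([2]))  # one distribution over a 2-vector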
```

The Delta distribution has an empty event_shape, `torch.Size([])`, and a batch_shape of `torch.Size([2])`.

The Dirichlet distribution has an event_shape of `torch.Size([2])` and an empty batch_shape, `torch.Size([])`.

Why do the two distributions behave differently? I thought that dependence between variables was assumed unless stated otherwise.

How would I construct the dist.Delta above so that its event_shape is 2? I cannot figure this out, and yet it is needed for a model that uses the Dirichlet distribution above together with a guide that uses the Delta distribution. AutoDelta does this automatically, and I am trying to understand what really happens under the hood. Thanks.