I was pleased to see that both ESS and Gelman-Rubin diagnostics are available in Pyro. I have a question about their usage.

With reference to pyro/stats.py at master · pyro-ppl/pyro · GitHub: suppose I have a batch of Markov chains (call it chains) with shape L x B x D, where L is the length of each chain, B is the number of chains, and D is the dimension of the support.

Is this the correct usage?

import pyro.ops.stats as stats
stats.effective_sample_size(chains, chain_dim=0, sample_dim=-1)

Strangely, I am getting negative values in certain trials, and I want to make sure I understand the semantics of this routine correctly.

L is the length of the chains. B is the batch size (number of chains).

After your comment, I tried the following: chains (50 x 1000 x 2) is a tensor containing 50 independent chains, each of length 1000, drawn from a 2-D Gaussian.

Running the routine, I get a 1000-D tensor as the result. Is this the right output? If I understand ESS correctly, shouldn't we have an ESS for each dimension of each chain?

Additionally, I am getting negative values for ESS. Is that expected?

chain_dim is the dimension indexing your chains; sample_dim is the dimension indexing your samples.
If you have a tensor L x B x D where L, B, D are interpreted as: B chains, each chain containing L samples, each sample having shape D, then chain_dim=1 and sample_dim=0. Dimensions are enumerated from left to right, as in a usual PyTorch tensor.
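To make these dimension semantics concrete, here is a minimal NumPy sketch of a per-dimension ESS pooled across chains, for a tensor laid out as (L, B, D) with sample_dim=0 and chain_dim=1. This is an illustration only, not Pyro's implementation — Pyro and ArviZ use a more careful autocorrelation estimate and truncation rule:

```python
import numpy as np

def simple_ess(chains):
    # chains: array of shape (L, B, D) -- L samples per chain, B chains,
    # D-dimensional support (i.e. sample_dim=0, chain_dim=1).
    # Simplified ESS: average within-chain autocorrelations across chains,
    # truncating at the first negative lag.
    n, m, d = chains.shape
    ess = np.empty(d)
    for k in range(d):
        x = chains[:, :, k] - chains[:, :, k].mean(axis=0)  # center per chain
        den = (x * x).sum(axis=0)  # per-chain sum of squares
        rho_sum = 0.0
        for t in range(1, n):
            # lag-t autocorrelation, averaged over the B chains
            rho = ((x[:-t] * x[t:]).sum(axis=0) / den).mean()
            if rho < 0:
                break
            rho_sum += rho
        ess[k] = n * m / (1 + 2 * rho_sum)
    return ess

chains = np.random.default_rng(0).standard_normal((1000, 50, 2))
print(simple_ess(chains))  # i.i.d. draws -> roughly 50 * 1000 per dimension
```

Note that the output has one entry per support dimension D, not one per chain: the B chains are pooled into a single estimate, which matches the 2-element output you saw.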

I guess that your batch_dim is chain_dim and your length_dim is sample_dim? I had thought that by "length of the chains" you meant the number of chains.

Ok, I think I understand this. I have questions more on the conceptual side now.

Using a 1000 x 50 x 2 sized tensor which represents 50 chains containing 1000 samples each,

# chain_x: 1000 x 50 x 2
stats.effective_sample_size(chain_x, chain_dim=1, sample_dim=0)

I get the following result,

tensor([123393.8203, 126691.3281])

I was expecting a 2-D tensor (representing ESS per dimension from the sample space) and this seems to fall in line with my understanding.

However, I don't know how to interpret these numbers. I was thinking that there is no way ESS can be greater than the total number of samples (50 × 1000 = 50,000 here). Am I missing something?

Also, if I report ESS in results, do I report the min among all dimensions?

Edit: I generated the chains using my own implementation of HMC and have visualized one of them below. It seems to be behaving reasonably, so I am hoping my samples are at least reasonable.

This thread answers the question of why ESS can exceed the number of samples: bayesian - Effective Sample Size greater than Actual Sample Size - Cross Validated
It happens when your samples are negatively correlated. You can test against ArviZ to confirm things are calculated correctly. As for reporting the min of ESS, I don't know whether that is the right way to do it.
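Here is a toy NumPy demo of that effect (a made-up antithetic construction, not your HMC chain): pairing each draw with roughly its negation makes consecutive samples strongly anti-correlated, and a lag-1-truncated ESS estimate then exceeds the number of samples:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
z = rng.standard_normal(n // 2)

# Antithetic construction: each draw is followed by (roughly) its negation,
# so consecutive samples are strongly negatively correlated.
x = np.empty(n)
x[0::2] = z
x[1::2] = -z + 0.5 * rng.standard_normal(n // 2)

xc = x - x.mean()
rho1 = (xc[:-1] * xc[1:]).sum() / (xc * xc).sum()  # lag-1 autocorrelation
ess = n / (1 + 2 * rho1)  # ESS estimate truncated at lag 1

print(f"lag-1 autocorrelation: {rho1:.3f}")  # around -0.44
print(f"ESS estimate: {ess:.0f}, from only {n} samples")
```

Since the denominator 1 + 2*rho1 drops below 1 when rho1 < 0, ESS > n follows directly; this is the same mechanism that can make a well-mixing sampler report ESS above the raw sample count.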

Is there a reading you’d recommend for these diagnostics? I still feel there’s a gap in my knowledge and don’t feel too comfortable with these statistics.