Understanding Effective Sample Size

I was pleased to see that both ESS and Gelman-Rubin diagnostics are available in Pyro. I have a question about their usage.

With reference to pyro/stats.py at master · pyro-ppl/pyro · GitHub: suppose I have a batch of Markov chains (say chains) with shape L x B x D, where B is the number of chains in the batch, L is the length of each chain, and D is the dimension of the support.

Is this the correct usage?

import pyro.ops.stats as stats

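# chains: L x B x D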
stats.effective_sample_size(chains, chain_dim=0, sample_dim=-1)

Strangely, I am getting negative values in certain trials and want to make sure I understand the semantics of this routine correctly.

cc @fehiepsi (since you implemented those parts).

Is L the number of chains? If so, then you are correct.

Also note that @fehiepsi has recently added documentation to these diagnostics. Feel free to submit an issue or PR to clarify the docs :wink:

L is the length of the chains. B is the batch size (number of chains).

After your comment, I tried the following: chains (50 x 1000 x 2) is a tensor containing 50 independent chains, each of length 1000, sampled from a 2-D Gaussian.

Running the routine, I get a 1000-D tensor as the result. Is this the right output? If I understand ESS correctly, shouldn't we get an ESS for each dimension of each chain?

Additionally, I am getting negative values for ESS. Is that expected?

Glad to help once I get this conceptually clarified! :slight_smile:

chain_dim is the dimension for your chains.
sample_dim is the dimension for your samples.
If you have a tensor L x B x D where L, B, D are interpreted as B chains, each chain containing L samples, and each sample having shape D, then chain_dim=1 and sample_dim=0. We enumerate dims from left to right, as in a usual PyTorch tensor.
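For example (a minimal sketch with synthetic draws; the shapes are only for illustration):

import torch
import pyro.ops.stats as stats

L, B, D = 1000, 50, 2
chains = torch.randn(L, B, D)  # B chains, each holding L samples of shape D

# dim 1 indexes the chains, dim 0 indexes the samples within each chain
ess = stats.effective_sample_size(chains, chain_dim=1, sample_dim=0)
print(ess.shape)  # torch.Size([2]): one ESS value per dimension of the support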

I guess that your batch dim is the chain_dim and your length dim is the sample_dim? I had thought that by "length of the chains" you meant the number of chains.

OK, I think I understand this. My remaining questions are more on the conceptual side.

Using a 1000 x 50 x 2 tensor, which represents 50 chains of 1000 samples each,

# chain_x: 1000 x 50 x 2
stats.effective_sample_size(chain_x, chain_dim=1, sample_dim=0)

I get the following result,

tensor([123393.8203, 126691.3281])

I was expecting a 2-D tensor (representing the ESS for each dimension of the sample space), and this falls in line with my understanding.

However, I don’t know how to interpret these numbers. I was thinking that there is no way ESS can be greater than 1000. Am I missing something here?

Also, if I report ESS in results, do I report the min among all dimensions?

Edit: I generated the chains using my own implementation of HMC and have visualized one of them below. It seems to be doing something reasonable, so I am hoping that my samples are at least sensible.

This thread answers the question of why ESS > num samples: bayesian - Effective Sample Size greater than Actual Sample Size - Cross Validated
It happens when your samples are negatively correlated. You can test against arviz to confirm that things are calculated correctly. About taking the 'min' of ESS, I don't know if that is the right way to do it. :frowning2:
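As a rough illustration (a minimal sketch with synthetic, deliberately antithetic draws; nothing here comes from your HMC runs), negatively correlated samples can push the ESS above the total number of draws, and you can compare against arviz on the same tensor:

import torch
import pyro.ops.stats as stats
import arviz as az

# Interleave z and -z within each chain to create negative autocorrelation.
z = torch.randn(500, 4, 2)  # 500 draws, 4 chains, 2-D support
chains = torch.stack([z, -z], dim=1).reshape(1000, 4, 2)

ess = stats.effective_sample_size(chains, chain_dim=1, sample_dim=0)
print(ess)  # typically far above the 4000 total samples

# arviz expects shape (chain, draw, *event_shape); its default "bulk"
# method rank-normalizes, so the numbers may differ somewhat from Pyro's.
print(az.ess(chains.permute(1, 0, 2).numpy()))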

Thanks for that!

Is there a reading you’d recommend for these diagnostics? I still feel there’s a gap in my knowledge and don’t feel too comfortable with these statistics.

I think the Stan Manual is a good reference (in addition to Google :smiley:).


Thanks! It had some nice summaries in there.