Even if the mean-field assumption holds for the true posterior, the true posterior may still not lie in the approximate posterior family Q. For example, Q could be mean-field Gaussian while P is mean-field Laplace/Student-t/etc., or P could be skewed.

In this case, the authors of the paper suggest that P is skewed to the right. You can think of mean-field VI as really wanting to avoid regions where P has low density, so the **mean** (not just the variance) gets pushed to the right, causing bias.
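To make this concrete, here is a rough numerical sketch (my own toy example, not from the paper). The target is a right-skewed ‘split normal’ with a sharp left side and a heavy right side, and a Gaussian q is fitted by minimising a Monte Carlo estimate of KL(q||p) over a grid. The fitted mean lands well to the right of the mode (avoiding the sharp left side), and the fitted standard deviation comes out below the true one:

```python
import numpy as np

# Right-skewed "split normal" target: N(0,1) shape for x<0, N(0,9) shape
# for x>=0, so the mode is at 0 and the right side is heavy.
S_LEFT, S_RIGHT = 1.0, 3.0

def log_p(x):
    # Unnormalised log density (the constant doesn't affect the argmin).
    return np.where(x < 0.0, -0.5 * x**2 / S_LEFT**2, -0.5 * x**2 / S_RIGHT**2)

# Standard closed-form moments of the split normal.
true_mean = np.sqrt(2.0 / np.pi) * (S_RIGHT - S_LEFT)              # ~1.60
true_std = np.sqrt((1.0 - 2.0 / np.pi) * (S_RIGHT - S_LEFT)**2
                   + S_LEFT * S_RIGHT)                             # ~2.11

# Reverse KL(q||p) for q = N(mu, s^2), Monte Carlo via reparameterisation:
# KL = E_q[log q - log p] = -log s - E[log p(mu + s*eps)] + const.
rng = np.random.default_rng(0)
eps = rng.standard_normal(8000)

def reverse_kl(mu, s):
    return -np.log(s) - np.mean(log_p(mu + s * eps))

# Brute-force grid search over (mu, s).
mus = np.linspace(-1.0, 3.0, 81)
ss = np.linspace(0.5, 3.5, 61)
_, mu_fit, s_fit = min((reverse_kl(m, s), m, s) for m in mus for s in ss)

print(f"true mean {true_mean:.2f}, true std {true_std:.2f}")
print(f"VI fit:  mean {mu_fit:.2f},  std {s_fit:.2f}")
```

The grid search is just to keep the sketch dependency-free; in practice you would optimise the ELBO with gradients.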

Other examples include when the ‘true posterior’ is a mean-field Laplace/Student-t distribution (i.e., something with heavy tails). If the family of approximate posteriors is Gaussian (as is commonly the case), I would expect the approximate posterior q to *underestimate* the variance of the parameters: we can never find a q with KL(q||p)=0, and variational inference solutions tend to be ‘more compact’ than the true posterior, because KL(q||p) integrates over q and incurs a large penalty wherever q is large but p is small.
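A quick sketch of this (again a toy example of mine): fit a Gaussian q to a Student-t target with 3 degrees of freedom by minimising a Monte Carlo estimate of KL(q||p). The fitted variance comes out far below the true value of df/(df-2) = 3, because matching the heavy tails would put too much q mass where p is small:

```python
import numpy as np

DF = 3.0  # Student-t degrees of freedom; true variance = DF/(DF-2) = 3

def log_p(x):
    # Unnormalised Student-t log density with DF degrees of freedom.
    return -0.5 * (DF + 1.0) * np.log1p(x**2 / DF)

# Reverse KL(q||p) up to a constant, q = N(mu, s^2), reparameterised MC.
rng = np.random.default_rng(0)
eps = rng.standard_normal(8000)

def reverse_kl(mu, s):
    return -np.log(s) - np.mean(log_p(mu + s * eps))

# Grid search over (mu, s).
mus = np.linspace(-1.0, 1.0, 21)
ss = np.linspace(0.3, 3.0, 55)
_, mu_fit, s_fit = min((reverse_kl(m, s), m, s) for m in mus for s in ss)

print(f"true variance {DF / (DF - 2):.2f}")   # 3.00
print(f"VI variance   {s_fit**2:.2f}")        # much smaller: q fits the bulk
```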

You could also imagine a ‘mean-field’ true posterior that is a mixture of Gaussians in each component. Mean-field VI would then probably ‘select’ one mode for each variable, depending on the initialisation, and therefore underestimate the posterior variance.
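This is easy to see numerically (my own toy example; it uses scipy’s Nelder-Mead for the local optimisation). Fit a Gaussian q to the bimodal target 0.5·N(-3,1) + 0.5·N(3,1) from two different starting points: each run locks onto the nearest mode with a standard deviation close to that of a single component, far below the true standard deviation of √10:

```python
import numpy as np
from scipy.optimize import minimize

# Bimodal target: 0.5*N(-3,1) + 0.5*N(3,1). True mean 0, true std sqrt(10).
def log_p(x):
    return (np.logaddexp(-0.5 * (x + 3.0)**2, -0.5 * (x - 3.0)**2)
            + np.log(0.5) - 0.5 * np.log(2 * np.pi))

rng = np.random.default_rng(0)
eps = rng.standard_normal(8000)

def reverse_kl(params):
    # params = (mu, log s); reparameterised MC estimate of KL(q||p) + const.
    mu, log_s = params
    return -log_s - np.mean(log_p(mu + np.exp(log_s) * eps))

# Local optimisation from two different initialisations.
fits = {}
for mu0 in (2.0, -2.0):
    res = minimize(reverse_kl, x0=[mu0, 0.0], method="Nelder-Mead")
    fits[mu0] = (res.x[0], np.exp(res.x[1]))
    print(f"init mu={mu0:+.1f} -> fit mean {fits[mu0][0]:+.2f}, "
          f"std {fits[mu0][1]:.2f}")
```

Each fit reports a confident, single-mode posterior; which mode you get is purely an artefact of where you started.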

If you wanted to recover the mean and variance correctly, and the mean-field assumption is satisfied, you’d probably be better off using EP (expectation propagation), which moment-matches for exponential families; I don’t know how well this works with probabilistic programming languages, though.
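To see why moment matching behaves differently, note that for a Gaussian q the q-dependent part of the *forward* KL(p||q) is E_p[-log q], which is minimised by setting the mean and variance of q to those of p, i.e. Gaussian maximum likelihood on samples from p. A toy sketch on a bimodal target (this only illustrates the moment-matching objective, not the full per-site EP algorithm):

```python
import numpy as np

# Bimodal target p = 0.5*N(-3,1) + 0.5*N(3,1); true mean 0, true std sqrt(10).
rng = np.random.default_rng(0)
comp = rng.integers(0, 2, 20000)                       # pick a component
x = rng.standard_normal(20000) + np.where(comp == 0, -3.0, 3.0)

# Minimising KL(p||q) over Gaussian q = maximising E_p[log q],
# i.e. Gaussian maximum likelihood on samples from p: moment matching.
mu_mm, s_mm = x.mean(), x.std()

def cross_entropy(mu, s):
    # q-dependent part of KL(p||q) = E_p[-log q], up to a constant.
    return np.mean(0.5 * ((x - mu) / s) ** 2) + np.log(s)

print(f"moment-matched q: mean {mu_mm:+.2f}, std {s_mm:.2f}")   # ~0, ~3.16
print(f"E_p[-log q], moment-matched:    {cross_entropy(mu_mm, s_mm):.2f}")
print(f"E_p[-log q], single-mode N(3,1): {cross_entropy(3.0, 1.0):.2f}")
```

The moment-matched q recovers the true mean and the true (large) variance, whereas a single-mode q is heavily penalised under the forward KL because it assigns almost no mass to the other mode.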