Even if the mean-field assumption holds for the true posterior, the true posterior may still not lie in the approximating family Q. For example, Q could be mean-field Gaussian while P is mean-field Laplace/Student-t/etc., or P could be skewed.
In this case, the authors of the paper suggest that P is skewed to the right. You can think of mean-field VI as strongly avoiding regions where p has low density, so the mean (not just the variance) gets pushed to the right, causing bias.
Other examples include when the ‘true posterior’ is a mean-field Laplace/Student-t distribution (i.e., something with heavy tails). If the family of approximate posteriors is Gaussian (as is commonly the case), I would expect the approximate posterior q to underestimate the variance of the parameters: we can never find a q with KL(q||p)=0, and variational inference solutions tend to be ‘more compact’ than the true posterior because KL(q||p) takes the expectation under q and incurs a large penalty wherever q is large but p is small.
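To make the ‘more compact’ claim concrete, here is a minimal 1-D sketch (the setup and names like `kl_q_p` are my own, not from any particular paper): numerically minimise KL(q||p) for a Gaussian q against a Student-t p, and compare the fitted scale to the true spread.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid
from scipy.optimize import minimize

nu = 3.0
p = stats.t(df=nu)  # heavy-tailed stand-in for the true posterior

def kl_q_p(params):
    """KL(q||p) for q = N(mu, sigma^2), computed by quadrature."""
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    q = stats.norm(loc=mu, scale=sigma)
    x = np.linspace(mu - 12 * sigma, mu + 12 * sigma, 4001)
    return trapezoid(q.pdf(x) * (q.logpdf(x) - p.logpdf(x)), x)

res = minimize(kl_q_p, x0=[0.5, 0.5], method="Nelder-Mead")
print(f"fitted sigma: {np.exp(res.x[1]):.2f}")  # roughly 1.3
print(f"true posterior std: {p.std():.2f}")     # sqrt(3) ~ 1.73
```

The fitted sigma comes out around 1.3 against a true std of about 1.73: KL(q||p) only cares about where q puts mass, so q fits the peaked body of the Student-t and ignores the heavy tails that actually carry the variance.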
You could also imagine a ‘mean-field’ true posterior where each component is itself a mixture of Gaussians. Mean-field VI would then probably ‘select’ one mode per variable depending on the initialisation, again underestimating the posterior variance.
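Here is the same kind of sketch for the mixture case (the bimodal ‘posterior’ and all the names are illustrative choices of mine): the fitted q depends entirely on where the optimiser starts.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid
from scipy.optimize import minimize

comps = [stats.norm(-2.0, 0.5), stats.norm(2.0, 0.5)]  # two well-separated modes

def log_p(x):
    """Log-density of the equal-weight two-component mixture."""
    return np.logaddexp(comps[0].logpdf(x), comps[1].logpdf(x)) - np.log(2.0)

def kl_q_p(params):
    mu, log_sigma = params
    sigma = np.exp(log_sigma)
    q = stats.norm(mu, sigma)
    x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 4001)
    return trapezoid(q.pdf(x) * (q.logpdf(x) - log_p(x)), x)

for mu0 in (-1.0, 1.0):
    res = minimize(kl_q_p, x0=[mu0, 0.1], method="Nelder-Mead")
    print(f"init mu={mu0:+.0f} -> mu={res.x[0]:+.2f}, sigma={np.exp(res.x[1]):.2f}")
# Each run locks onto the mode nearest its initialisation (sigma ~ 0.5),
# while the mixture's overall std is ~ 2.06.
```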
If you wanted to recover the mean and variance correctly when the mean-field assumption is satisfied, you’d probably be better off using EP (expectation propagation, which moment-matches for exponential families), but I don’t know how well this is supported in probabilistic programming languages.
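For contrast, moment matching (which is what EP’s Gaussian projection step amounts to: it minimises KL(p||q) rather than KL(q||p)) recovers the global mean and variance of the same bimodal example above; the quadrature here is just a stand-in for EP’s analytic updates.

```python
import numpy as np
from scipy import stats
from scipy.integrate import trapezoid

comps = [stats.norm(-2.0, 0.5), stats.norm(2.0, 0.5)]
x = np.linspace(-8.0, 8.0, 8001)
p = 0.5 * (comps[0].pdf(x) + comps[1].pdf(x))

mean = trapezoid(x * p, x)               # E_p[x]
var = trapezoid((x - mean) ** 2 * p, x)  # Var_p[x]
print(f"moment-matched q: mu={mean:+.2f}, sigma={np.sqrt(var):.2f}")  # ~0.00, ~2.06
```

The moment-matched q covers both modes instead of collapsing onto one, at the price of putting mass in the low-density valley between them.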