Score function estimator with discrete latent variables

ffp · August 10, 2021, 8:23am

Dear all,

I have a more “formal” question this time.

From reading the SVI tutorials, I understood that score function estimators can also be used in the presence of discrete latent variables z. Is that right?

However, looking at the derivation of the estimator in more detail, I noticed that it requires the variational distribution q_\phi(z) to be differentiable w.r.t. its parameters \phi. So I'm wondering (sorry for my lack of knowledge) whether a discrete distribution can be differentiated w.r.t. its parameters. Is it possible? And if so, what happens when q_\phi(z) is a categorical distribution? Can I use this kind of estimator when the variational distribution has this form?

Thank you so much.

fritzo · August 11, 2021, 5:05pm

Hi @ffp,

Yes, score function estimators can be used in the presence of discrete latent variables. While Pyro supports those estimators out of the box, they often lead to high-variance gradient estimates and slow convergence or non-convergence. Some special cases where the variance may be low are when (1) you have many independent discrete latent variables at the very end of a model (just before a likelihood), and hopefully they share all or some parameters, or (2) you construct a good baseline, e.g. using TraceGraph_ELBO. Because in most cases gradient variance is high, Pyro tends to focus on enumeration strategies (aka Rao-Blackwellization aka marginalization aka collapsing), which are more technically involved but which lead to lower-variance gradient estimators.
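To make the variance trade-off concrete, here is a minimal sketch in plain PyTorch (not Pyro's internal implementation). It compares a score function estimate of the gradient of E_q[f(z)] with the exact gradient obtained by summing over all values of z. The cost vector `f`, the three-way categorical, and the helper names `exact_grad` and `score_function_grad` are assumptions made up for illustration.

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical per-outcome cost f(z) for z in {0, 1, 2}.
f = torch.tensor([1.0, 2.0, 4.0])
# Continuous parameters phi of q_phi(z) = Categorical(logits=phi).
phi = torch.zeros(3, requires_grad=True)


def exact_grad(phi):
    # Enumeration: sum over every value of z, so there is no sampling noise.
    probs = torch.softmax(phi, dim=-1)
    return torch.autograd.grad((probs * f).sum(), phi)[0]


def score_function_grad(phi, num_samples):
    # Monte Carlo estimate of E_q[f(z) * grad_phi log q_phi(z)].
    q = Categorical(logits=phi)
    z = q.sample((num_samples,))  # discrete samples; no gradient flows through them
    surrogate = (f[z] * q.log_prob(z)).mean()
    return torch.autograd.grad(surrogate, phi)[0]


exact = exact_grad(phi)
noisy = score_function_grad(phi, 10)          # few samples: typically far from exact
estimate = score_function_grad(phi, 100_000)  # many samples: close to exact
print(exact, noisy, estimate)
```

With only a handful of samples the estimate usually scatters widely around the exact value, which is the high-variance behaviour described above; the enumerated gradient needs no samples at all, at the cost of summing over every discrete value.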

As for differentiability, this is fine: you just need to be able to differentiate q wrt the continuous parameters \phi, not wrt the discrete random variables z. For example, in a categorical distribution, you need to be able to differentiate Categorical(probs).log_prob(value) wrt probs for a fixed discrete value.
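As a small check of that last point, the sketch below uses `torch.distributions.Categorical` (which Pyro's `Categorical` wraps) to differentiate the log-probability of a fixed discrete value. Parameterizing `probs` as a softmax of unconstrained `phi` is an assumption made here for convenience; the specific numbers are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.distributions import Categorical

# Unconstrained continuous parameters; probs = softmax(phi) is smooth in phi.
phi = torch.tensor([0.1, -0.3, 0.5], requires_grad=True)
probs = torch.softmax(phi, dim=-1)

# The discrete value is held fixed; nothing is differentiated wrt it.
value = torch.tensor(2)
log_q = Categorical(probs=probs).log_prob(value)
log_q.backward()

# Closed form for comparison: d/dphi log softmax(phi)[k] = onehot(k) - softmax(phi).
closed_form = F.one_hot(value, num_classes=3).float() - probs.detach()
print(phi.grad, closed_form)
```

The gradient is well defined because log q_\phi(z) is a smooth function of \phi for each fixed z, even though z itself only takes discrete values.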

ffp · August 12, 2021, 7:28am

Thank you so much, Fritz