How to implement the one-vs-each approximation for a large softmax

Hi, I'm new to Pyro and trying to figure out whether Pyro is suitable for replicating the paper 'SHOPPER: A Probabilistic Model of Consumer Choice with Substitutes and Complements' by Francisco J. R. Ruiz, Susan Athey, and David M. Blei.

Right now I'm stuck on one specific question about inference in the model: the one-vs-each bound used to approximate a large softmax (page 31 of the paper).

where eta is defined on page 10. (Sorry, new users are only allowed to post one figure, so I combined the two, with the definition of eta on the right.)

Setting the details of this particular model aside, I think the important question is how to do this in Pyro.
(The one-vs-each bound is introduced in the paper 'One-vs-each approximation to softmax for scalable estimation of probabilities'.)
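
For reference, here is the bound as I understand it from the one-vs-each paper, written in generic notation that is my own assumption (logits f_1, ..., f_K, observed class k, sigmoid sigma) rather than SHOPPER's symbols. The last line is the usual subsampled estimate over a set S of negatives drawn uniformly from the classes other than k. It is unbiased for the log of the bound, not for the log-softmax itself.

```latex
\begin{align}
p(y = k) &= \frac{e^{f_k}}{\sum_{m=1}^{K} e^{f_m}}
  \;\ge\; \prod_{m \neq k} \sigma(f_k - f_m),
  \qquad \sigma(x) = \frac{1}{1 + e^{-x}} \\
\log p(y = k) &\ge \sum_{m \neq k} \log \sigma(f_k - f_m) \\
\sum_{m \neq k} \log \sigma(f_k - f_m)
  &\approx \frac{K - 1}{|S|} \sum_{m \in S} \log \sigma(f_k - f_m)
\end{align}
```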

Usually we would write:

pyro.sample("eta", dist.Categorical(logits=logits), obs=data)

but with the one-vs-each bound, we might do something like this (I'm not sure):

pyro.sample("eta", dist.Binomial(logits=logits), obs=data)
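
To make the comparison concrete, here is a small plain-Python check of the bound. Each one-vs-each factor sigma(f_k - f_m) has the form of a Bernoulli likelihood with logit f_k - f_m and an observed value of 1, so a Bernoulli (rather than a Binomial) may be the more natural fit. The function names and the example logits are made up for illustration.

```python
import math


def log_softmax_at(logits, k):
    """Exact log-softmax probability of class k (max-shifted for stability)."""
    m = max(logits)
    log_norm = m + math.log(sum(math.exp(f - m) for f in logits))
    return logits[k] - log_norm


def log_sigmoid(x):
    """Numerically stable log(1 / (1 + exp(-x)))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def ove_log_bound(logits, k):
    """One-vs-each lower bound: sum over m != k of log sigmoid(f_k - f_m)."""
    return sum(
        log_sigmoid(logits[k] - f) for m, f in enumerate(logits) if m != k
    )


# Hypothetical logits for a 4-class problem, observed class 0.
logits = [2.0, 0.5, -1.0, 0.0]
exact = log_softmax_at(logits, 0)
bound = ove_log_bound(logits, 0)
print(f"exact log-prob: {exact:.4f}, one-vs-each bound: {bound:.4f}")
```

With only two classes the bound is tight, because the softmax reduces to a single sigmoid.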

So how should the loss over the sampled negatives be rescaled?
Should that be done with pyro.plate(), or do we need a custom ELBO?

I hope I've made myself clear. Any help is appreciated!