Add prior knowledge to ProdLDA

Hi there,

I’m working with the ProdLDA tutorial, and it works well on my own data.
I also have some prior knowledge about beta: for example, I know a few topics along with some popular words in those topics.
topic1: {Investment, Loan, Mortgage, Financial, Services, …}
topic2: {athletics, arena, beat, award, captain, …}
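
Concretely, I keep this knowledge in a plain dict and map the seed words to vocabulary indices (a rough sketch; vocab is assumed to be my vocabulary list, and the integer topic labels are only illustrative):

# seed words per topic (illustrative labels)
seed_words = {
    0: ["Investment", "Loan", "Mortgage", "Financial", "Services"],
    1: ["athletics", "arena", "beat", "award", "captain"],
}

# map seed words to vocabulary indices, skipping words not in the vocabulary
seed_word_ids = {
    topic: [vocab.index(w) for w in words if w in vocab]
    for topic, words in seed_words.items()
}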

Could you please guide me on how to encode this knowledge into the model?
It seems this turns the problem into a semi-supervised one for learning beta, but I don’t know how to do this with Pyro.

class Decoder(nn.Module):
    # Base class for the decoder net, used in the model
    def __init__(self, vocab_size, num_topics, dropout):
        super().__init__()
        self.beta = nn.Linear(num_topics, vocab_size, bias=False)
        # I want to add some knowledge to beta
        self.bn = nn.BatchNorm1d(vocab_size, affine=False)
        self.drop = nn.Dropout(dropout)

    def forward(self, inputs):
        inputs = self.drop(inputs)
        # the output is σ(βθ)
        return F.softmax(self.bn(self.beta(inputs)), dim=1)
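
One idea I had (not sure whether it is a correct way to do this) is to warm-start β by boosting the weights of the seed words in their topics before training. A minimal sketch, assuming the seed_word_ids dict from above, the constructor arguments from the tutorial, and an arbitrary boost value:

import torch

decoder = Decoder(vocab_size, num_topics, dropout=0.2)

# boost the seed-word rows of beta for their known topic columns
# (beta.weight has shape (vocab_size, num_topics))
with torch.no_grad():
    for topic, word_ids in seed_word_ids.items():
        decoder.beta.weight[word_ids, topic] += 1.0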

I think you could add that prior as an extra observe statement in your model, something like this:

def model(...):
    # ... original VAE stuff...

    # then add a new loss term:
    prior_topics = torch.tensor([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
    prior_words = torch.tensor([
        vocab.index(word) for word in [
            "Investment", "Loan", "Mortgage", "Financial", "Services",
            "athletics", "arena", "beat", "award", "captain",
        ]
    ])
    # the decoder expects a distribution over topics, so one-hot encode the topic ids
    prior_topic_vecs = F.one_hot(prior_topics, num_classes=num_topics).float()
    with pyro.plate("prior_plate", len(prior_topics)):
        # the decoder outputs word probabilities, so pass them as probs=
        pyro.sample("prior", dist.Categorical(probs=decoder(prior_topic_vecs)),
                    obs=prior_words)

Dear Fritz,
Thank you so much.
I will try it soon.

Hi Fritz,

I have read the original paper again and seen that β is unconstrained; they just define it as a matrix of weights for each topic (topic-word probabilities). That is why the tutorial defines it as a linear layer.
Moreover, our model tries to decode the observed word distribution from θ and β.
You gave me a method to put a prior on the topics (β), but we also have the Dirichlet prior on θ.
The problem for me is how to incorporate these two distributions in the decoder network.
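
For reference, this is roughly how I understand the model side of the tutorial, where θ comes from a logistic-normal approximation of the Dirichlet prior and is pushed through the decoder that holds β (a sketch from memory, not the exact tutorial code):

def model(self, docs):
    pyro.module("decoder", self.decoder)
    with pyro.plate("documents", docs.shape[0]):
        # theta: per-document topic proportions from a logistic-normal prior
        # that approximates the Dirichlet prior of LDA
        logtheta_loc = docs.new_zeros((docs.shape[0], self.num_topics))
        logtheta_scale = docs.new_ones((docs.shape[0], self.num_topics))
        logtheta = pyro.sample(
            "logtheta", dist.Normal(logtheta_loc, logtheta_scale).to_event(1))
        theta = F.softmax(logtheta, -1)

        # beta lives in the decoder's linear layer; the decoder returns
        # softmax(bn(beta(theta))), the word distribution for each document
        count_param = self.decoder(theta)
        total_count = int(docs.sum(-1).max())
        pyro.sample("obs", dist.Multinomial(total_count, count_param),
                    obs=docs)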

I’m sorry, I have only just started studying and working with Bayesian learning.