Loss is infinite

Happy New Year!


I am trying to implement figure 4.1 from the book “Model-based machine learning” by Winn and Bishop, using a Python notebook. The notebook cannot be attached to this message, but it can be found at the following link on Google Drive.


I am trying to implement the figure drawn at the top using Pyro. The final class, SVI_template1, is at the bottom of the file. The driver for the class is shown below:

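# assumed import; the notebook may import Normal from torch.distributions instead
from pyro.distributions import Normal

nb_mails = 6
# features should be normally distributed
m = Normal(.2, .8)
data = m.sample([nb_mails])  # one feature value per email
nb_steps = len(data)
tst = SVI_template1(data, nb_mails, nb_steps)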

Regardless of the size of the dataset, I do not expect an infinite loss. I even created routines to display the traces to check whether the log_probs were infinite, and from what I can see, they are not. Finally, I noticed that the parameters sig_si and mu_si remain constant throughout, and I do not understand why. Any help is appreciated. If you need anything else from me, please let me know. Thank you!

I found my error. In case anybody else encounters this, the error stems from the model:

    def model(self, data, nb_emails):
        weight = pyro.sample("weight", dist.Normal(tensor([0.]), tensor([1.])))

        with pyro.plate("emails", nb_emails):
            feature = torch.tensor([0.3])
            featureValue = pyro.sample("feature", dist.Normal(.4, 1), obs=data)
            score = weight * featureValue
            #score = pyro.sample("score", dist.Normal(0., weight), obs=score)
            score = pyro.sample("score", dist.Delta(score), obs=score)

The score is a deterministic function (multiplication) of weight and featureValue. I had sampled score from a Normal distribution, which was inconsistent with that deterministic relationship. Once I sampled from a Delta distribution instead, everything worked, in the sense that the losses were no longer infinite. I also added an obs argument, since the score derives from a deterministic formula.

I do have a question: is adding the “obs” argument the correct approach? Could somebody give me an example of using a Delta distribution that does not involve an observation? Thanks.

You don’t need to wrap deterministic computations with Delta distributions. The model in question ends up being very similar to logistic regression, which looks like the Bayesian regression tutorial model, but with a Bernoulli rather than Normal likelihood. Your version should look something like this:

    def model(self, featureValue, nb_emails, label=None):
        weight = pyro.sample("weight", dist.Normal(tensor([0.]), tensor([1.])))
        threshold = pyro.sample("threshold", dist.Normal(0., 10.))
        with pyro.plate("emails", nb_emails):
            score = weight * featureValue + threshold
            return pyro.sample("repliedTo", dist.Bernoulli(logits=score), obs=label)