Dirichlet distribution sometimes outputs tensor full of NaNs

Hey guys,
I found an issue with the Dirichlet distribution. Sometimes, for no apparent reason, it outputs a tensor full of NaNs. The line causing the problem is as simple as:

doc_topics = pyro.sample("doc_topics", dist.Dirichlet(topic_weights))

topic_weights is always asserted to be a valid tensor beforehand.
For now, I’m circumventing the problem by replacing the line above with the following code:

# Resample until a NaN-free draw appears; each retry needs a fresh site name.
repeat = True
i = 0
while repeat:
    doc_topics = pyro.sample("doc_topics_%d" % i, dist.Dirichlet(topic_weights))
    repeat = doc_topics.isnan().any()
    i += 1

It works… but it is as ugly as code can possibly be. (In Brazil we have a slang for that: “gambiarra” :slight_smile:)

Any hints on how to solve that?
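(For reference, by “valid” I mean checks along these lines pass — a Dirichlet concentration must be finite and strictly positive. The helper name here is just illustrative, not the actual code:)

```python
import torch

def assert_valid_concentration(t: torch.Tensor) -> None:
    # Dirichlet concentration parameters must be finite and strictly positive.
    assert torch.isfinite(t).all(), "non-finite entry in concentration"
    assert (t > 0).all(), "non-positive entry in concentration"

# Passes for a well-formed concentration vector:
assert_valid_concentration(torch.tensor([0.0559, 0.2049, 0.5021]))
```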

I noticed someone already had the same issue in the past, and one of the core devs suggested that he post the question in the PyTorch forum (which he didn’t). Anyway, I will post the same question in the PyTorch forum…

Hi @carlossouza, what is the value of topic_weights when it produces NaN?

No particular value… Here are some topic_weights examples that triggered the exception (without the ugly workaround in place):

tensor([0.0559, 0.2049, 0.5021, 0.5713, 3.7809, 0.3067, 3.3700, 0.0069, 0.6454,
        1.4444, 0.0861, 0.4709, 0.4869, 1.0992, 0.4990, 0.4082, 0.7089, 1.2442,
        0.1177, 1.1107], grad_fn=<DivBackward0>)
tensor([0.6328, 0.4231, 1.2324, 0.6207, 0.2522, 1.0411, 4.0930, 3.9515, 0.1285,
        0.0608, 1.8635, 0.6003, 0.6442, 2.0052, 0.0524, 0.5430, 1.1773, 0.7053,
        2.6416, 0.5089], grad_fn=<DivBackward0>)
tensor([0.3156, 0.1399, 0.6539, 0.2121, 0.7239, 3.4034, 0.0269, 0.6101, 0.8757,
        0.8577, 0.0743, 0.2925, 0.1734, 1.9466, 0.7716, 1.9763, 0.2505, 0.3990,
        0.9996, 0.4179], grad_fn=<DivBackward0>)

Again, once I repeat the sample, the problem goes away…
I actually tried sampling from these exact weights in a long loop, and no exceptions were raised… But it always raises an exception when I follow the lda.py code…
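The long-loop test was essentially the following (using torch.distributions directly; pyro.distributions.Dirichlet is a thin wrapper around the same sampler):

```python
import torch
from torch.distributions import Dirichlet

# One of the concentration vectors above that coincided with the failure.
topic_weights = torch.tensor([
    0.0559, 0.2049, 0.5021, 0.5713, 3.7809, 0.3067, 3.3700, 0.0069, 0.6454,
    1.4444, 0.0861, 0.4709, 0.4869, 1.0992, 0.4990, 0.4082, 0.7089, 1.2442,
    0.1177, 1.1107,
])

# Drawing a large batch at once is equivalent to a long loop of single draws.
samples = Dirichlet(topic_weights).sample((10_000,))
nan_draws = torch.isnan(samples).any(dim=-1).sum().item()
print(nan_draws)  # count of draws containing a NaN
```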

Interesting! I tried to sample from Dirichlet multiple times but could not get a NaN. I don’t know how to debug this…

To reproduce, just clone and run this repo:

git@github.com:ucals/prodlda_debug.git

Somewhere in the middle of the first epoch it will break with the following error:

RuntimeError: invalid multinomial distribution (encountering probability entry < 0)
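(For context, this particular message comes from torch.multinomial rejecting its probability vector: a NaN entry fails its non-negativity check, so NaNs produced upstream surface with the “probability entry < 0” wording. A minimal repro follows; the exact message can vary across PyTorch versions and devices:)

```python
import torch

# A NaN probability fails torch.multinomial's `entry >= 0` check,
# producing the "probability entry < 0" RuntimeError seen above.
probs = torch.tensor([float("nan"), 0.5, 0.5])
try:
    torch.multinomial(probs, num_samples=1)
except RuntimeError as e:
    print(e)
```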

Thanks!

Hi @carlossouza,

I was intrigued by your comment and I had a look at the code. It seems that the NaNs are produced by the line doc_topics = self.inference_net(doc_sum) in the guide (line 82). This in turn comes from the fact that the parameters of the Encoder are NaNs, so this looks like a problem during training of ProdLDA.

If you set the random seed to 123 with pyro.set_rng_seed(123) at the beginning of your code, you will see that everything is fine until iteration 98 (i.e. until the value of i in for i in bar reaches 97). At the next iteration, the parameters of the Encoder are NaNs, so the guide produces a doc_topics sample that is all NaNs, and the model then inherits the NaNs from the guide; they were never produced by the Dirichlet distribution.
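A quick way to catch this as it happens is to scan the network’s parameters for NaNs after each optimization step. A sketch (first_nan_parameter is just an illustrative helper; encoder and i refer to the names used in the ProdLDA script):

```python
import torch
import torch.nn as nn

def first_nan_parameter(module: nn.Module):
    # Return the name of the first parameter containing a NaN, or None.
    for name, param in module.named_parameters():
        if torch.isnan(param).any():
            return name
    return None

# Hypothetical usage inside the training loop, right after each svi.step(...):
#
#     bad = first_nan_parameter(encoder)
#     if bad is not None:
#         raise RuntimeError("NaN in encoder parameter %r at iteration %d" % (bad, i))
```

torch.autograd.set_detect_anomaly(True) can also pinpoint the backward operation that first produced a NaN, at a substantial speed cost.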

I ran the Dirichlet distribution from Pyro and from PyTorch a couple thousand times and never observed a NaN. That is a very small number of draws for checking a pseudo-random sampler, but NaNs appear in your code roughly once per 100 iterations, so I think you can rule out the Dirichlet sampler as the source of the problem.

I hope this helps. Your tutorials are otherwise amazing and it’s a pleasure to read them. Keep going!

Thanks @gui11aume! You are right, there’s nothing wrong with the distribution per se…

I managed to finish this LDA tutorial (it’s being reviewed as we speak); check it out:

Probabilistic Topic Modeling is not my favorite topic, but I think Pyro should have a nice tutorial about it: having great tutorials in several different areas supports the “universal” in Deep Universal Probabilistic Programming.

I’ve read some of your articles, they are great! Especially the last one… your creativity-vs-persistence discussion reminded me of the popular Tesla vs. Edison question. I personally think curiosity is the most important trait of a scientist, because it fuels both creativity and persistence. But I’m not a scientist, I’m just an engineer… you are probably right: being (in)tolerant to cognitive dissonance might be more important :slight_smile:

And thanks for the compliment! Btw, if you want to collaborate in writing the next tutorial, pls let me know!
Cheers!

Thanks @carlossouza. Much appreciated! What other tutorials do you have in mind?

My wishlist:

  1. Probabilistic Matrix Factorization: Collaborative variational autoencoder for recommender systems or something else in this area
  2. Zero-shot learning: One-Shot Generalization in Deep Generative Models or something else in this area
  3. Disentangled representation: Deep Convolutional Inverse Graphics Network or something else in this area
  4. Trajectory prediction: Real Time Trajectory Prediction Using Deep Conditional Generative Models (I actually want to try this approach on the huge new dataset Lyft has just released)

… but I’m open to suggestions as well!


Wow, they are all amazing! I only have sketches and drafts for using Pyro in biology… nothing as elaborate as yours :smile:. We can go for whichever you feel is the most urgent. My personal preference goes to the first because it may have some direct applications in my work — for instance it may help predict the next pathogen that will spill over to humans, something of great interest for the community. If you are game, we can discuss the details in private (my email address is easy to find on my blog).


@gui11aume I for one would love to see more examples of Pyro used in biology! @fritzo, @martinjankowiak and I on the core Pyro development team now all work at the Broad Institute of MIT and Harvard, so applying Pyro and probabilistic machine learning to biomedical research is our day job and will be a focus of Pyro development going forward.

@eb8680_2 This is awesome! Deep learning has revolutionized IT, marketing, finance, transport, etc., but it has had almost no impact on the natural sciences so far. That’s because deep learning is mostly about supervised and semi-supervised methods, and those are not so useful for biologists — we don’t always know what we are looking for.

I think that Pyro is our best shot to study complex systems like cells or ecosystems. My hope is that we can use deep-learning-based models to capture their complex behavior, then use Pyro to make predictions or infer something about their states. First, that would be useful. Second, we may discover some hidden simplicity in these complex systems.

I have a couple of ideas about how to do that, but nothing spectacular or even concrete yet. In the meantime let’s see if @carlossouza and I can get something interesting from co-infection networks to see which pathogens are already infecting humans without us knowing.

We can continue the broader discussion somewhere else or just meet in some other academic context — I am a newly appointed professor at the University of Toronto Scarborough. That would be interesting.


@eb8680_2, @martinjankowiak, @fritzo,
If you guys know of, or could share, interesting papers on probabilistic machine learning applications in biology, that would be great!
Cheers
Carlos

If you guys know of, or could share, interesting papers on probabilistic machine learning applications in biology, that would be great!

Sure - one of the most active areas of research in biology today is the development of increasingly sophisticated single-cell multi-omics methods, in which multiple types of high-throughput measurements, like whole-transcriptome sequencing, are made across thousands of individual cells in parallel in a single experiment. As you can imagine, these experiments produce a tremendous amount of noisy, high-dimensional data that biologists have to sift through, as discussed in this recent review of single-cell data analysis.

For a specific example, see this paper, which is representative of lots of similar work, and the references therein. It describes the use of VAEs to learn representations and cluster cells into cell types from single-cell RNA sequencing data as implemented in their package scVI. A simple Pyro implementation of one such model by @martinjankowiak is here.

If you’re more interested in healthcare and medicine, this new review of probabilistic ML applications in healthcare looks like a good starting point.
