Memory-hungry examples in Pyro?

MarisaKirisame · September 23, 2020, 9:00am

Hi, I am an ML systems researcher focusing on saving memory in deep learning. We invented a technique called Dynamic Tensor Rematerialization, which saves memory for arbitrary PyTorch programs, and we think deep probabilistic programming would be a great benchmark. However, we are very unfamiliar with deep probabilistic programming, so: what are some good examples that take up lots of memory and are popular right now? (Basically, the ResNet of deep PPLs?)

martinjankowiak · September 24, 2020, 5:04pm

hi, can you give some more intuition for the kind of compute graph you’re interested in? wildly overgeneralizing, the computations in probabilistic inference that are more memory intensive tend to be monolithic, e.g. computing the Cholesky decomposition of a large matrix, as opposed to a sequential string of tensor ops like you find in e.g. a ResNet. of course, neural networks do appear in probabilistic models, but they tend to be more moderately sized (i.e. not 50 layers).

MarisaKirisame · September 28, 2020, 7:49am

@martinjankowiak
Thanks! Our approach only works when there are lots of tensor ops, each eating a chunk of the memory.
The Cholesky decomposition example probably won’t work, so should I just look for probabilistic neural network models?
Also: the case I am imagining is backpropagating through a Monte Carlo sampler. If the chain is long, it will take lots of memory. Does that happen in Pyro?
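To make the long-chain concern concrete, here is a toy sketch in plain Python (no Pyro or PyTorch). It assumes a hypothetical Langevin-style chain x_{t+1} = x_t - h*tanh(x_t) + sqrt(2h)*eps_t and computes dx_T/dx_0 by hand in reverse mode. Keeping every state costs memory linear in the chain length; the second function uses simple fixed-interval checkpointing (a static scheme, not DTR itself) to keep only every k-th state and recompute the rest during the backward pass. The chain, `step`, `local_grad`, and `k` are all illustrative assumptions.

```python
import math
import random


def step(x, eps, h):
    """One update of a toy Langevin-style chain (hypothetical dynamics)."""
    return x - h * math.tanh(x) + math.sqrt(2.0 * h) * eps


def local_grad(x, h):
    """d step(x, eps, h) / dx. It depends on x, so x must be available on the backward pass."""
    return 1.0 - h * (1.0 - math.tanh(x) ** 2)


def grad_store_all(x0, eps, h):
    """Reverse mode that keeps every intermediate state: memory grows as O(T)."""
    xs = [x0]
    for e in eps:
        xs.append(step(xs[-1], e, h))
    g = 1.0
    for t in reversed(range(len(eps))):
        g *= local_grad(xs[t], h)  # chain rule, last step first
    return xs[-1], g, len(xs)  # len(xs) = number of states held at once


def grad_checkpointed(x0, eps, h, k):
    """Keep only every k-th state, then recompute each segment during the backward pass."""
    T = len(eps)
    ckpts = {0: x0}
    x = x0
    for t, e in enumerate(eps):
        x = step(x, e, h)
        if (t + 1) % k == 0:
            ckpts[t + 1] = x
    g = 1.0
    peak = 0
    for s in reversed(range(0, T, k)):
        end = min(s + k, T)
        # Rematerialize x_s .. x_{end-1} from the checkpoint, reusing the same noise.
        seg = [ckpts[s]]
        for t in range(s, end - 1):
            seg.append(step(seg[-1], eps[t], h))
        peak = max(peak, len(ckpts) + len(seg))
        for t in reversed(range(s, end)):
            g *= local_grad(seg[t - s], h)
    return x, g, peak


if __name__ == "__main__":
    rng = random.Random(0)
    T, h = 1000, 0.01
    eps = [rng.gauss(0.0, 1.0) for _ in range(T)]  # noise drawn once, reused on recompute
    _, g_full, held_full = grad_store_all(0.5, eps, h)
    _, g_ckpt, held_ckpt = grad_checkpointed(0.5, eps, h, k=32)
    print(g_full, g_ckpt)        # identical gradients
    print(held_full, held_ckpt)  # 1001 64
```

With k near sqrt(T), the number of states held at once drops to roughly 2*sqrt(T), at the cost of about one extra forward pass of recomputation. Drawing the noise once and reusing it is what makes the recomputed states match the originals; a real sampler would need the same guarantee from its random number generator.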

martinjankowiak · September 28, 2020, 3:35pm

@MarisaKirisame your best bet would probably be to try the Deep Markov Model (DMM) tutorial. it’s sort of slow because the guide unrolls an autoregressive loop, but that’ll give you a long sequence of tensor ops. if you make the batch size, number of hidden units, etc. sufficiently large, it’ll get pretty memory hungry pretty fast.
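A back-of-the-envelope way to see why scaling those knobs blows up memory: in an unrolled loop, activations saved for backward scale with batch size times hidden width times the number of unrolled steps. The sketch below is a rough model only; `tensors_per_step` and all the example sizes are hypothetical and are not measured from the DMM tutorial.

```python
def unrolled_activation_bytes(batch_size, num_steps, hidden_dim,
                              tensors_per_step, bytes_per_elem=4):
    """Rough size of the activations an unrolled loop keeps alive for backward.

    Hypothetical model: each of `num_steps` iterations saves `tensors_per_step`
    tensors of shape (batch_size, hidden_dim), each element `bytes_per_elem`
    bytes (4 for float32). Real models save tensors of varying shapes; this
    only illustrates how the total scales.
    """
    return batch_size * num_steps * hidden_dim * tensors_per_step * bytes_per_elem


if __name__ == "__main__":
    base = dict(batch_size=64, num_steps=100, hidden_dim=600, tensors_per_step=10)
    for name in ("batch_size", "num_steps", "hidden_dim"):
        scaled = dict(base, **{name: base[name] * 4})  # grow one knob at a time
        gib = unrolled_activation_bytes(**scaled) / 2**30
        print(f"4x {name}: {gib:.2f} GiB")
```

Because the estimate is a plain product, quadrupling any one factor quadruples the total, and growing several at once compounds quickly.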

MarisaKirisame · September 28, 2020, 7:15pm

This looks like exactly what we need! Thank you!