Hi, I am an ML systems researcher, focusing on saving memory for deep learning. We invented a technique called Dynamic Tensor Rematerialization, which saves memory for arbitrary PyTorch programs, and we think deep probabilistic programming would be a great benchmark. However, we are very unfamiliar with deep probabilistic programming, so… what is a good example that takes up lots of memory and is pretty hot? (Basically, the ResNet of deep PPL?)
hi, can you give some more intuition for the kind of compute graph you’re interested in? wildly overgeneralizing, the computations used in probabilistic inference that are more memory intensive tend to be monolithic, e.g. computing the cholesky decomposition of a large matrix, as opposed to a sequential string of tensor ops as you find in e.g. a resnet. of course, neural networks appear in probabilistic models but they tend to be more moderately sized (i.e. not 50 layers).
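for concreteness, the monolithic case looks something like this (a toy GP-style sketch with made-up sizes, not code from any particular model):

```python
import torch

# toy illustration of the "monolithic" pattern: a large covariance
# matrix and its Cholesky factor.  the memory cost is one big O(n^2)
# matrix plus one O(n^3) kernel, not a long chain of small tensor ops.
n = 2000
X = torch.randn(n, n, dtype=torch.float64)
K = X @ X.T + n * torch.eye(n, dtype=torch.float64)  # positive definite
L = torch.linalg.cholesky(K)  # a single memory-heavy op
```

there's not much for a rematerialization scheme to grab onto there, since almost all the memory sits in one intermediate.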
Thanks! Our approach only works when there are lots of tensor ops, each eating a chunk of memory.
The Cholesky decomposition example probably won't work, so should I just look for probabilistic neural network models?
Also: the case I am imagining is backpropagating through a Monte Carlo sampler. If the chain is long, it will take lots of memory. Does that happen in Pyro?
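To illustrate what I mean, here is a minimal PyTorch sketch (a toy chain of reparameterized Gaussian samples, made-up sizes, not from any real Pyro program):

```python
import torch

# each step's mean depends on the previous sample, so backprop must keep
# every intermediate sample alive: memory grows linearly with chain length.
torch.manual_seed(0)
steps, dim = 100, 500
w = (torch.randn(dim, dim) * 0.01).requires_grad_()

x = torch.zeros(dim)
for _ in range(steps):
    mu = torch.tanh(x @ w)       # mean depends on previous sample
    eps = torch.randn(dim)
    x = mu + 0.1 * eps           # reparameterized sample
loss = x.pow(2).sum()
loss.backward()                  # needs all `steps` intermediates
```

This is the kind of long sequential graph where our technique can free intermediates and recompute them on demand.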
@MarisaKirisame your best bet would probably be to try the DMM tutorial. it's sort of slow because the guide method unrolls an autoregressive loop, but that'll give you a long sequence of tensor ops. if you make the batch size, number of hidden units, etc. sufficiently large, it'll get pretty memory-hungry pretty fast
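roughly, the memory-hungry shape of that guide loop looks like this (a simplified sketch with made-up sizes, not the actual tutorial code):

```python
import torch
import torch.nn as nn

# DMM-style autoregressive guide: an RNN cell unrolled over T steps,
# sampling a latent z_t at each step.  all T activations stay live until
# the backward pass, so memory scales with T * batch * hidden.
T, batch, hidden, z_dim = 20, 64, 200, 50
cell = nn.GRUCell(z_dim, hidden)
loc_net = nn.Linear(hidden, z_dim)

h = torch.zeros(batch, hidden)
z = torch.zeros(batch, z_dim)
terms = []
for t in range(T):
    h = cell(z, h)                    # autoregressive recurrence
    loc = loc_net(h)
    z = loc + torch.randn_like(loc)   # reparameterized z_t ~ N(loc, I)
    terms.append(z.pow(2).sum())      # stand-in for the ELBO's log-prob terms
loss = torch.stack(terms).sum()
loss.backward()
```

crank up T, batch, and hidden and the live-activation footprint grows accordingly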
this looks like exactly what we need! thank you!