Hi everyone. Great project; I'm very excited to see HMC becoming scalable via techniques like HMCECS.
I’ve been playing around with a single-hidden-layer neural network on a dataset of shape ~(500k, 300). From my naive understanding, subsampling (minibatch SVI, and HMC via HMCECS) on a GPU should let inference scale to a dataset of this size. However, I see very large memory requirements even with a small hidden layer (h1 = 5), and it also takes a very long time for inference (SVI, say) to even begin.
I’d like to understand where the known bottlenecks are. Is it XLA compilation, or something else? I have not started profiling yet, but I thought it best to first ask those who know this project and the related literature better than I do.