Frustrated with memory error

Hello,

I’m having a hard time running a Poisson regression with just one predictor variable. I keep getting a memory error.

Traceback (most recent call last):

  File "<ipython-input-15-a0536389f5f9>", line 4, in <module>
    elbo = svi.step(x, y)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\infer\svi.py", line 128, in step
    loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\infer\trace_elbo.py", line 126, in loss_and_grads
    for model_trace, guide_trace in self._get_traces(model, guide, args, kwargs):

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\infer\elbo.py", line 170, in _get_traces
    yield self._get_trace(model, guide, args, kwargs)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\infer\trace_elbo.py", line 53, in _get_trace
    "flat", self.max_plate_nesting, model, guide, args, kwargs)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\infer\enum.py", line 55, in get_importance_trace
    model_trace.compute_log_prob()

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\pyro\poutine\trace_struct.py", line 216, in compute_log_prob
    log_p = site["fn"].log_prob(site["value"], *site["args"], **site["kwargs"])

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\distributions\independent.py", line 88, in log_prob
    log_prob = self.base_dist.log_prob(value)

  File "C:\Users\JORDAN.HOWELL.GITDIR\AppData\Local\Continuum\anaconda3\envs\torch_env\lib\site-packages\torch\distributions\poisson.py", line 63, in log_prob
    return (rate.log() * value) - rate - (value + 1).lgamma()

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 950208322568 bytes. Buy new RAM!

The data is about 344,000 rows: a tensor of 1s and 0s for the predictor variable and a long tensor for the y variable.

I basically copied/pasted the code from the regression template example.

def model(x_data, y_data):
    a = pyro.sample("mean", dist.Normal(0., 10.))
    b = pyro.sample("edit_coeff", dist.Normal(0., 1.))
    sigma = pyro.sample("sigma", dist.Uniform(0., 10.))
    mean = a + b * x_data
    rate = mean.exp()
    with pyro.plate("data", len(x)):
        pyro.sample("obs", dist.Poisson(rate = rate).independent(1), obs=y_data)  
        
def guide(x_data, y_data):
    a_loc = pyro.param('a_mean', torch.tensor(0.))
    a_scale = pyro.param('a_std', torch.tensor(1.),
                         constraint=constraints.positive)
    sigma_loc = pyro.param('sigma_loc', torch.tensor(1.),
                             constraint=constraints.positive)
    weights_loc = pyro.param('weights_loc', torch.randn(3))
    weights_scale = pyro.param('weights_scale', torch.ones(3),
                               constraint=constraints.positive)
    a = pyro.sample("a", dist.Normal(a_loc, a_scale))
    b = pyro.sample("edit_coeff", dist.Normal(weights_loc[0], weights_scale[0]))
    sigma = pyro.sample("sigma", dist.Normal(sigma_loc, torch.tensor(0.05)))
    mean =  a + b * x_data     
    
# Utility function to print latent sites' quantile information.
def summary(samples):
    site_stats = {}
    for site_name, values in samples.items():
        marginal_site = pd.DataFrame(values)
        describe = marginal_site.describe(percentiles=[.05, 0.25, 0.5, 0.75, 0.95]).transpose()
        site_stats[site_name] = describe[["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
    return site_stats

svi = SVI(model,
          guide,
          optim.Adam({"lr": .05}),
          loss=Trace_ELBO())


pyro.clear_param_store()
num_iters = 1000 
for i in range(num_iters):
    elbo = svi.step(x, y)
    if i % 500 == 0:
        logging.info("Elbo loss: {}".format(elbo))

Should I be doing something else to save on memory? Even my 32 GB GPU runs out of memory, and I’ve run pymc3 models bigger than this one without trouble.

I think you need to remove the .independent(1) call.


It didn’t run, but at least it moved on to another error. Why did that fix it?

I think more information about the shapes of x_data, y_data, and len(x) is needed. But my guess is that x_data.shape == y_data.shape == (len(x),). If so,

dist.Poisson(rate = rate)

will have batch_shape = x_data.shape = (len(x),), event_shape = ().

dist.Poisson(rate = rate).independent(1)

will have batch_shape = (), event_shape = (len(x),).

Under plate("data"),

pyro.sample("obs", dist.Poisson(rate = rate).independent(1))

the “obs” site will have batch_shape = (len(x),) and event_shape = (len(x),), so this site will have shape (len(x), len(x)). Is this what you intended?

Without independent(1), the obs site will have shape (len(x),), which is the shape of y (if I understand the shapes of your data correctly).
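You can reproduce this shape blow-up with plain torch.distributions (a small sketch, with n standing in for len(x)). Note that 344639² × 8 bytes is exactly the 950208322568 bytes in your error message:

```python
import torch
from torch.distributions import Independent, Poisson

n = 5  # stand-in for len(x)
rate = torch.ones(n)

d = Poisson(rate)
print(d.batch_shape, d.event_shape)          # torch.Size([5]) torch.Size([])

d_ind = Independent(d, 1)                    # what .independent(1) does
print(d_ind.batch_shape, d_ind.event_shape)  # torch.Size([]) torch.Size([5])

# Under plate("data", n), the batch shape gets expanded to (n,), so the
# independent version materializes an (n, n) site:
print(d_ind.expand(torch.Size([n])).sample().shape)  # torch.Size([5, 5])
```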

it moved to another error

Your model has a Uniform(0, 10) prior for sigma but a Normal guide for it. This will lead to errors whenever the guide draws samples outside the interval (0, 10). It is better to use a guide with the same support as the prior.
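One way to build a guide distribution with support (0, 10) is to push a Normal through a sigmoid and rescale it. A minimal sketch with plain torch.distributions (in a real Pyro guide, sigma_loc and sigma_scale would be pyro.param sites and the draw would go through pyro.sample):

```python
import torch
from torch.distributions import Normal, TransformedDistribution
from torch.distributions.transforms import AffineTransform, SigmoidTransform

# Stand-ins for variational parameters (pyro.param sites in a real guide)
sigma_loc = torch.tensor(0.)
sigma_scale = torch.tensor(1.)

# A Normal pushed through a sigmoid, then scaled by 10: the support is
# exactly (0, 10), matching the Uniform(0., 10.) prior in the model.
sigma_dist = TransformedDistribution(
    Normal(sigma_loc, sigma_scale),
    [SigmoidTransform(), AffineTransform(loc=0., scale=10.)])

samples = sigma_dist.sample((1000,))
print(samples.min().item() > 0, samples.max().item() < 10)  # True True
```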

Thank you. I’m having trouble moving to Pyro from pymc3, which no longer works on the computer my job provides. The shape of my x_data is torch.Size([344639, 1]) and the shape of my y is torch.Size([344639, 1]). I’m starting with just one X variable until I learn this better.

Here is the code I’ve changed. I think a negative binomial (NB) model will better suit the data.

def model(x_data, y_data):
    a = pyro.sample("intercept", dist.Normal(0., 1.))
    b = pyro.sample("edit_coeff", dist.Normal(0., 1.))
    sigma = pyro.sample("sigma", dist.Uniform(0., 5.))
    mean = a + b * x_data
    rate = mean.exp()
    with pyro.plate("data", len(x)):
        pyro.sample("obs", dist.NegativeBinomial(total_count = 5, probs = .029), obs=y_data)  
        
def guide(x_data, y_data):
    a = pyro.sample("intercept", dist.Normal(0, 1))
    b = pyro.sample("edit_coeff", dist.Normal(0, 1))
    sigma = pyro.sample("sigma", dist.Normal(0, 5))
    mean =  a + b * x_data     
    
def summary(samples):
    site_stats = {}
    for site_name, values in samples.items():
        marginal_site = pd.DataFrame(values)
        describe = marginal_site.describe(percentiles=[.05, 0.25, 0.5, 0.75, 0.95]).transpose()
        site_stats[site_name] = describe[["mean", "std", "5%", "25%", "50%", "75%", "95%"]]
    return site_stats

svi = SVI(model,
          guide,
          optim.Adam({"lr": .05}),
          loss=Trace_ELBO())


pyro.clear_param_store()
num_iters = 1000 
for i in range(num_iters):
    elbo = svi.step(x, y)
    if i % 500 == 0:
        logging.info("Elbo loss: {}".format(elbo))

And this is the error I get:

RuntimeError: [enforce fail at ..\c10\core\CPUAllocator.cpp:72] data. DefaultCPUAllocator: not enough memory: you tried to allocate 475104161284 bytes. Buy new RAM!

Is there a better way to write out the model? I just want to start with a linear regression and eventually move to hierarchical model.

I think you need to specify dim=-2 in pyro.plate("data", ...); otherwise, it will use the default value dim=-1 for your data plate. Btw, you don’t need mean = a + b * x_data in the guide, and you do need to define some variational parameters so that SVI can optimize them. Otherwise, SVI will do nothing.


I added dim=-2 and it’s running! I’m so happy. Thank you. Why does that work? I’m not sure I understand what’s going on.

I can’t believe I might finally get to use a bayesian model in my job. I’m pumped.

I copied from the tutorial, https://pyro.ai/examples/bayesian_regression_ii.html, which had the mean in the guide. Is that wrong?

What would I need for the SVI outside of :
svi = SVI(model, guide, optim.Adam({"lr": .05}), loss=Trace_ELBO())

Did I mention I was happy! Thanks for sticking with me on this!

About the mean: it is not wrong; that computation just isn’t used for anything else. Do you want to submit a PR on GitHub to remove that line from the tutorial? :wink:

Your observation has two batch dimensions, -1 and -2, where dim -1 has size 1 and dim -2 has size 344639. So the data plate should be at dim=-2.

If you want batch dimension -1 to be the data dim, you can use dist.NegativeBinomial(...).to_event(1) to move the last dimension into the event shape. More details can be found in the tensor shapes tutorial.
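A small sketch of those shapes with plain torch.distributions (n standing in for 344639):

```python
import torch
from torch.distributions import Independent, NegativeBinomial

n = 7  # stand-in for 344639
y = torch.zeros(n, 1)

d = NegativeBinomial(total_count=5., probs=torch.full((n, 1), 0.029))
print(d.batch_shape)        # torch.Size([7, 1]): dim -2 is the data dim
print(d.log_prob(y).shape)  # torch.Size([7, 1])

# Independent(d, 1) is what .to_event(1) does in Pyro: the trailing
# size-1 dim becomes an event dim, leaving dim -1 as the data batch dim.
d_ev = Independent(d, 1)
print(d_ev.batch_shape, d_ev.event_shape)  # torch.Size([7]) torch.Size([1])
```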

submitted
