Implementing custom SVI objectives

Hey all,
I’m trying to add a mean squared error (MSE) term to the objective of my variational autoencoder, so I’m following the custom objectives tutorial. My code is closely based on the VAE tutorial and otherwise works. Specifically, I’m trying to implement the section on a lower-level pattern:

# define optimizer and loss function
optimizer = torch.optim.Adam(my_parameters, {"lr": 0.001, "betas": (0.90, 0.999)})
loss_fn = pyro.infer.Trace_ELBO.differentiable_loss
# compute loss
loss = loss_fn(model, guide)
loss.backward()
# take a step and zero the parameter gradients
optimizer.step()
optimizer.zero_grad()

I’d like some more detail on how to actually implement this in practice. This is what I have tried:

optimizer = torch.optim.Adam(vae.parameters(), lr=1e-3)
elbo_loss_fn = pyro.infer.Trace_ELBO.differentiable_loss
for epoch in range(1000):
    epoch_loss = 0.
    for x, _ in train_dl:
        x = x.cuda()
        loss = elbo_loss_fn(model=vae.model, guide=vae.guide)
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

This gives me the error “TypeError: differentiable_loss() missing 1 required positional argument: 'self'”, and the loop never actually uses the data x. I figure that if I can get the ELBO loss to work, I can take the MSE between x and its reconstruction and add it to the ELBO loss before computing the gradients with loss.backward().

Hi @wthrift, it seems like there is a typo in the tutorial. Could you please try with

loss_fn = pyro.infer.Trace_ELBO().differentiable_loss
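
For context, here is a minimal sketch of why the parentheses matter. It uses a hypothetical stand-in class (FakeELBO), not Pyro itself: differentiable_loss is an instance method, so taking it from the class instead of an instance leaves self unbound.

```python
# Hypothetical stand-in for an ELBO class; this is NOT Pyro's implementation.
class FakeELBO:
    def differentiable_loss(self, model, guide):
        return 0.0


# Accessed on the class: no instance is bound, so `self` is missing.
try:
    FakeELBO.differentiable_loss(model=None, guide=None)
    error_message = None
except TypeError as err:
    error_message = str(err)

# Accessed on an instance: `self` is bound automatically.
bound_loss_fn = FakeELBO().differentiable_loss
result = bound_loss_fn(model=None, guide=None)
```

The first call fails with a TypeError about the missing self argument, while the second returns normally.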

Thanks for taking the time to reply to my post, I really appreciate it.
Adding parentheses fixes that error, but the code still doesn’t run as written in the tutorial. It raises the following error:
TypeError: guide() missing 1 required positional argument: 'x'
I’m not really sure what x refers to, because if I put data from my dataloader into it, it returns:
TypeError: 'Tensor' object is not callable
The same happens for model.
Thank you for your time.

I think that your model/guide requires an argument x. In that case, you need to call

loss_fn(model, guide, x)

Sorry that the tutorial missed these important points. It might be better to look at the documentation first. If the above fix works well for you, I’ll update the tutorial to address these issues. Thanks!

Thank you that worked!
Now I have added MSE to the elbo loss function as follows:

    for x, _ in train_dl:
        x = x.cuda()
        elbo_loss = elbo_loss_fn(vae.model, vae.guide, x)
        mse_loss = F.mse_loss(x, vae.reconstruct_img(x))
        loss = elbo_loss + mse_loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()

This new loss seems well behaved but unfortunately this yields a cuda memory error after about 30 epochs:
RuntimeError: CUDA out of memory.

I’ve tried adding

torch.cuda.empty_cache()
gc.collect()

every epoch, but it doesn’t help.

@wthrift I can’t find anything wrong with your training process. Maybe this comes from the data loader. What happens if you remove mse_loss or elbo_loss?

Thanks for your continued help.
Using just mse_loss or just elbo_loss results in the same error.

Based on what you said I thought of some things to test out and I’ve found the source of the problem! I have a test_loss += temp_loss line in my model testing code, which I run every epoch. The code is below; it comes just after my training code.

    # initialize loss accumulator
    test_loss = 0.
    # compute the loss over the entire test set
    for i, (x, _) in enumerate(test_dl):
        x = x.cuda()
        elbo_loss = elbo_loss_fn(vae.model, vae.guide, x)
        mse_loss = F.mse_loss(x, vae.reconstruct_img(x))
        temp_loss = elbo_loss + mse_loss
        test_loss += temp_loss

    # report test diagnostics
    normalizer_test = len(test_dl.dataset)
    total_epoch_loss_test = test_loss / normalizer_test
    test_elbo.append(total_epoch_loss_test)
    torch.cuda.empty_cache()
    gc.collect()

Removing the + in test_loss += temp_loss solves the problem.

If it matters I define my data loader as follows (batch_size=32):

X_train = torch.from_numpy(X_train).float()
X_test = torch.from_numpy(X_test).float()
y_train = torch.from_numpy(y_train).float()
y_test = torch.from_numpy(y_test).float()
train_ds = TensorDataset(X_train, y_train)
train_dl = DataLoader(train_ds, batch_size=batch_size, shuffle=True)
test_ds = TensorDataset(X_test, y_test)
test_dl = DataLoader(test_ds, batch_size=batch_size, shuffle=True)

You’re welcome! I believe you can resolve this issue by wrapping your testing code in the context manager with torch.no_grad():. Another way is to call test_loss += temp_loss.detach(). For testing, you don’t want PyTorch to remember the computation graph of every iteration (which is what test_loss += temp_loss causes).
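
A small sketch of the difference, using plain PyTorch with a hypothetical linear layer and random batches as stand-ins for the VAE and the test loader. Accumulating loss tensors keeps each batch's graph reachable, while accumulating under torch.no_grad() (here also converting to a Python float with .item()) does not.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

net = nn.Linear(4, 4)  # hypothetical stand-in for the VAE
batches = [torch.randn(8, 4) for _ in range(3)]  # stand-in for test_dl

# Leaky version: the running total is a tensor that references
# the computation graph of every batch seen so far.
leaky_total = 0.0
for x in batches:
    leaky_total += F.mse_loss(net(x), x)

# Fixed version: no graph is built, and only a float is accumulated.
test_loss = 0.0
with torch.no_grad():
    for x in batches:
        temp_loss = F.mse_loss(net(x), x)
        test_loss += temp_loss.item()
```

Here leaky_total still carries a grad_fn, whereas test_loss is a plain float that holds no references to GPU memory.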

Ah that makes sense, thanks for all of your help.

Hi there!
I am struggling to implement the custom objective tutorial (some random error with retain_graph), but in fact I don’t need a custom SVI: all I want is to store the gradient after each SVI step and implement a stopping criterion that way. Is there maybe an easier way to proceed?
Thanks
Guillaume

can’t really help you without further details. probably you want to use something like pyro.infer.Trace_ELBO.differentiable_loss as is done in the tutorial

def step(self, *args, **kwargs):
    """
    :returns: estimate of the loss
    :rtype: float

    Take a gradient step on the loss function (and any auxiliary loss functions
    generated under the hood by `loss_and_grads`).
    Any args or kwargs are passed to the model and guide
    """
    # get loss and compute gradients
    with poutine.trace(param_only=True) as param_capture:
        loss = self.loss_and_grads(self.model, self.guide, *args, **kwargs)

    params = set(site["value"].unconstrained()
                 for site in param_capture.trace.nodes.values())

    # actually perform gradient steps
    # torch.optim objects gets instantiated for any params that haven't been seen yet
    self.optim(params)

    # zero gradients
    pyro.infer.util.zero_grads(params)

    return torch_item(loss)

The above is the step function of SVI. You can see how it collects the params. If you want to read the gradients, you can add code after self.optim(params) and before zero_grads, since the gradients are still stored on the params at that point.

Hope it works for you.

It works indeed, thanks!