Issue with optimizing (planar) flow parameters using torch.optim

I’m trying to optimize flow parameters using torch.optim, and I get the following error:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [2]] is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

A minimal reproducing example:

import torch
from torch import tensor as tt
from pyro.distributions.transforms import Planar

x = tt([[-1], [2]]).float()
u_t = tt([[3], [4]]).float()
w_t = tt([[-0.2], [1.4]]).float()
y = x + u_t * torch.tanh(w_t.T@x - 1)

flow = Planar(2)

optimizer = torch.optim.Adam(flow.parameters(), lr=0.1)
x_ = x.reshape(-1); y_ = y.reshape(-1)

for i in range(10):
    optimizer.zero_grad()    
    y_recon = flow(x_)
    loss = torch.sum((y_ - y_recon)**2)
    loss.backward(retain_graph=True)  # RuntimeError is raised here on the second iteration
    optimizer.step()

I believe this is inheritance-related: if I merge the ConditionedPlanar and Planar classes into one (inheriting only from TransformModule), I do not get this error. I also do not get it if I remove ConditionedPlanar's inheritance from torch.distributions.Transform and make it inherit from object instead.

Is this a bug? Is there some way for me to make the error go away? Or is this desired behaviour?

Have you seen the use of clear_cache() in the normalizing flows tutorial?

Since these flows were initially designed to be used within Pyro, there are some issues when you use them with raw PyTorch.

Thanks for the link, I missed this. clear_cache seems to be a method of TransformedDistribution, though, rather than a method of the flow. I’ll try to see if I can reset the flow’s cache somehow. Have you got any thoughts on how to do this?

Also, with flows, some of the log_prob methods can’t be computed (you run into KeyErrors when the flow looks for the inverse in its cache)… which is a bit frustrating

So, if I set the cache_size of the parent class (ConditionedPlanar) to be 0, I no longer get this error. Would it be possible for the user to set the cache size in future versions of pyro? (I’m happy to open a PR but not sure what the design implications are).

Edit: actually it was as easy as adding the line flow._cache_size = 0 before the training loop above, and it works fine.

Hi @aditya, PyTorch 1.6 should now have a public method .with_cache() that I hope will work for you. I believe it creates a shallow copy rather than setting the ._cache_size attribute in place.
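To illustrate the caching behaviour with a plain torch.distributions transform (ExpTransform is used here just as a stand-in; Pyro's transforms subclass torch.distributions.Transform and so inherit the same method):

```python
import torch
from torch.distributions.transforms import ExpTransform

t = ExpTransform()          # default cache_size=0
t_cached = t.with_cache(1)  # shallow copy with cache_size=1; t itself is untouched

x = torch.tensor([0.5, 1.0])
y = t_cached(x)             # forward pass populates the cache
x_back = t_cached.inv(y)    # cache hit: the original input tensor is returned as-is

print(x_back is x)                          # True
print(t._cache_size, t_cached._cache_size)  # 0 1
```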

Hi @aditya, if I can add to the discussion, you can add flow.clear_cache() to the end of the training loop and it will all work!

.clear_cache() is defined on both Transform and TransformedDistribution: https://github.com/pyro-ppl/pyro/search?q=clear_cache&unscoped_q=clear_cache

Many flows do not have an analytic or easily calculable inverse (for instance, Planar and Radial), so you can’t call log_prob on an arbitrary sample for these types of transforms… If you do flow2 = flow.inv, which swaps the forward and inverse ops, then you can score arbitrary samples for these transforms but not do sampling.
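The swap can be seen with any invertible torch transform (SigmoidTransform is chosen here purely for illustration, since it has both directions implemented):

```python
import torch
from torch.distributions.transforms import SigmoidTransform

t = SigmoidTransform()
t_swapped = t.inv        # the "forward" of t_swapped is the inverse (logit) of t

y = torch.tensor([0.25, 0.75])
x = t_swapped(y)         # computes logit(y); no cache is involved

print(torch.allclose(t(x), y))  # True: the round trip recovers y
print(t_swapped.inv is t)       # True: .inv merely swaps the two ops
```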

Hope this helps!

You can of course call log_prob on samples that were generated by the transform - this is the purpose of the cache.

Ah nice. I think I’ve got an older version of pyro installed. Thanks!