ValueError: can't optimize a non-leaf Tensor

I am following the example below, which was still working in early 2020:

import torch
from torch import tensor
from torch.optim import Adam

import pandas as pd

import pyro
import pyro.distributions as dist
from pyro import condition, param, sample
from pyro.poutine import trace

pyro.clear_param_store()
def model():
    mu = param('mu', tensor(0.))
    sig = param('sig', tensor(1.0))
    norm = dist.Normal(mu, sig)
    return sample('x', norm)

model() # Instantiate the mu parameter
cond_model = condition(model, {"x": tensor(5) })

# Large learning rate for demonstration purposes
optimizer = Adam([param("mu"), param('sig')], lr=0.01)
mus = []
sigs = []
losses = []
for i in range(2000):
    tr = trace(cond_model).get_trace()

    # Optimizer wants to push positive values towards zero,
    # so use negative log probability
    prob = -tr.log_prob_sum()
    prob.backward()

    # Update parameters according to optimization strategy
    optimizer.step()

    # Zero all parameter gradients so they don't accumulate
    optimizer.zero_grad()

    # Record probability (or "loss") along with current mu
    losses.append(prob.item())
    mus.append(param("mu").item())
    sigs.append(param("sig").item())

pd.DataFrame({"mu": mus, "loss": losses}).plot(subplots=True)
pd.DataFrame({"sig": sigs, "loss": losses}).plot(subplots=True)

But I got:

ValueError: Expected parameter scale (Tensor of shape ()) of distribution Normal(loc: 5.000012397766113, scale: -0.0036314278841018677) to satisfy the constraint GreaterThan(lower_bound=0.0), but found invalid values:
-0.0036314278841018677
Trace Shapes:
 Param Sites:
           mu
          sig
Sample Sites:
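
For context, a minimal reproduction of that failure (my own illustration, not code from the thread): PyTorch rejects a non-positive scale as soon as the Normal is constructed, which is exactly what happens once the unconstrained gradient steps push sig past zero.

import torch
from torch.distributions import Normal

# A negative scale violates Normal's GreaterThan(0.0) constraint on scale.
try:
    Normal(torch.tensor(5.0), torch.tensor(-0.0036), validate_args=True)
except ValueError as e:
    print(e)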

After some searching, I found that sig should be positive and that I need to add a constraint, so I changed the code to:

from torch.optim import Adam
from torch.distributions import constraints  # needed for the greater_than constraint

pyro.clear_param_store()
def model():
    mu = param('mu', tensor(0.))
    sig = param('sig', tensor(1.0), constraint=constraints.greater_than(0))
    norm = dist.Normal(mu, sig)
    return sample('x', norm)

model() # Instantiate the mu parameter
cond_model = condition(model, {"x": tensor(5)})

# Large learning rate for demonstration purposes
optimizer = Adam([param("mu"), param('sig')], lr=0.01)
mus = []
sigs = []
losses = []
for i in range(2000):
    tr = trace(cond_model).get_trace()

    # Optimizer wants to push positive values towards zero,
    # so use negative log probability
    prob = -tr.log_prob_sum()
    prob.backward()

    # Update parameters according to optimization strategy
    optimizer.step()

    # Zero all parameter gradients so they don't accumulate
    optimizer.zero_grad()

    # Record probability (or "loss") along with current mu
    losses.append(prob.item())
    mus.append(param("mu").item())
    sigs.append(param("sig").item())

pd.DataFrame({"mu": mus, "loss": losses}).plot(subplots=True)
pd.DataFrame({"sig": sigs, "loss": losses}).plot(subplots=True)

But I got an error on the optimizer call:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-37-4dedd1c4f09f> in <module>
     12 
     13 # Large learning rate for demonstration purposes
---> 14 optimizer = Adam([param("mu"), param('sig')], lr=0.01)
     15 mus = []
     16 sigs = []

~\anaconda3\lib\site-packages\torch\optim\adam.py in __init__(self, params, lr, betas, eps, weight_decay, amsgrad)
     72         defaults = dict(lr=lr, betas=betas, eps=eps,
     73                         weight_decay=weight_decay, amsgrad=amsgrad)
---> 74         super(Adam, self).__init__(params, defaults)
     75 
     76     def __setstate__(self, state):

~\anaconda3\lib\site-packages\torch\optim\optimizer.py in __init__(self, params, defaults)
     52 
     53         for param_group in param_groups:
---> 54             self.add_param_group(param_group)
     55 
     56     def __getstate__(self):

~\anaconda3\lib\site-packages\torch\optim\optimizer.py in add_param_group(self, param_group)
    256                                 "but one of the params is " + torch.typename(param))
    257             if not param.is_leaf:
--> 258                 raise ValueError("can't optimize a non-leaf Tensor")
    259 
    260         for name, default in self.defaults.items():

ValueError: can't optimize a non-leaf Tensor

What's wrong?

This is a pretty hacky Pyro usage that mixes torch.optim and pyro.param in an unintended way. It seems you're trying to do maximum likelihood learning. I recommend you use pyro.optim and follow this example.
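A minimal sketch along those lines (my own adaptation of the standard SVI + Trace_ELBO maximum-likelihood pattern with an empty guide, not the exact example linked, so treat the specifics as an assumption):

import torch
import pyro
import pyro.distributions as dist
from pyro.distributions import constraints
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

pyro.clear_param_store()

def model(x):
    # Learnable parameters live in the param store; the constraint keeps sig positive.
    mu = pyro.param("mu", torch.tensor(0.0))
    sig = pyro.param("sig", torch.tensor(1.0), constraint=constraints.positive)
    pyro.sample("x", dist.Normal(mu, sig), obs=x)

def guide(x):
    pass  # empty guide: with no latent variables, the ELBO is just the log likelihood

svi = SVI(model, guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
data = torch.tensor(5.0)
for step in range(2000):
    svi.step(data)

print(pyro.param("mu").item(), pyro.param("sig").item())

pyro.optim.Adam manages the underlying torch optimizers and the constrained/unconstrained bookkeeping for you, which is why you never pass the param tensors to it directly.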

The code is from here: Probabilistic Programming with Variational Inference: Under the Hood | Will Crichton

His way is more understandable to me, because I don't have much PyTorch background.
Pyro's tutorials are at too high a level, I think; it's hard to tell what they are doing under the hood. It would be good to have a tutorial that starts from a very low level.
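
For what it's worth, if you want to keep the low-level torch.optim loop from the blog post: the "can't optimize a non-leaf Tensor" error comes from passing the constrained values returned by param(...) to the optimizer. Once a constraint is attached, param(...) returns a transformed view of an underlying unconstrained leaf tensor, and only that leaf can be optimized directly. One way to adapt the original loop (a sketch of my own, not from the thread) is to hand the optimizer the unconstrained leaves from the param store:

import torch
from torch import tensor

import pyro
import pyro.distributions as dist
from pyro import condition, param, sample
from pyro.distributions import constraints
from pyro.poutine import trace

pyro.clear_param_store()

def model():
    mu = param('mu', tensor(0.))
    sig = param('sig', tensor(1.0), constraint=constraints.positive)
    return sample('x', dist.Normal(mu, sig))

model()  # instantiate both parameters in the param store
cond_model = condition(model, {"x": tensor(5.)})

# Optimize the unconstrained leaf tensors Pyro stores internally,
# not the constrained (non-leaf) values that param(...) returns.
leaves = [p for _, p in pyro.get_param_store().named_parameters()]
optimizer = torch.optim.Adam(leaves, lr=0.01)

for i in range(2000):
    loss = -trace(cond_model).get_trace().log_prob_sum()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()

print(param("mu").item(), param("sig").item())

Because sig is now updated in an unconstrained space and mapped back through the constraint on every read, it can never become negative, which also removes the first ValueError.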