How to extract constrained parameters from an autoguide to customize a new objective

Hi,

I’m following the Pyro custom_objectives tutorial to customize my loss function. Specifically, the additional part of my loss depends on the model’s fitted values. (By “fitted value” I mean, for example, the sum of squared errors SSE = torch.sum(torch.square(fitted_value - observations)); my self-defined loss is more complex than SSE.) To do that, I want to extract the constrained parameters to define my new loss.

I’m using Pyro’s autoguide and all of my parameters are constrained (e.g. by uniform distributions with specific bounds). However, list(guide.parameters()) returns the unconstrained parameters, which cannot be used directly to calculate the model’s fitted values. So I’m wondering: how can I extract the constrained parameters to define my new loss function, which is then passed to the torch optimizer? Or alternatively, how can I extract the model’s fitted values directly in order to define my new loss function?

Thank you very much for your help!

Hi @fritzo, it would be highly appreciated if you could provide some suggestions. Thank you!

@TTH_Stat have you tried looking at older posts? e.g. this one

Thank you very much for your reply, @martinjankowiak. I’m sorry for my late response.

I’ve read the post you mentioned (and other posts about customizing the loss, like custom loss).

From the post you pointed me to, I understand that the parameters optimized by Adam are unconstrained, and that they are then mapped to the constrained space. Since I’m using an autoguide, I cannot get the constrained parameters using pyro.param(...). But I found that pyro.get_param_store().values() returns both constrained and unconstrained parameters from the autoguide. So I extracted the constrained parameters from pyro.get_param_store().values() to calculate my loss function, and passed the unconstrained parameters from list(guide.parameters()) to Adam. However, the result doesn’t make sense.
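For reference, I believe the parameter store for the AutoNormal guide in my example below contains entries roughly like these (I’m guessing at AutoNormal’s naming convention):

print(list(pyro.get_param_store().keys()))
# ['AutoNormal.locs.slope', 'AutoNormal.scales.slope',
#  'AutoNormal.locs.intercept', 'AutoNormal.scales.intercept']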

Below is my example of estimating slope and intercept from a simple linear regression. It would be highly appreciated if you could take a look.

import torch
import pyro
import numpy as np
from pyro.optim import Adam
from pyro.infer.autoguide import AutoNormal
import pyro.distributions as dist
from pyro import poutine
from pyro.infer import SVI, Trace_ELBO
from pyro.ops.special import safe_log
from matplotlib import pyplot as plt

np.random.seed(0)

xtest = torch.tensor([2., 4., 5., 6., 8.], dtype=torch.float32)
ytest = 2 * xtest + 8  # ground truth: slope = 2, intercept = 8

def model(xtest, ytest):
    slope = pyro.sample('slope', dist.Uniform(0, 10))
    intercept = pyro.sample('intercept', dist.Uniform(0, 10))
    locs = xtest * slope + intercept
    with pyro.plate('data', len(xtest)):
        pyro.sample('obs', dist.Normal(locs, 0.2), obs=ytest)


def my_loss(guide):
    # take what I believe are the constrained parameters out of the param store
    constrained_params = list(pyro.get_param_store().values())
    slope = constrained_params[1]
    intercept = constrained_params[3]

    preds = slope * xtest + intercept
    SSE = torch.sum(torch.square(preds - ytest))
    return SSE

pyro.set_rng_seed(0)
pyro.clear_param_store()
guide = AutoNormal(model)
guide_init = guide(xtest, ytest)  # call the guide once so its parameters are created
optimizer = torch.optim.Adam(list(guide.parameters()), lr=0.005)

losses = []
n = 1000
for i in range(n):

    loss = my_loss(guide)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    losses.append(float(loss.detach().numpy()))  

plt.figure(figsize=(14, 7))
plt.plot(losses)

with pyro.plate('samples', 2000, dim=-1):
    svi_samples = guide(xtest, ytest)
print('slope', torch.mean(svi_samples['slope']))
print('intercept', torch.mean(svi_samples['intercept']))

It seems to be converging; however, the estimates of the variables make no sense, since they are simply the average of the uniform bounds, (0 + 10) / 2 = 5. The ground truth should be 2 and 8, respectively.

So I’m assuming the way I extracted the constrained parameters is wrong. How can I extract the constrained parameters from the autoguide to define my new loss function? Or alternatively, how can I extract the model’s fitted values f(x) directly in order to define my new loss function, e.g. SSE = sum((f(x) - y_obs)^2)?

Thanks a lot!

Hi @TTH_Stat,
I think the easiest solution to get your code snippet working is to switch from an AutoNormal guide to an AutoDelta guide. Then you’ll be able to simply access the constrained point estimates as attributes of your guide (I think this should work, but LMK if there’s a spelling error):

from pyro.infer.autoguide import AutoDelta

guide = AutoDelta(model)

def my_loss(guide):
    slope = guide.slope
    intercept = guide.intercept
    ...
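
Putting it together with your training loop, something like this is what I have in mind (an untested sketch; note the guide needs to be called once so its parameters exist before you access guide.slope):

pyro.clear_param_store()
guide = AutoDelta(model)
guide(xtest, ytest)  # initialize the guide's parameters

def my_loss(guide):
    # constrained point estimates (equivalently guide.median()['slope'], etc.)
    preds = guide.slope * xtest + guide.intercept
    return torch.sum(torch.square(preds - ytest))

optimizer = torch.optim.Adam(guide.parameters(), lr=0.005)
for step in range(1000):
    loss = my_loss(guide)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()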

And to explain a bit about what’s going on: the AutoNormal guide does not own constrained parameters for slope and intercept; rather, it thinks of slope and intercept as latent variables. Its parameters aren’t in one-to-one correspondence with the latent variables; rather, for each latent variable the AutoNormal guide owns an unconstrained loc parameter and a positive-constrained scale parameter, which parameterize a normal distribution over an unconstrained latent variable. To sample a constrained variable, the AutoNormal guide first samples an unconstrained normal random variable and then transforms that sample into a constrained sample via biject_to(constraint):

AutoNormal:
(loc,scale) params ---> unconstrained sample ---> constrained sample
              Normal.sample()       biject_to(constraint)

AutoDelta:
unconstrained param ---> constrained param = constrained sample
           transform_to(constraint)

By contrast, the AutoDelta guide does own an unconstrained and a constrained parameter (possibly the same parameter, if the constraint is constraints.real) corresponding to each pyro.sample site in the model. It sounds like this is what you want and what your my_loss() function expects.
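
In code, the AutoNormal path above looks roughly like this (just a sketch, and I’m assuming the default “AutoNormal.locs.<site>” / “AutoNormal.scales.<site>” parameter names):

from torch.distributions import biject_to, constraints

loc = pyro.param('AutoNormal.locs.slope')      # unconstrained
scale = pyro.param('AutoNormal.scales.slope')  # positive

unconstrained_sample = loc + scale * torch.randn(())  # a Normal(loc, scale) sample
slope_sample = biject_to(constraints.interval(0., 10.))(unconstrained_sample)  # lands in (0, 10)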

Thank you very much for your detailed explanations, @fritzo.
I switched to AutoDelta and the optimization of my_loss works after adjusting the learning rate and number of iterations appropriately!

The full implementation is here: Google Colab

However, when I draw a histogram of the sampled parameters, it seems that all the samples from AutoDelta are identical, so the estimate of a parameter is deterministic rather than a distribution. Is this a characteristic of the AutoDelta guide?

If I do want my estimated parameters to come out as distributions with a mean and a width, does that mean I should stick with the AutoNormal guide? If so, is there another way to achieve this? For instance, in the AutoNormal case, can I directly obtain the constrained sample of y_hat = slope_hat * x_test + intercept_hat from the trace to calculate the loss function? Or can I get the unconstrained parameters and the corresponding transformation from biject_to in order to compute the constrained parameters for the loss? Etc.
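To illustrate what I mean by “from the trace”, something like this (I’m only guessing at the right API here):

guide_trace = poutine.trace(guide).get_trace(xtest, ytest)
slope_hat = guide_trace.nodes['slope']['value']          # constrained sample?
intercept_hat = guide_trace.nodes['intercept']['value']
y_hat = slope_hat * xtest + intercept_hat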

Or should I define my own guide?

(For me, my actual model is much more complex than this linear regression, and I expect my customized SVI loss to be a combination of the ELBO and other parts, rather than the SSE used here as an example.)

Thank you very much for your help

If you want posterior uncertainty then you’ll need to use a loss with an entropy-like term, e.g. ELBO loss. Your custom loss can’t learn uncertainty parameters. AutoDelta assumes there is zero posterior uncertainty, and thus is appropriate for your loss.

If you can share a more complex loss function we can help suggest how to get it working with a guide.

But so far I don’t see any need for a custom loss. If you want an SSE loss, which in PyTorch is (preds - y).square().sum(), you can encode that in Pyro as a likelihood: pyro.sample('obs', dist.Normal(locs, 0.2), obs=ytest). What is your motivation for replacing the quadratic Normal likelihood with an equivalent custom loss that neglects uncertainty?
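(To spell out the equivalence: with fixed observation noise sigma, the negative Normal log-likelihood is SSE / (2 * sigma**2) plus a constant, so minimizing one minimizes the other. A quick numerical check with made-up numbers, using sigma = 0.2 as in your model:)

sigma = 0.2
preds = torch.tensor([1.0, 2.0, 3.0])
y = torch.tensor([1.5, 1.8, 3.2])

sse = torch.sum(torch.square(preds - y))
neg_log_lik = -dist.Normal(preds, sigma).log_prob(y).sum()
print(sse / (2 * sigma**2), neg_log_lik)  # differ only by an additive constant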

Thank you very much for helping me so patiently, @fritzo !

I’m facing an optimization problem where the solution that gives me the lowest SSE (sum of squared errors) and the solution that gives me the highest Pearson correlation R between the predicted and actual values of a sequence are not the same. (In fact the two solutions differ significantly.) So I want a solution that balances SSE(y, y_hat) and the R correlation(y, y_hat). In numerical optimization, I can achieve this by minimizing

loss = SSE + (1 - R) * constant

where the constant is a self-defined weight. Similarly, in variational inference, I’m wondering whether I can define an objective function by adding the Pearson correlation part to the original ELBO loss, i.e.:

new objective = ELBO_loss + (1 - R) * constant

while quantifying the uncertainty.

That is to say, what I really want to do is not to replace the ELBO loss with the SSE, but to add the Pearson correlation part to the original ELBO, so that I can balance between SSE and the R correlation.

Is it possible to do so (using AutoNormal or a self-defined guide)?
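Concretely, something like the following is what I have in mind, based on the custom_objectives tutorial (just a sketch; I’m assuming Trace_ELBO().differentiable_loss and poutine.trace / poutine.replay can be combined this way, and that the fitted values can be read off the 'obs' site of a replayed model trace):

elbo = Trace_ELBO()
constant = 10.0  # self-defined weight

def combined_loss(model, guide, xtest, ytest):
    # standard differentiable (negative) ELBO
    elbo_loss = elbo.differentiable_loss(model, guide, xtest, ytest)

    # one posterior sample from the guide, replayed through the model to get fitted values
    guide_trace = poutine.trace(guide).get_trace(xtest, ytest)
    model_trace = poutine.trace(poutine.replay(model, trace=guide_trace)).get_trace(xtest, ytest)
    preds = model_trace.nodes['obs']['fn'].mean  # the Normal mean, i.e. slope * x + intercept

    # Pearson correlation between predictions and observations
    px, py = preds - preds.mean(), ytest - ytest.mean()
    r = (px * py).sum() / (px.square().sum().sqrt() * py.square().sum().sqrt())

    return elbo_loss + (1.0 - r) * constant

I would then minimize this with torch.optim.Adam over guide.parameters(), as in the tutorial.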

Thank you so much !