SVI for classification

Hello everyone!
I’m new to Pyro and Bayesian programming. I generated some data just like in the model below (under iarange) and tried to find its parameters via SVI. Here data2model is a tensor of classes {0, …, 4}, and solutions is an array [10, 100, …].

```py
def model(data2model):
    # data2model_len is the number of datapoints, i.e. len(data2model)
    loc = pyro.param('loc', 40 * torch.ones(data2model_len))
    scale = pyro.param('scale', torch.ones(data2model_len), constraint=constraints.positive)
    effort_coeff = pyro.param('effort_coeff', torch.tensor(1.), constraint=constraints.unit_interval)
    with pyro.iarange('my_iarange', use_cuda=True):
        a = pyro.sample('a', dist.Normal(loc, scale))
        a = a.expand((len(solutions), data2model_len)).reshape((data2model_len, len(solutions)))
        comfort = (a - torch.tensor(solutions).float().expand((data2model_len, -1))) * \
            effort_coeff / abs(a)
        softmax = torch.nn.Softmax(dim=0)
        pyro.sample('picked', dist.Categorical(probs=softmax(comfort)), obs=data2model)
```

```py
@config_enumerate(default="parallel")
def guide(data):
#     loc = pyro.param('loc', 40 * torch.ones(data2model_len))
#     scale = pyro.param('scale', torch.ones(data2model_len), constraint=constraints.positive)
# With the two lines above uncommented (and the fixed loc and scale below commented out), sigma goes to 0.
# With the current version, the model converges to the guide (loc and scale become 25 and 3).
    with pyro.iarange('my_iarange', use_cuda=True):
        loc = 25 * torch.ones(data2model_len)
        scale = 3 * torch.ones(data2model_len)
        a = pyro.sample('a', dist.Normal(loc, scale))
        prior = torch.tensor([0.2, 0.1, 0.05, 0.05, 0.6]).expand((data2model_len, 5))
        assignment_probs = pyro.param('assignment_probs', prior, constraint=constraints.unit_interval)
        picked_prior = dist.Categorical(assignment_probs)
        pyro.sample('picked', picked_prior, infer={'is_auxiliary': True})
```

```py
optim = pyro.optim.Adam({'lr': 1e-1})
inference = SVI(model, guide, optim, loss=TraceEnum_ELBO(max_iarange_nesting=1))
```

My generated data was sampled from Normal(30, 2) and transformed. I see two different behaviors of my model depending on which lines are commented out (see the comments in the guide). What can I do to successfully recover the real mu and sigma of my normal distribution? Also, what does the ‘is_auxiliary’ parameter do?
P.S. Please add login via Facebook or GitHub to the forum :slight_smile:

Hi @eamag, I don’t fully understand what your model is intended to fit, but here are some observations:

  1. It looks like you’re using params in the model but none in the guide (for loc, scale). But I may be misunderstanding; what does the variable a denote?
If you are really learning local parameters (one per datapoint), I think it makes more sense to sample from a fixed prior in the model and from a parametrized posterior in the guide, e.g.
```py
def model(data2model):
    ...
    with pyro.iarange("data"):
        a = pyro.sample("a", dist.Normal(0., 100.))  # shared weak prior
        ...
def guide(data2model):
    loc = pyro.param("loc", ...)
    scale = pyro.param("scale", ...)
    with pyro.iarange("data"):
        a = pyro.sample("a", dist.Normal(loc, scale))  # local posterior
        ...
```
  2. I think you should omit the picked site in the guide. In Pyro we distinguish between “observe sites” (sample sites with obs= specified) and “sample sites” (without obs=). Whereas sample statements should match 1:1 between model and guide, observe statements should appear only in the model, not in the guide. Note that is_auxiliary is for internal use (it allows kludgey plumbing of data from guide to model via Delta sample statements; if you really want to know, see how it is used inside pyro.contrib.autoguide).
  3. Beware the a.reshape() in your model. I suspect what you actually want is a .transpose(0, 1) or simply .t(). Try it out on some small tensors to see the difference (see the sketch after this list).
  4. The params that are learned in the guide are posterior params, whereas you’ve named the picked params prior. I’m not sure whether this is a mathematical error.
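
To illustrate point 3, here is a tiny example (values arbitrary) showing that .reshape() merely re-chunks elements in row-major order, whereas .t() actually swaps rows and columns:

```py
import torch

x = torch.arange(6).reshape(2, 3)
# tensor([[0, 1, 2],
#         [3, 4, 5]])

x.reshape(3, 2)   # re-chunks the same row-major order
# tensor([[0, 1],
#         [2, 3],
#         [4, 5]])

x.t()             # swaps rows and columns
# tensor([[0, 3],
#         [1, 4],
#         [2, 5]])
```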

Finally, you might take a look at the Gaussian mixture model example since it also learns local class probabilities. Let me know if this helps, and we can iterate to get your model working as intended.

Hi @fritzo, thank you for your help!

  1. Variable a is a temporary variable, a hidden parameter for the whole dataset. I just can’t sample with pyro.sample("a", dist.Normal(0., 100.), sample_shape=(data2model_len,)) because I get log_prob() got an unexpected keyword argument 'sample_shape'. Generally, I want to learn this hidden param by observing the Categorical distribution “obs”. I was inspired by the GMM example, where the parameters are in the guide, not in the model.

2, 3, 4: I’ve updated my model:

```py
def model(data2model):
    loc = 20 * torch.ones(data2model_len)
    scale = 3 * torch.ones(data2model_len)
    effort_coeff = torch.tensor(1.)
    with pyro.iarange('my_iarange', use_cuda=True):
        a = pyro.sample('a', dist.Normal(loc, scale))
        a = a.expand((len(solutions), data2model_len)).t()
        comfort = (a - torch.tensor(solutions).float().expand((data2model_len, -1))) * \
            effort_coeff / abs(a)
        softmax = torch.nn.Softmax(dim=0)
        pyro.sample('obs', dist.Categorical(probs=softmax(comfort)), obs=data2model)

def guide(data):
    loc = pyro.param('loc', 40 * torch.ones(data2model_len))
    scale = pyro.param('scale', torch.ones(data2model_len), constraint=constraints.positive)
    effort_coeff = pyro.param('effort_coeff', torch.tensor(1.), constraint=constraints.unit_interval)
    with pyro.iarange('my_iarange', use_cuda=True):
        a = pyro.sample('a', dist.Normal(loc, scale))

optim = pyro.optim.Adam({'lr': 1e-3})
inference = SVI(model, guide, optim, loss=Trace_ELBO(max_iarange_nesting=1))
```
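
For completeness, this is roughly the training loop I’m running (the number of steps and the logging interval are arbitrary):

```py
losses = []
for step in range(5000):
    loss = inference.step(data2model)   # one gradient step on the ELBO
    losses.append(loss)
    if step % 500 == 0:
        # print the current values of the guide params
        print(step, loss,
              pyro.param('loc').mean().item(),
              pyro.param('scale').mean().item())
```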

but now I can’t recover the desired params, even though the SVI loss looks converged.

A few quick observations:

  • There’s no need to set sample_shape=(data2model_len,) in your pyro.sample("a", ...) statement, because loc, scale are already tensors of the appropriate length.
  • You shouldn’t need to .expand() your torch.tensor(solutions), since broadcasting will take care of expanding.
  • I’m suspicious of the abs() rather than torch.abs(); it may be dropping gradients. I’d try:
comfort = (a - torch.tensor(solutions).float()) * effort_coeff / torch.abs(a)
  • I think you can avoid the softmax by using the logits kwarg to Categorical (a combined sketch follows after this list):
pyro.sample('obs', dist.Categorical(logits=comfort), obs=data2model)
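
Putting those observations together, your model might look roughly like this (an untested sketch; I’ve used .unsqueeze(-1) in place of the expand/t() dance so the broadcasting lines up):

```py
def model(data2model):
    loc = 20 * torch.ones(data2model_len)
    scale = 3 * torch.ones(data2model_len)
    effort_coeff = torch.tensor(1.)
    solutions_t = torch.tensor(solutions).float()          # shape: (num_solutions,)
    with pyro.iarange('my_iarange', use_cuda=True):
        a = pyro.sample('a', dist.Normal(loc, scale))      # shape: (data2model_len,)
        # broadcast (data2model_len, 1) against (num_solutions,) -> (data2model_len, num_solutions)
        comfort = (a.unsqueeze(-1) - solutions_t) * effort_coeff / torch.abs(a).unsqueeze(-1)
        pyro.sample('obs', dist.Categorical(logits=comfort), obs=data2model)
```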

Thanks for your comments; unfortunately, none of this helped :cry:
Any other ideas? Maybe the whole concept is wrong and I should try another one. I have classes generated by sampling from a normal distribution, transformed via (a - torch.tensor(solutions).float()) * effort_coeff / torch.abs(a), and then softmaxed (a rough sketch of the generation is below). I want to find the mean and variance, observing only the classes and “solutions”.
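
To be concrete, the generation looks roughly like this (the solutions values after the first two, data2model_len, and effort_coeff are placeholders; the true parameters are Normal(30, 2) as mentioned above):

```py
solutions = [10, 100, 1000, 10000, 100000]    # only the first two are real, the rest are placeholders
data2model_len = 1000                         # placeholder dataset size
effort_coeff = 1.

# one hidden value per datapoint, drawn from the "true" Normal(30, 2)
a_true = dist.Normal(30., 2.).sample((data2model_len, 1))           # shape: (data2model_len, 1)
comfort = (a_true - torch.tensor(solutions).float()) * effort_coeff / torch.abs(a_true)
# softmax over the solution dimension, then pick a class per datapoint
probs = torch.nn.functional.softmax(comfort, dim=-1)
data2model = dist.Categorical(probs=probs).sample()                 # classes {0, ..., 4}
```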
Thank you!