Categorical distribution as a variational posterior


#1

My question is the following:

In my current work I would like to use a categorical distribution, with its parameters, as the variational distribution. Additionally, I would like to use stochastic variational inference (SVI) in Pyro, which updates the variational parameters. For a categorical distribution the parameters are probabilities, so updating them naively might push the values outside the range [0,1]. Does SVI take this into account internally, so that the parameter updates keep the values in range?


#2

See the use of constraints.simplex, e.g. here: https://github.com/uber/pyro/blob/dev/examples/hmm.py


#3

cool!! Thanks. Will try em out!!


#4

I would like to construct a categorical distribution with three parameters a, b and c. However, these parameters are matrices. Technically I want to construct dist.Categorical(a,b,c); however, I am unable to do this in Pyro. Can anyone suggest what can be done?


#5

Technically I want to construct dist.Categorical(a,b,c)

What do you mean by a, b, c - are these matrices stacked one on top of each other or concatenated vertically? If so, dist.Categorical can take in a tensor provided you do the stacking or concatenation and pass in a single concatenated tensor when instantiating the Categorical.


#6

a, b and c are each 510 by 10 matrices holding the probabilities of the weights (e.g. wij) being -1, 0 and 1. So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()… Correct me if I understood it wrong.


#7

By default, a three-parameter categorical in Pyro returns 0, 1 and 2. However, I want -1, 0, 1, so I could do 1-dist.Categorical(a,b,c).sample().

However, how do I do this in the guide function? 1-dist.Categorical(…) returns an error.


#8

So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()

That’s right - you can do torch.stack([a, b, c], dim=-1), which should give you a tensor of size (510, 10, 3). Categorical treats the trailing dim, here of size 3, as the one that ranges over the weight values.
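A short shape check for the stacking step, using plain `torch.distributions`. The random matrices are hypothetical stand-ins for a, b and c, assuming the three probabilities for each weight sum to 1:

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)

# Hypothetical stand-ins for a, b, c: three (510, 10) matrices whose
# entries sum to 1 across the three matrices for every weight.
raw = torch.softmax(torch.randn(3, 510, 10), dim=0)
a, b, c = raw[0], raw[1], raw[2]

probs = torch.stack([a, b, c], dim=-1)  # shape (510, 10, 3)
cat = Categorical(probs=probs)          # batch_shape (510, 10)
idx = cat.sample()                      # one index in {0, 1, 2} per weight
```

With this layout there is no need to flatten the matrices into vectors first: each of the 510 x 10 weights gets its own three-way categorical.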

By default, a three-parameter categorical in Pyro returns 0, 1 and 2. However, I want -1, 0, 1, so I could do 1-dist.Categorical(a,b,c).sample().

You probably want to do 1 - dist.Categorical(..).sample(). By itself, dist.Categorical(..) only gives you a distribution instance. In pyro you can do something like values = 1 - pyro.sample("cat", dist.Categorical(weights)) which will do the sampling behind the scenes.
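To illustrate the difference between the distribution object and a sampled tensor, here is a small sketch with `torch.distributions` (the probabilities are made up). It also shows that `1 - idx` and `idx - 1` both land in {-1, 0, 1} but assign the categories in opposite order:

```python
import torch
from torch.distributions import Categorical

torch.manual_seed(0)
cat = Categorical(probs=torch.tensor([0.2, 0.5, 0.3]))

# Arithmetic on the distribution object itself is not defined.
try:
    1 - cat
    arithmetic_on_distribution_ok = True
except TypeError:
    arithmetic_on_distribution_ok = False

idx = cat.sample((1000,))  # tensor with values in {0, 1, 2}
values = 1 - idx           # maps 0 -> 1, 1 -> 0, 2 -> -1
shifted = idx - 1          # maps 0 -> -1, 1 -> 0, 2 -> 1
```

If the first probability is meant to be the probability of -1, `idx - 1` is the mapping that preserves that order.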

If you have any distribution specific questions, I would suggest bringing them over to the PyTorch distributions channel. You’ll likely get a faster response there.


#9

Okay, thanks for your reply. I will put up the question in the PyTorch forum as well. In the model and the guide function, the sampling itself takes place behind the scenes. For a quick look, consider the following images. The Python dictionary contains instances of the distribution classes; let it contain instances of the Categorical class. The command lifted_reg_model=lifted_module() samples the weight and bias values from the categorical distribution and applies them to a neural network, so by default these values will be 0, 1, 2. How can I change them to -1, 0, 1? Doing 1-pyro.sample(…) returns a tensor, not a class instance whose samples are -1, 0, 1.


#10

I see. I don’t see a way of doing this using pyro.random_module, unless you change your prior functions to do this mapping for you (i.e. inherit from Categorical, subtract 1 in .sample and add it back again in .log_prob). I wouldn’t recommend it, but maybe @jpchen can comment if there is an easier way.

I think you’ll be better off not using random_module and doing regression directly. See http://pyro.ai/examples/bayesian_regression_ii.html for an example.


#11

Thank you for the reply. Do you think this would be feasible in the Edward library?


#12

Yes, given your constraints you have two options:

  1. Create a custom distribution. We do this a lot for our own work as well. If it is only used in the model, you don’t even need to implement a sample method.
  2. As neeraj suggested, sample the layers directly without using random_module.

I think (1) is more straightforward, but either works. Side note: generally you should use composition over inheritance.


#13

Thank you @jpchen. Will explore both the possibilities.


#14

@jpchen,

You had mentioned possibility (1) for generating a categorical distribution that samples -1, 0, 1 as opposed to the standard 0, 1, 2 for a three-event scenario. Just to make sure I understand: would I need to use composition over inheritance to generate the random_module for my network? Also, is there a way to reshape a categorical distribution to, say, [32,1,5,5] before sampling?