Categorical distribution as a variational posterior


My Question would be the following:

In my current work I would like to use a categorical distribution as the variational distribution and fit its parameters with stochastic variational inference in Pyro, which updates the variational parameters. The parameters of a categorical distribution are probabilities, so updating them naively might push the values outside the range [0,1]. Does SVI take this into account internally, so that the parameters stay valid during the updates?


see the use of constraints.simplex, e.g. here
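
For illustration, here is a minimal sketch of what a simplex constraint does, written against torch.distributions only so that it runs on its own. The assumption is that a constrained parameter is kept as an unconstrained tensor and mapped onto the simplex by a transform, so gradient steps never leave the valid range. In Pyro the analogous declaration would be pyro.param("probs", init, constraint=constraints.simplex). The variable names and the toy loss are made up.

```python
import torch
from torch.distributions import constraints, transform_to

# Unconstrained values: the optimizer may move these anywhere on the real line.
unconstrained = torch.tensor([2.0, -1.0, 0.5], requires_grad=True)

# Map from unconstrained space onto the probability simplex.
to_simplex = transform_to(constraints.simplex)

# Toy loss (an assumption for this sketch): push probability mass to category 0.
loss = -torch.log(to_simplex(unconstrained)[0])
loss.backward()

# A plain gradient step on the unconstrained values.
with torch.no_grad():
    unconstrained -= 0.5 * unconstrained.grad

# The constrained view is still a valid probability vector.
probs = to_simplex(unconstrained).detach()
print(probs, probs.sum())
```

The update is applied to the unconstrained tensor, so no clipping or renormalising of the probabilities is needed afterwards.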


cool!! Thanks. Will try em out!!


I would like to construct a categorical distribution with three parameters a, b and c. However, these parameters are matrices. Technically I want to construct dist.Categorical(a,b,c), but I am unable to do this in Pyro. Can anyone suggest what can be done?


Technically I want to construct dist.Categorical(a,b,c)

What do you mean by a, b, c - are these matrices stacked one on top of each other or concatenated vertically? If so, dist.Categorical can take in a tensor provided you do the stacking or concatenation and pass in a single concatenated tensor when instantiating the Categorical.


a, b and c are each 510 by 10 matrices and hold the probabilities of the weights (e.g. wij) being -1, 0 and 1. So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()… Correct me if I have understood it wrong.


By default, a three-parameter categorical in Pyro returns 0, 1 and 2. However, I want -1, 0, 1, so I could do 1-dist.Categorical(a,b,c).sample()

However, how do I do this in the guide function? 1-dist.Categorical(…) returns an error.


So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()

That’s right - you can do torch.stack([a, b, c], dim=-1), which should give you a tensor of size (510, 10, 3). The trailing dim needs to have size 3 to account for each of the weight values, since Categorical reads the category probabilities from the last dimension.
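
A small sketch of the stacking step, using random placeholder matrices in place of the real a, b and c. It assumes the three matrices should end up along a trailing dimension of size 3, which is where Categorical expects the category probabilities.

```python
import torch
from torch.distributions import Categorical

# Placeholder probabilities (random, for illustration): normalise three
# 510 x 10 matrices so that a + b + c == 1 elementwise.
raw = torch.rand(3, 510, 10) + 1e-3
a, b, c = raw / raw.sum(dim=0)

# Stack along a new trailing dimension: shape (510, 10, 3).
probs = torch.stack([a, b, c], dim=-1)

# One categorical per weight: batch shape (510, 10), three categories each.
cat = Categorical(probs=probs)
idx = cat.sample()  # integer indices in {0, 1, 2}, shape (510, 10)
print(probs.shape, cat.batch_shape, idx.shape)
```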

By default, a three-parameter categorical in Pyro returns 0, 1 and 2. However, I want -1, 0, 1, so I could do 1-dist.Categorical(a,b,c).sample()

You probably want to do 1 - dist.Categorical(..).sample(). By itself, dist.Categorical(..) only gives you a distribution instance, so subtracting it from 1 raises an error. In Pyro you can do something like values = 1 - pyro.sample("cat", dist.Categorical(weights)), which will do the sampling behind the scenes.
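
To make the index-to-value mapping explicit, here is a short sketch with plain torch.distributions (in a Pyro model the .sample() call would be replaced by pyro.sample with a site name). Note that 1 - index sends 0, 1, 2 to 1, 0, -1, whereas index - 1 sends them to -1, 0, 1; which one is appropriate depends on which probability is meant for which weight value. The weights tensor is a made-up example.

```python
import torch
from torch.distributions import Categorical

weights = torch.tensor([0.2, 0.3, 0.5])  # made-up probabilities
idx = Categorical(probs=weights).sample((1000,))  # values in {0, 1, 2}

flipped = 1 - idx  # 0 -> 1, 1 -> 0, 2 -> -1
shifted = idx - 1  # 0 -> -1, 1 -> 0, 2 -> 1

print(sorted(set(flipped.tolist())), sorted(set(shifted.tolist())))
```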

If you have any distribution specific questions, I would suggest bringing them over to the PyTorch distributions channel. You’ll likely get a faster response there.


Okay, thanks for your reply. I will put up the question in the PyTorch forum as well. In the model and the guide function, the sampling itself takes place behind the scenes. For a quick look, consider the following images. The Python dictionary contains instances of the distribution classes. Let the dictionary contain instances of the categorical class. The command lifted_reg_model=lifted_module() samples the weight and bias values from the categorical distribution and applies them to a neural network. So by default these values will be 0, 1, 2. But how can I change this to -1, 0, 1? Doing 1-pyro.sample(…) returns a tensor, not a distribution instance whose samples are -1, 0, 1.


I see. I don’t see a way of doing this using pyro.random_module, unless you change your prior functions to do this mapping for you (i.e. inherit from Categorical, subtract 1 in .sample and add it back again in .log_prob). I wouldn’t recommend it, but maybe @jpchen can comment if there is an easier way.

I think you’ll be better off not using random_module and doing regression directly. See for an example.


Thank you for the reply. Do you think this would be feasible in the Edward library?


yes given your constraints you have one of two options:

  1. create a custom distribution - we do this a lot for our own work as well - if it is only used in the model, you don’t even need to implement a sample method.
  2. as neeraj suggested, sample the layers directly without using random_module

i think (1) is more straightforward but either works. side note: generally you should use composition over inheritance
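
A rough sketch of option (1) using composition: a hypothetical SignedCategorical class that holds a regular Categorical and shifts its samples by one. It is written against torch.distributions so that it runs on its own; making it usable at a Pyro sample site would presumably mean deriving from pyro.distributions.TorchDistribution instead, which is not shown here. The class name and the example probabilities are invented.

```python
import torch
from torch.distributions import Categorical, Distribution, constraints


class SignedCategorical(Distribution):
    """Hypothetical wrapper: a Categorical whose support is {-1, 0, 1}."""

    arg_constraints = {}  # parameters are validated by the wrapped Categorical

    def __init__(self, probs, validate_args=None):
        # Composition: keep a plain Categorical and delegate to it.
        self.base = Categorical(probs=probs)
        super().__init__(batch_shape=self.base.batch_shape,
                         validate_args=validate_args)

    @property
    def support(self):
        return constraints.integer_interval(-1, 1)

    def sample(self, sample_shape=torch.Size()):
        # Shift indices {0, 1, 2} down to {-1, 0, 1}.
        return self.base.sample(sample_shape) - 1

    def log_prob(self, value):
        # Shift back to {0, 1, 2} before scoring with the wrapped distribution.
        return self.base.log_prob(value + 1)


d = SignedCategorical(torch.tensor([0.2, 0.3, 0.5]))
samples = d.sample((1000,))
print(sorted(set(samples.tolist())), d.log_prob(torch.tensor(-1)))
```

The shift is undone in log_prob so that the value -1 is scored with the first probability, 0 with the second and 1 with the third.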


Thank you @jpchen. Will explore both the possibilities.



You had mentioned possibility (1) for generating a categorical distribution that samples -1, 0, 1 as opposed to the standard 0, 1, 2 for a three-event scenario. Just to make sure I understand: would I need to use composition over inheritance to generate the random_module for my network? Also, is there a way to reshape a categorical distribution to, say, [32,1,5,5] before sampling?
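
On the reshaping question, one possible approach, sketched with torch.distributions and not confirmed anywhere in this thread: either build the probabilities with shape [32, 1, 5, 5, 3] from the start, or expand a distribution to the wanted batch shape before sampling. The probabilities below are placeholders.

```python
import torch
from torch.distributions import Categorical

# Placeholder probabilities shared by every element of the batch.
base = Categorical(probs=torch.tensor([0.2, 0.3, 0.5]))

# Expand the batch shape so that one draw has shape [32, 1, 5, 5].
expanded = base.expand(torch.Size([32, 1, 5, 5]))
sample = expanded.sample()
print(expanded.batch_shape, sample.shape)
```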