# Categorical distribution as a variational posterior

#1

My question is the following:

In my current work I would like to use a categorical distribution, with its parameters, as the variational distribution. Additionally, I would like to use stochastic variational inference (SVI) in Pyro, which involves updating the variational parameters. Now, for a categorical distribution the parameters are probabilities, so updating them naively might push the values out of the range [0, 1]. Does SVI internally take this into account and keep the parameters in range during updates?

#2

see the use of `constraints.simplex`, e.g. here https://github.com/uber/pyro/blob/dev/examples/hmm.py

#3

cool!! Thanks. Will try em out!!

#4

I would like to construct a categorical distribution with three parameters a, b and c. However, these parameters are matrices. Technically I want to construct `dist.Categorical(a, b, c)`, but I am unable to do it in Pyro. Can anyone suggest what can be done?

#5

> Technically I want to construct dist.Categorical(a,b,c)

What do you mean by `a, b, c` - are these matrices stacked one on top of each other or concatenated vertically? If so, `dist.Categorical` can take a single tensor, provided you do the stacking or concatenation yourself and pass in the combined tensor when instantiating the Categorical.

#6

a, b and c are 510 by 10 matrices each, and are the probabilities of the weights (e.g. w_ij) being -1, 0 and 1. So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()… Correct me if I understood it wrong.

#7

By default, for a three-parameter categorical, Pyro returns 0, 1 and 2. However, I want -1, 0, 1, so I could do `1 - dist.Categorical(a,b,c).sample()`.

However, how do I do this in the guide function? `1 - dist.Categorical(…)` returns an error.

#8

> So I can convert these to vectors, stack them on top of each other and then provide this to dist.Categorical()

That's right - you can do `torch.stack([a, b, c], dim=-1)`, which should give you a tensor of size `(510, 10, 3)`. The trailing dim has size 3 to account for each of the weight values, and it is the dimension `Categorical` reads the probabilities from.

> by default for three parameter categorical pyro returns 0,1 and 2. However, i want -1,0,1. so i could do 1-dist.Categorical(a,b,c).sample()

You probably want to do `1 - dist.Categorical(..).sample()`. By itself, `dist.Categorical(..)` only gives you a distribution instance. In Pyro you can do something like `values = 1 - pyro.sample("cat", dist.Categorical(weights))`, which will do the sampling behind the scenes.
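Putting the two replies together, a minimal torch-only sketch (the probability values are illustrative stand-ins for the thread's 510 by 10 matrices, and here a, b, c are taken as the probabilities of a weight being 1, 0 and -1 respectively, so that the `1 - sample` trick maps index 0 to weight 1 and index 2 to weight -1):

```python
import torch
from torch.distributions import Categorical

# Illustrative stand-ins: per-entry probabilities of each weight being
# 1 (a), 0 (b) and -1 (c). Any values that sum to 1 per entry work.
a = torch.full((510, 10), 0.3)
b = torch.full((510, 10), 0.5)
c = torch.full((510, 10), 0.2)

# Categorical reads probabilities from the trailing dim, so stack there.
probs = torch.stack([a, b, c], dim=-1)       # shape (510, 10, 3)

samples = Categorical(probs=probs).sample()  # values in {0, 1, 2}
weights = 1 - samples                        # values in {1, 0, -1}
```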

If you have any distribution specific questions, I would suggest bringing them over to the PyTorch distributions channel. You'll likely get a faster response there.

#9

Okay, thanks for your reply. I will put up the question in the PyTorch forum as well. In the model and the guide functions, the sampling itself takes place behind the scenes. For a quick look, consider the following images. The Python dictionary contains instances of the distribution classes; let the dictionary contain instances of the Categorical class. The command `lifted_reg_model = lifted_module()` samples the weight and bias values from the categorical distribution and applies them to a neural network. So by default these values will be 0, 1, 2. But how can I make the change to -1, 0, 1? Doing `1 - pyro.sample(…)` returns a tensor, but not a class instance corresponding to samples -1, 0, 1.

#10

I see. I don't see a way of doing this using `pyro.random_module`, unless you change your prior functions to do this mapping for you (i.e. inherit from `Categorical`, subtract 1 in `.sample` and add it back again in `.log_prob`). I wouldn't recommend it, but maybe @jpchen can comment if there is an easier way.
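A minimal sketch of the inheritance approach described above (the class name is made up, and a distribution meant for real use inside Pyro would need more than this, e.g. Pyro's torch-distribution wrapper):

```python
import torch
from torch.distributions import Categorical

class ShiftedCategorical(Categorical):
    """Categorical whose support is {-1, 0, 1} instead of {0, 1, 2}."""

    def sample(self, sample_shape=torch.Size()):
        # shift the sampled indices down by one
        return super().sample(sample_shape) - 1

    def log_prob(self, value):
        # add the shift back before scoring under the base distribution
        return super().log_prob(value + 1)

d = ShiftedCategorical(probs=torch.tensor([0.2, 0.5, 0.3]))
print(d.sample((5,)))                # values in {-1, 0, 1}
print(d.log_prob(torch.tensor(-1)))  # scores index 0, i.e. log(0.2)
```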

I think you'll be better off not using `random_module` and doing the regression directly. See http://pyro.ai/examples/bayesian_regression_ii.html for an example.

#11

Thank you for the reply. Do you feel if this would be feasible in Edward library?

#12

Yes, given your constraints you have one of two options:

1. create a custom distribution - we do this a lot for our own work as well - if it is only used in the model, you don't even need to implement a `sample` method.
2. as neeraj suggested, sample the layers directly without using `random_module`.

I think (1) is more straightforward, but either works. Side note: generally you should prefer composition over inheritance.
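Following the composition-over-inheritance note, the same shifted distribution can be sketched as a thin wrapper that holds a `Categorical` rather than subclassing it (the class name is illustrative, and a full Pyro custom distribution would also need shape and support metadata):

```python
import torch
from torch.distributions import Categorical

class SignedCategorical:
    """Composition: delegate to an internal Categorical and shift its
    support from {0, 1, 2} to {-1, 0, 1} at the boundary."""

    def __init__(self, probs):
        self._base = Categorical(probs=probs)

    def sample(self, sample_shape=torch.Size()):
        return self._base.sample(sample_shape) - 1

    def log_prob(self, value):
        return self._base.log_prob(value + 1)

d = SignedCategorical(torch.tensor([0.2, 0.5, 0.3]))
```

The advantage over subclassing is that only the two methods you expose can be called, so there is no risk of inheriting base-class behavior (e.g. `enumerate_support`) that silently ignores the shift.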

#13

Thank you @jpchen. Will explore both the possibilities.

#14

You had mentioned possibility (1) for generating a categorical distribution that samples -1, 0, 1 as opposed to the standard 0, 1, 2 for a three-event scenario. Just to make myself clear: would I need to use composition over inheritance to generate the `random_module` for my network? Also, is there a way to reshape a categorical distribution to, say, [32, 1, 5, 5] before sampling?