#1

Hello

I have my own model+data to infer parameters related with multinomial distribution but that doesnāt work.
So I tried to solve the SVI Part1 problem with multinomial to see if thatās better but itās like that doesnāt work neither so Iām trying to guess whatās wrong.

``````data=[]
for i in range(2000):
data.append(pyro.sample("obs_{}".format(i),pdist.Multinomial(10,probs=torch.Tensor([0.6,0.4]))))

def model(data):
alpha0 = 10
alpha1 = 10
f = pyro.sample("latent",pdist.Beta(alpha0, alpha1))
ok = torch.Tensor([f,1-f])
for i in range(len(data)):
pyro.sample("obs_{}".format(i), pdist.Multinomial(10,probs=ok),obs=data[i])

def guide(data):
qalpha0 = pyro.param("qalpha0",torch.Tensor([15]),constraint=constraints.positive)
qalpha1 = pyro.param("qalpha1",torch.Tensor([15]),constraint=constraints.positive)
pyro.sample("latent",pdist.Beta(qalpha0, qalpha1))

adam_params = {"lr": 0.001, "betas": (0.9, 0.999)}

svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

n_steps = 4000
for step in range(n_steps):
svi.step(data)
if step % 100 == 0:
print(step)
print(pyro.param("qalpha0")/(pyro.param("qalpha0")+pyro.param("qalpha1")))
``````

Basically Iām doing the same thing as in the tutorial except that the shape of the data and then the distribution used change (multinomial instead of bernoulli). The problem is that infered f given by the last line is roughly equal to 0.5 instead of 0.6. Iāve already though about the size of the data but with len(data)=2000 that infers perfectly 0.6 when using bernoulli distribution like in the SVI, plus the computation time is pretty high.

Whatās wrong ?

#2

Hi @telecom, Iām not sure whatās wrong but here are two things I noticed:

1. You should use `torch.tensor(15.)` instead of `torch.Tensor(15)` to initialize the qalphas. Note the lower case `t`. Note the decimal point making the result a float tensor. Note the lack of brackets making the result have `.dim()==0` rather than `.dim()==1`.
2. You can get faster dubugging cycles by vecorizing the data and observe statement:
``````data = pdist.Multinomial(10, probs=torch.tensor([0.6, 0.4]).expand_by([2000]).sample()
assert data.shape == (2000, 2)

def model(data):
...
with pyro.iarange("data", len(data)):  # faster than a for loop
pyro.sample("obs", pdist.Multinomial(10, probs=ok).expand_by([len(data)]),
obs=data)
``````

Also, what are your final `pyro.param('qalpha0')` values? I believe the posterior should be approximately

``````expected_qalpha0 = 1210.0
expected_qalpha1 = 810.0
``````

#3

Hi fritzo and thanks for the reply.

I think that for your second point you meant:
`data = pdist.Multinomial(10, probs=torch.tensor([0.6, 0.4]).expand(2000,-1)).sample()`

I resolved the problem by setting a Dirichlet instead of Beta so that parameter is a tensor of 2 elements and that works then (before that i had qalpha0=12.6 and qalpha1=13.1 to answer you):

``````pyro.clear_param_store()
data = pdist.Multinomial(10, probs=torch.tensor([0.6, 0.4]).expand(2000,-1)).sample()
assert data.shape == (2000, 2)

def model(data):
alpha = torch.Tensor([10.,10.])
f = pyro.sample("latent",pdist.Dirichlet(concentration=alpha))
with pyro.iarange("data", len(data)):
pyro.sample("obs", pdist.Multinomial(10, probs=f.expand(len(data),-1)),
obs=data)

def guide(data):
qalpha = pyro.param("qalpha",torch.Tensor([15.,15.]),constraint=constraints.positive)
pyro.sample("latent",pdist.Dirichlet(qalpha))
``````

(I obtain after inference qalpha=tensor[21,14] we have E[X1]=21/(14+21)=0.6, E[X2]=14/(14+21)=0.4, thatās also really much faster in that way).

Iām into working on my own problem now (maybe I will use Categorical instead of Multinomial, whatever). The tricky thing is that I have to put the latent variable in a list and apply a function on this last:

(pseudocode)
blabla1=torch.tensor(blabla)
blabla2=torch.tensor(blabla)

def model(data):
alpha = torch.Tensor([10.,10.])
f = pyro.sample(ālatentā,pdist.Dirichlet(concentration=alpha))
target = fun([f,blabla1,blabla2])
for i in range(len(data)):
pyro.sample(āobs_{}ā.format(i), pdist.Multinomial(10,target),obs=data[i])

Assuming that the function returns correctly a tensor, I hope that will works. (?)

Edit: Okay, that works well with some regular functions that Iāve try. I will do it with more complex function implying convolutions and so on. I update the topic if any issues

#4

Hey @telecom or any of the many other very smart people on this forum,

Were you able to get the multinomial/categorical version of the problem to work? Iām getting started with Pyro and trying to accomplish something similar with a Categorical distribution.

There are possibilities 0, 1, and 2 in the categorical distribution, and Iām feeding it a dataset that should be extremely skewed towards outcome 0. It compiles and runs, but the model doesnāt seem to be training at all. I would expect prob0 to be going up and the .

One thought I had and am unsure of how to debug is that the categorical distribution normalizes the `probs` argument automatically, but Iām not sure if that transformation of my `pyro.sample` statements is somehow breaking backprop through the rest of the model.

Dataset:

``````data = torch.cat((torch.zeros(500), torch.ones(5), torch.empty(5).fill_(2.)))
``````

Hereās my model and guide:

``````def model(data):
# Prior beta with top probability ~ 0.3
alpha0 = alpha1 = alpha2 = torch.tensor(6.)
beta0 = beta1 = beta2 = torch.tensor(10.)

prob0 = pyro.sample("prob0", dist.Beta(alpha0, beta0))
prob1 = pyro.sample("prob1", dist.Beta(alpha1, beta1))
prob2 = pyro.sample("prob2", dist.Beta(alpha2, beta2))

# loop over the observed data
with pyro.iarange("data_loop", len(data)):
pyro.sample("obs", dist.Categorical(
probs=torch.tensor([prob0, prob1, prob2])),
obs=data
)

def guide(data):
alpha_q0 = pyro.param("alpha_q0", torch.tensor(6.0),
constraint=constraints.positive)
beta_q0 = pyro.param("beta_q0", torch.tensor(15.0),
constraint=constraints.positive)
alpha_q1 = pyro.param("alpha_q1", torch.tensor(6.0),
constraint=constraints.positive)
beta_q1 = pyro.param("beta_q1", torch.tensor(15.0),
constraint=constraints.positive)
alpha_q2 = pyro.param("alpha_q2", torch.tensor(6.0),
constraint=constraints.positive)
beta_q2 = pyro.param("beta_q2", torch.tensor(15.0),
constraint=constraints.positive)

pyro.sample("prob0", dist.Beta(alpha_q0, beta_q0))
pyro.sample("prob1", dist.Beta(alpha_q1, beta_q1))
pyro.sample("prob2", dist.Beta(alpha_q2, beta_q2))
``````

Iām using the same SVI training code as the tutorial SVI Part I:

``````svi = SVI(model, guide, optimizer, loss=Trace_ELBO())

n_steps = 5000
for step in range(n_steps):
svi.step(data)
``````

And the output:

``````[ step 0 loss: 572.641185760498 probabilities: [0.3333333333333333, 0.3333333333333333, 0.3333333333333333] ]
...
[ step 4900 loss: 634.6478080749512 probabilities: [0.328363784009581, 0.33784123633053903, 0.33379497965988] ]

inferred final probabilities:
prob0: 0.329 +- 0.090
prob1: 0.337 +- 0.090
prob2: 0.334 +- 0.090
``````

Any thoughts, advice, or pointers to resources appreciated!

#5

for one, the gradients are being killed when you construct a new tensor with the sampled values. why not do this in a vectorized way? eg:

``````probs = pyro.sample('probs', dist.Beta(6., 10.).expand(3))
with pyro.iarange("data_loop", len(data)):
pyro.sample("obs", dist.Categorical(probs), obs=data)
``````

and have something analogous in your guide

#6

was unaware of that type of usage for `expand`. Wish I would have known that sooner, thanks for the tip!

The gradient dying when the new tensor is constructed makes sense to me now in the sense that it ābreaksā the link between the `pyro.sample(...obs=data)` statement and the original samples. Are there formal rules around this behavior that a beginner could look out for? i.e. what manipulations are and are not allowed on Pyro samples in order to preserve the gradient? I know Iāve at least done scalar operations on them in the past and maintained a working model.

#7

this is a general pytorch thing and not a pyro specifidc one. in pre-0.4 pytorch, there were `Variable`s, which were tensors with additional metadata that tracked the gradients. if you extracted a value from the `Variable`, essentially the gradient wouldnāt flow through that point. in pytorch 0.4, `Variable` and `Tensor` have been combined but the same principle holds in that if you extract a value from a `Tensor` and operate on it, the gradients wonāt be tracked in the autograd graph.

#8

Gotcha. I guess itās not that intuitive that making the sample part of a larger tensor like you called out in my initial question is an operation that āextractsā the value from the tensor, but itās something I can get used to.

#9

Running into a couple of things trying to move forward with your suggestion using the following code:

``````def model(data):
alpha = torch.tensor(6.)
beta = torch.tensor(10.)

probs = pyro.sample('probs', dist.Beta(alpha, beta).expand(3))

with pyro.iarange("data_loop", len(data)):
pyro.sample("obs", dist.Categorical(probs=probs), obs=data)

def guide(data):
alpha = pyro.param('alpha', torch.tensor(6.))
beta = pyro.param('beta', torch.tensor(15.))

pyro.sample('probs', dist.Beta(alpha, beta).expand(3))
``````

Iām getting the following error when attempting to run SVI:

``````ValueError: at site "probs", invalid log_prob shape
Expected [], actual [3]
Try one of the following fixes:
- enclose the batched tensor in a with iarange(...): context
- .independent(...) the distribution being sampled
- .permute() data dimensions
``````

When I try the first fix, and move the `probs` sample into the `iarange` context I get the same error, except with `Expected [511], actual [3]`. The data shape is `[511]` so I guess thatās the conflict in the `iarange` context.

I tried reading up on the other suggested fixes and while I donāt fully understand them, it doesnāt seem like they would apply to my problem which is fully in 1D space since they seem to deal with dimensionality.

On a higher level, the usage of `expand` is great and I will be using it a lot moving forward but Iām having a hard time reconciling it with this problem. I am looking to create three distinct probability distributions for three different outcomes, and I donāt see how that would work with just one beta distribution. My original broken āsolutionā had six trainable hyperparameters for three distributions, which maps directly to the three separate probabilities, and it doesnāt seem like I could get the same result with one expanded distribution?

#10

try tacking on `independent()` to your sample statement in `iarange` ie:

``````pyro.sample("obs", dist.Categorical(probs=probs).independent(1), obs=data)
``````

and yes, your āvectorizedā guide is not equivalent to your original guide, you would need need to wrap a tensor of the same size (in this case `(3)`) in `pyro.param` so that each param can be learned separately.

#11

Got the following model and guide to train as expected:

``````def model(data):
alpha = torch.tensor(6.)
beta = torch.tensor(10.)

probs = pyro.sample('probs', dist.Beta(alpha, beta).expand(3).independent(1))

with pyro.iarange("data_loop", len(data)):
pyro.sample("obs", dist.Categorical(probs=probs), obs=data)

def guide(data):
alphas = pyro.param('alphas', torch.tensor((6., 6., 6.)), constraint=constraints.positive)
betas = pyro.param('betas', torch.tensor((10., 10., 10.)), constraint=constraints.positive)

pyro.sample('probs', dist.Beta(alphas, betas).independent(1))
``````

I ended up applying `independent` to the Beta distributions instead of the Categorical distribution as originally suggested. When I tried applying it to the Categorical distribution, I got an unpleasant `ValueError: Expected reinterpreted_batch_ndims <= len(sample_shape + base_dist.batch_shape), actual 1 vs 0`.

My tensor-fu is pretty weak, so Iām still trying to reason out how the respective batch shapes and event shapes affect whatās happening here. Will try to report back once I straighten that out. If there is a more intuitive explanation for this behavior (and why `indepedent` is a fix for the invalid log_prob shape in the first place) Iād love to hear it.

Thanks for all of your help @jpchen!