Slicing tensors with Vindex and enumeration issues

artistworking · April 23, 2021, 5:03pm

Hi!!

I need a little help understanding how to use Vindex for tensor slicing in my version of an HMM. I have read the documentation (Mechanics of enumeration, GMM, and HMM). I am following model_1 from the HMM example, with a few differences…

PyTorch 1.8, CUDA 10.2 (although my CUDA library is 11.3)
pyro-ppl 1.6.0

My probs_x matrix, the transition probability matrix, has shape [n_sequences, hidden_dim, hidden_dim] instead of [hidden_dim, hidden_dim] as in model_1 from the hmm.py example.
Therefore I need some slicing. I looked into Vindex and tried to use it to slice as follows:

def model():
    ....
    n_sequences = 6
    hidden_dim = 2
    batch_size = 3
    probs_x = torch.randn(n_sequences, hidden_dim, hidden_dim)
    with pyro.plate("sequences", size=n_sequences, subsample_size=batch_size, dim=-2) as batch:
        lengths = lengths[batch]
        x = 0
        for t in pyro.markov(range(lengths.max())):
            with poutine.mask(mask=(t < lengths).unsqueeze(-1)):
                prob_x = Vindex(probs_x)[batch, x]
                print("probs_x shape: {}".format(probs_x.shape))
                x = pyro.sample("x_{}".format(t), dist.Categorical(prob_x),
                                infer={"enumerate": "parallel"})
                print("hidden state shape".format(x.shape))
                print("...................")

guide = AutoDelta(poutine.block(model, expose_fn=lambda msg: msg["name"].startswith("OU_")))

elbo = TraceEnum_ELBO(max_plate_nesting= 2,strict_enumeration_warning=True)

I do not think I need an ellipsis here, so I discarded that along with other attempts at slicing.

My simple mind cannot understand what is going on with the enumeration here. I understand that each discrete variable is enumerated while inference is performed over the other, continuous variables, but I cannot seem to use it properly.

The error for the code above is the following:

probs_x shape: torch.Size([3, 2])
hidden state shape
…
probs_x shape: torch.Size([3, 3])
hidden state shape
…
probs_x shape: torch.Size([3, 3])

Traceback (most recent call last):
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 165, in call
ret = self.fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/Draupnir_models.py”, line 2282, in model
infer={“enumerate”: “parallel”}) # [batch_size,1]–> [16,1,1] // [16, 1, 1, 1]
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/primitives.py”, line 156, in sample
apply_stack(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 201, in apply_stack
default_process_message(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 162, in default_process_message
msg[“value”] = msg[“fn”](*msg[“args”], **msg[“kwargs”])
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/distributions/torch_distribution.py”, line 46, in call
return self.rsample(sample_shape) if self.has_rsample else self.sample(sample_shape)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/distributions/categorical.py”, line 111, in sample
probs_2d = self.probs.reshape(-1, self._num_events)
RuntimeError: CUDA error: device-side assert triggered

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 165, in call
ret = self.fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/nn/module.py”, line 413, in call
return super().call(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/nn/modules/module.py”, line 889, in _call_impl
result = self.forward(*input, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py”, line 379, in forward
self._setup_prototype(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py”, line 349, in _setup_prototype
super()._setup_prototype(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/infer/autoguide/guides.py”, line 164, in _setup_prototype
self.prototype_trace = poutine.block(poutine.trace(model).get_trace)(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 187, in get_trace
self(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 171, in call
raise exc from e
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 165, in call
ret = self.fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/Draupnir_models.py”, line 2282, in model
infer={“enumerate”: “parallel”}) # [batch_size,1]–> [16,1,1] // [16, 1, 1, 1]
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/primitives.py”, line 156, in sample
apply_stack(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 201, in apply_stack
default_process_message(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 162, in default_process_message
msg[“value”] = msg[“fn”](*msg[“args”], **msg[“kwargs”])
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/distributions/torch_distribution.py”, line 46, in call
return self.rsample(sample_shape) if self.has_rsample else self.sample(sample_shape)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/distributions/categorical.py”, line 111, in sample
probs_2d = self.probs.reshape(-1, self._num_events)
RuntimeError: CUDA error: device-side assert triggered
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:197: sampleMultinomialOnce: block: [0,0,0], thread: [1,0,0] Assertion val >= zero failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:197: sampleMultinomialOnce: block: [6,0,0], thread: [1,0,0] Assertion val >= zero failed.
/pytorch/aten/src/ATen/native/cuda/MultinomialKernel.cu:197: sampleMultinomialOnce: block: [3,0,0], thread: [1,0,0] Assertion val >= zero failed.

Thanks in advance for any insights

(It could also be a problem with CUDA, but I want to be sure that the indexing is correct.)

martinjankowiak · April 23, 2021, 5:10pm

you will likely get more useful error messages if you run this code on cpu. when cuda is fed a bad index it often generates useless error messages
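
A minimal sketch of this advice, assuming only plain PyTorch and made-up tensor values: the kernel named in the log above is a multinomial sampler, and the same kind of call on a CPU tensor raises an ordinary Python exception with a readable message instead of a device-side assert. If the code has to stay on the GPU, setting the environment variable CUDA_LAUNCH_BLOCKING=1 before starting Python usually makes the traceback point at the offending operation.

```python
import torch

# Hypothetical probabilities with one negative entry, kept on the CPU.
bad_probs = torch.tensor([[0.7, -0.3], [0.5, 0.5]])

try:
    torch.multinomial(bad_probs, num_samples=1, replacement=True)
    cpu_error = None
except RuntimeError as err:
    # On CPU the failure is a regular exception; the exact wording of the
    # message may differ between PyTorch versions.
    cpu_error = str(err)

print(cpu_error)
```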

artistworking · April 23, 2021, 6:19pm

Ok, transferred to cpu. New error:

probs_x shape: torch.Size([6, 2])
Traceback (most recent call last):
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_messenger.py”, line 165, in call
ret = self.fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/messenger.py”, line 12, in _context_wrap
return fn(*args, **kwargs)
File “/home/…/Draupnir_models.py”, line 2283, in model
infer={“enumerate”: “parallel”}) # [batch_size,1]–> [16,1,1] // [16, 1, 1, 1]
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/primitives.py”, line 156, in sample
apply_stack(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 201, in apply_stack
default_process_message(msg)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/runtime.py”, line 162, in default_process_message
msg[“value”] = msg[“fn”](*msg[“args”], **msg[“kwargs”])
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/distributions/torch_distribution.py”, line 46, in call
return self.rsample(sample_shape) if self.has_rsample else self.sample(sample_shape)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/distributions/categorical.py”, line 112, in sample
samples_2d = torch.multinomial(probs_2d, sample_shape.numel(), True).T
RuntimeError: invalid multinomial distribution (encountering probability entry < 0)

martinjankowiak · April 23, 2021, 7:10pm

well this suggests your probs_x has negative/nan values. so presumably you have some numerical/optimization issue: either your learning rate is too high, your initialization scheme is suboptimal, you need to clamp gradients for extra stability, you need to use double precision, etc etc etc
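
A small sketch of how such a check could look, assuming plain PyTorch. `check_probs` is a hypothetical helper (not part of Pyro or PyTorch), and the softmax normalization is just one common way to obtain valid probabilities from unconstrained values; it is not taken from this thread.

```python
import torch


def check_probs(probs: torch.Tensor) -> None:
    """Raise early if `probs` cannot parameterize a Categorical.

    Hypothetical helper for illustration only.
    """
    if torch.isnan(probs).any():
        raise ValueError("probs contains NaN values")
    if (probs < 0).any():
        raise ValueError("probs contains negative values")


torch.manual_seed(0)
n_sequences, hidden_dim = 6, 2

# Unconstrained values (such as those from torch.randn) can be negative,
# so map them onto the simplex along the last dimension.
logits = torch.randn(n_sequences, hidden_dim, hidden_dim)
probs_x = torch.softmax(logits, dim=-1)

check_probs(probs_x)  # passes: every entry is non-negative and finite
```

Calling the helper right before `dist.Categorical(...)` would surface a bad table at the point where it is created rather than inside the sampler.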

artistworking · April 26, 2021, 10:25am

@martinjankowiak Hi, sorry, that's my fault. I did a quick random initialization for this toy example, and probs_x should have been positive (changed randn to rand):
probs_x = torch.rand(n_sequences,hidden_dim,hidden_dim)

    def model():
        n_sequences = 6
        hidden_dim = 2
        batch_size = 3
        lengths = torch.tensor([7, 7, 7, 7, 7, 7])
        probs_x = torch.rand(n_sequences, hidden_dim, hidden_dim)
        with pyro.plate("sequences", size=n_sequences, subsample_size=batch_size, dim=-2) as batch:
            lengths = lengths[batch]
            x = 0
            for t in pyro.markov(range(lengths.max())):
                with poutine.mask(mask=(t < lengths).unsqueeze(-1)):
                    prob_x = Vindex(probs_x)[batch, x]
                    print("probs_x shape: {}".format(probs_x.shape))
                    x = pyro.sample("x_{}".format(t), dist.Categorical(prob_x),
                                    infer={"enumerate": "parallel"})
                    print("hidden state shape {}".format(x.shape))
                    print("...................")

I guess my doubt is how to integrate the plate appropriately with enumeration. I do want to establish conditional independence among the sequences in the batch, but the broadcasting of the distribution's shape under the plate is messing everything up.

With the above code on the CPU I get an obvious indexing error, because the hidden state's shape is broadcast and it cannot be used appropriately in the next loop iteration:

probs_x shape: torch.Size([3, 2])
hidden state shape torch.Size([3, 1, 3])
…
Traceback (most recent call last):
…
File “/home/…/Draupnir_models.py”, line 2281, in model
probs_x = Vindex(probs_x)[batch,x]
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/ops/indexing.py”, line 215, in getitem
return vindex(self._tensor, args)
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/ops/indexing.py”, line 196, in vindex
return tensor[args]
IndexError: index 5 is out of bounds for dimension 0 with size 3

I hope I am more clear now…

Thanks again for the help
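
One possible reading of this traceback, offered as a guess rather than a diagnosis: the failing line assigns the indexed result back to `probs_x`, whereas the posted toy code uses a separate name, `prob_x`. If the real code overwrites the full table, then from the second time step on the table has only `batch_size` rows, and any subsample index of 3 or more is out of bounds. A plain PyTorch sketch, with hand-picked subsample indices standing in for the ones Pyro would draw at random:

```python
import torch

n_sequences, hidden_dim = 6, 2
table = torch.rand(n_sequences, hidden_dim, hidden_dim)  # full transition table
batch = torch.tensor([1, 3, 5])  # stand-in for the plate's subsample indices
x = 0                            # initial hidden state

# Overwriting the table with its own slice shrinks dimension 0 to len(batch).
probs_x = table
probs_x = probs_x[batch, x]      # shape [3, 2]
try:
    probs_x = probs_x[batch]     # indices 3 and 5 exceed the 3 remaining rows
    overwrite_error = None
except IndexError as err:
    overwrite_error = str(err)

# Keeping the table and the per-step slice under different names avoids this.
prob_x_step1 = table[batch, x]   # shape [3, 2]
prob_x_step2 = table[batch, x]   # the table still has 6 rows, so this works
```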

martinjankowiak · April 26, 2021, 3:58pm

could you please provide a complete runnable snippet of code with your toy example?

artistworking · April 26, 2021, 6:12pm

@martinjankowiak Yes, sorry, here it is:)

 import pyro.distributions as dist
 import torch
 from pyro.infer import SVI,TraceEnum_ELBO
 from pyro import poutine
 from pyro.infer.autoguide import AutoDelta
 from pyro.ops.indexing import Vindex
 import pyro
 def model():
     n_sequences = 6
     hidden_dim = 2
     batch_size = 3
     emission_categories = 5
     sequences = torch.tensor([[0,1,4,2,1,0,3],
                               [0,1,3,2,1,2,3],
                               [0,3,4,2,1,0,4],
                               [0,1,0,2,1,0,1],
                               [0,1,4,2,1,4,2],
                               [0,1,0,2,2,1,3]])
     lengths = torch.tensor([7, 7, 7, 7, 7, 7])
     probs_x = torch.rand(n_sequences, hidden_dim, hidden_dim)
     probs_y = torch.rand(n_sequences, hidden_dim, emission_categories)
     emissions_plate = pyro.plate("emissions",size=1,dim=-1)
     with pyro.plate("sequences", size=n_sequences, subsample_size=batch_size, dim=-2) as batch:
         lengths = lengths[batch]
         x = 0
         for t in pyro.markov(range(lengths.max())):
             with poutine.mask(mask=(t < lengths).unsqueeze(-1)):
                 prob_x = Vindex(probs_x)[batch, x]
                 print("probs_x shape: {}".format(probs_x.shape))
                 x = pyro.sample("x_{}".format(t), dist.Categorical(prob_x),infer={"enumerate": "parallel"})
                 print("hidden state shape {}".format(x.shape))
                 print("...................")
                 with emissions_plate: #I have not been able to debug this part, so it's a copy paste from the original model
                     pass
                 #     pyro.sample("y_{}".format(t), dist.Categorical(probs_y[x.squeeze(-1)]),obs=sequences[batch, t])
 guide = AutoDelta(poutine.block(model))
 elbo = TraceEnum_ELBO(max_plate_nesting= 2,strict_enumeration_warning=True)
 config = {
     "lr": 1e-3,
     "beta1": 0.9,  # coefficients used for computing running averages of gradient and its square
     "beta2": 0.999,
     "eps": 1e-8,  # term added to the denominator to improve numerical stability
     "weight_decay": 0,  # weight_decay: weight decay (L2 penalty)
     "clip_norm": 10,  # clip_norm: magnitude of norm to which gradients are clipped
     "lrd": 1,  # rate at which learning rate decays
     "z_dim": 30,
     "gru_hidden_dim": 60,
 }
 adam_args = {"lr": config["lr"], "betas": (config["beta1"], config["beta2"]),"eps":config["eps"],"weight_decay":config["weight_decay"],"clip_norm":config["clip_norm"],"lrd":config["lrd"]}
 optim = pyro.optim.ClippedAdam(adam_args)
 svi = SVI(model=model, guide=guide, optim=optim, loss=elbo)

 num_epochs = 3

 for epoch in range(num_epochs):
     svi.step()


martinjankowiak · April 27, 2021, 4:11pm

is this what you wanted?

    with pyro.plate("sequences", size=n_sequences, subsample_size=batch_size, dim=-1) as batch:
        lengths = lengths[batch]
        x = 0
        for t in pyro.markov(range(lengths.max())):
            with poutine.mask(mask=(t < lengths)):
                prob_x = probs_x[batch, x]
                x = pyro.sample("x_{}".format(t), dist.Categorical(prob_x),infer={"enumerate": "parallel"})
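
Why plain indexing is enough here can be seen from the shapes alone. The sketch below uses only PyTorch. The enumerated value is a hand-built stand-in with the shape one would expect for the first enumerated site when `max_plate_nesting=2` (an assumption: enumeration dims are placed to the left of the reserved plate dims); it is not an actual Pyro sample, and the subsample indices are hand-picked.

```python
import torch

n_sequences, hidden_dim, batch_size = 6, 2, 3
probs_x = torch.rand(n_sequences, hidden_dim, hidden_dim)
batch = torch.tensor([0, 2, 4])  # stand-in for the subsample indices

# First time step: x is the Python int 0.
prob_x0 = probs_x[batch, 0]
print(prob_x0.shape)  # torch.Size([3, 2]): sequences at dim -1 of the batch shape

# Later steps: stand-in for an enumerated x, one value per hidden state,
# placed at dim -3 (left of two reserved plate dims).
x_enum = torch.arange(hidden_dim).reshape(hidden_dim, 1, 1)

# Index tensors broadcast against each other: [3] with [2, 1, 1] gives [2, 1, 3].
prob_x = probs_x[batch, x_enum]
print(prob_x.shape)  # torch.Size([2, 1, 3, 2])
```

The resulting batch shape `[2, 1, 3]` keeps the sequences at dim -1, matching the plate, and the enumerated values at dim -3, with the last dimension holding the categories.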

artistworking · April 27, 2021, 5:37pm

@martinjankowiak Ohh thanks! Hmm, I see, so no weird Vindex needed.

I mean, if the plate maintains the conditional independence among the sequences and the markov handler takes care of the dependency on the previous character, that's what I need.

I then added the emissions and kept getting a problem with the distribution support. I made it very simple and gave equal emission probabilities to all the hidden states. The probabilities are positive real numbers, so I do not understand the support error there:

import pyro.distributions as dist
import torch
from pyro.infer import SVI,TraceEnum_ELBO
from pyro import poutine
from pyro.infer.autoguide import AutoDelta
from pyro.ops.indexing import Vindex
import pyro
def model():
    n_sequences = 6
    hidden_dim = 2
    batch_size = 3
    emission_categories = 5
    sequences = torch.tensor([[0,1,4,2,1,0,5],
                              [0,1,3,2,1,2,5],
                              [0,3,4,2,1,0,5],
                              [0,1,5,2,1,0,5],
                              [0,1,4,2,1,4,5],
                              [0,1,5,2,2,1,5]])
    lengths = torch.tensor([7, 7, 7, 7, 7, 7])
    probs_x = torch.rand(n_sequences, hidden_dim, hidden_dim)
    probs_y = torch.ones(n_sequences, hidden_dim, emission_categories)/emission_categories
    with pyro.plate("sequences", size=n_sequences, subsample_size=batch_size, dim=-1) as batch:
        lengths = lengths[batch]
        x = 0
        for t in pyro.markov(range(lengths.max())):
            with poutine.mask(mask=(t < lengths)):
                prob_x = probs_x[batch, x]
                x = pyro.sample("x_{}".format(t), dist.Categorical(prob_x), infer={"enumerate": "parallel"})
                pyro.sample("y_{}".format(t), dist.Categorical(probs_y[batch,x]), obs=sequences[batch, t]) #<----New

guide = AutoDelta(poutine.block(model))
elbo = TraceEnum_ELBO(max_plate_nesting= 2,strict_enumeration_warning=True)
config = {
    "lr": 1e-3,
    "beta1": 0.9,  # coefficients used for computing running averages of gradient and its square
    "beta2": 0.999,
    "eps": 1e-8,  # term added to the denominator to improve numerical stability
    "weight_decay": 0,  # weight_decay: weight decay (L2 penalty)
    "clip_norm": 10,  # clip_norm: magnitude of norm to which gradients are clipped
    "lrd": 1,  # rate at which learning rate decays
    "z_dim": 30,
    "gru_hidden_dim": 60,
}
adam_args = {"lr": config["lr"], "betas": (config["beta1"], config["beta2"]),"eps":config["eps"],"weight_decay":config["weight_decay"],"clip_norm":config["clip_norm"],"lrd":config["lrd"]}
optim = pyro.optim.ClippedAdam(adam_args)
svi = SVI(model=model, guide=guide, optim=optim, loss=elbo)

num_epochs = 3

for epoch in range(num_epochs):
    svi.step()

Traceback (most recent call last):
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/poutine/trace_struct.py”, line 216, in compute_log_prob
log_p = site[“fn”].log_prob(site[“value”], *site[“args”], **site[“kwargs”])
File “/home/…/anaconda3/lib/python3.7/site-packages/pyro/distributions/torch.py”, line 114, in log_prob
return super().log_prob(value)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/distributions/categorical.py”, line 117, in log_prob
self._validate_sample(value)
File “/home/…/anaconda3/lib/python3.7/site-packages/torch/distributions/distribution.py”, line 277, in _validate_sample
raise ValueError(‘The value argument must be within the support’)
ValueError: The value argument must be within the support

Thank you very very much for your help

martinjankowiak · April 27, 2021, 6:01pm

i can’t reproduce this. this model has no learnable parameters so i don’t see how you could get that error…
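
One thing that may be worth checking, although nobody in this thread confirms it: the observed `sequences` in the post above contain the value 5, while `emission_categories = 5` gives a Categorical whose support is the integers 0 to 4. With argument validation enabled, scoring such a value raises a support error; whether the error shows up would then depend on whether validation is switched on in a given run. A plain PyTorch sketch:

```python
import torch
from torch.distributions import Categorical

emission_categories = 5
# Uniform emission probabilities, as in the post above.
emission = Categorical(
    probs=torch.ones(emission_categories) / emission_categories,
    validate_args=True,
)

# Valid observations are the integers 0 .. emission_categories - 1.
in_support = emission.log_prob(torch.tensor(4))

try:
    emission.log_prob(torch.tensor(5))
    support_error = None
except ValueError as err:
    # The message wording may differ between PyTorch versions.
    support_error = str(err)
```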

artistworking · April 28, 2021, 3:40pm

@martinjankowiak Ohh, it might have been some strange computer behaviour; it does not happen today… and I got the model to work on the real data! Thanks!