A subtle shape mismatch between deep layers


I tried to follow the Bayesian Regression tutorial's architecture to implement a more complex LSTM model.
The model uses two LSTM layers as an encoder, with 128 and 64 hidden units respectively, and I use pyro.random_module to lift the weights to stochastic variables. I use independent(1) in the code, like
priors_dist[layer_name] = pyro.distributions.Normal(weights_loc, weights_scale).independent(1).

However, when running the code I get RuntimeError: The size of tensor a (512) must match the size of tensor b (256) at non-singleton dimension 0.
I traced the error to class Trace_ELBO --> def _compute_log_r(model_trace, guide_trace) --> log_r.add((stacks[name], log_r_term.detach())).
The reason is that for the first layer the log_r_term size is (512,), but for the second layer it changes to (256,), so the log_r.add call hits the shape mismatch.

It seems that I have to use independent(2) to eliminate the log_r_term shape mismatch. Is there a better way to solve the problem?


I think that using .independent(2) is right, because your weight tensor has 2 dimensions.
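To see why the event dimensions matter, here is a minimal sketch using plain torch.distributions (pyro's .independent(n) wraps the base distribution in Independent the same way; the weight shapes below are hypothetical LSTM shapes, not taken from the original model):

```python
import torch
from torch.distributions import Normal, Independent

# Hypothetical weight shapes for two LSTM layers (4*hidden_size rows):
# a 128-unit layer gives 512 rows, a 64-unit layer gives 256 rows.
w1 = torch.zeros(512, 32)
w2 = torch.zeros(256, 128)

# independent(1) reinterprets only the last dim as an event dim,
# so log_prob keeps a layer-dependent batch dim: (512,) vs (256,).
lp1 = Independent(Normal(w1, torch.ones_like(w1)), 1).log_prob(w1)
lp2 = Independent(Normal(w2, torch.ones_like(w2)), 1).log_prob(w2)
print(lp1.shape, lp2.shape)  # torch.Size([512]) torch.Size([256])

# independent(2) treats the whole weight matrix as one event,
# so log_prob is a scalar for every layer and the per-site
# log terms are always shape-compatible.
lp1 = Independent(Normal(w1, torch.ones_like(w1)), 2).log_prob(w1)
lp2 = Independent(Normal(w2, torch.ones_like(w2)), 2).log_prob(w2)
print(lp1.shape, lp2.shape)  # torch.Size([]) torch.Size([])
```

The mismatching (512,) vs (256,) shapes in the first case are exactly the sizes reported in the RuntimeError above.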


Thanks, @fehiepsi! I did it the same way, i.e. when the weights tensor is 2D I use .independent(2), and when it is 1D I use .independent(1). The model works now.
However, the ELBO loss was pretty high, about 4 million. I had to increase the observed Normal distribution's scale from 0.2 to 1.5, which reduced the loss to about 70,000; it then never dropped below 50,000 during the SVI steps. I suspect a large KL(q(Z)||p(Z)) term is driving the large loss, since there are about 0.3 million parameter distributions. But if I use traditional deep layers with dropout and L2 regularization, which is theoretically equivalent to the VI process, the MSE loss isn't nearly as high.
Has anyone had a similar experience?
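As a quick sanity check on why widening the observed Normal's scale from 0.2 to 1.5 cuts the loss so much, here is a toy sketch (the residual magnitude 0.5 and batch size are assumptions for illustration, not values from the model above):

```python
import torch
from torch.distributions import Normal

# Hypothetical prediction errors, all of magnitude 0.5.
residuals = torch.full((1000,), 0.5)

# When residuals exceed the likelihood scale, a tight scale of 0.2
# penalizes each point far more than a wide scale of 1.5.
nll_tight = -Normal(0.0, 0.2).log_prob(residuals).sum()
nll_wide = -Normal(0.0, 1.5).log_prob(residuals).sum()
print(nll_tight.item(), nll_wide.item())
```

So part of the loss drop comes purely from widening the observation noise, independent of whether the fit actually improved.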


I usually observed high ELBO losses in my experiments. ^^


The ELBO loss term returned is the sum for the entire mini-batch, not the mean, in case that helps explain why you are seeing large loss values. See a discussion here on a similar issue.
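For intuition, a toy sketch of how much the sum-vs-mean convention alone changes the reported number (the batch size and standard-normal data are made up):

```python
import torch
from torch.distributions import Normal

torch.manual_seed(0)
obs = torch.randn(4096)  # hypothetical mini-batch of observations

# The summed NLL grows linearly with batch size; dividing by the
# number of data points gives a per-datapoint figure on a scale
# that is comparable across batch sizes.
nll_sum = -Normal(0.0, 1.0).log_prob(obs).sum()
nll_per_point = nll_sum / obs.numel()
print(nll_sum.item(), nll_per_point.item())
```

Reporting loss / batch_size makes runs with different batch sizes directly comparable.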