SVI with AutoDelta guide and num_particles > 1 samples identical particles

Hi all,

I am using Pyro version 1.4.0. I have a toy HMM:

import torch
import pyro
import pyro.distributions as dist

class HMM(torch.nn.Module):
    def __init__(self, dim_x, dim_z, T):
        super().__init__()
        self.dim_z = dim_z
        self.T = T
        self.model_net = ModelNet(dim_x, dim_z)  # emission network, definition omitted

    def prior(self, b_s):
        # Gaussian random walk over T time steps
        z = [None] * self.T
        z[0] = pyro.sample('z_0', dist.Normal(torch.zeros(self.dim_z), 1.).to_event(1))
        for t in range(1, self.T):
            z[t] = pyro.sample('z_{}'.format(t), dist.Normal(z[t-1], 1.).to_event(1))
        return torch.stack(z, -1).transpose(-1, -2)

    def model(self, x):
        pyro.module('hmm', self)
        b_s = x.shape[1]
        with pyro.plate('batch_plate', b_s, dim=-1):
            z = self.prior(b_s)
            # with vectorize_particles=True the leading dim is the particle dim,
            # the next dim is the batch dim
            print('example 0 particle 0:', z[0, 0])
            print('example 0 particle 1:', z[1, 0])
            print('example 1 particle 0:', z[0, 1])
            print('example 1 particle 1:', z[1, 1])
            mu_x = self.model_net.z_to_x(z)
            for t in range(self.T):
                pyro.sample('x_{}'.format(t), dist.Normal(mu_x[..., t, :], 1.).to_event(1), obs=x[t])
        return z

In the main method, I set up an instance of this HMM and seek the MAP solution for a batch of examples:

from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta, init_to_sample
from pyro.optim import Adam

dim_x = 10
T = 20
minibatch_size = 5
x = torch.randn(T, minibatch_size, dim_x)

dim_z = 10
hmm = HMM(dim_x, dim_z, T)

num_particles = 3

elbo = Trace_ELBO(num_particles=num_particles, vectorize_particles=True, max_plate_nesting=1)
optimizer = Adam({})
guide = AutoDelta(hmm.model, init_loc_fn=init_to_sample)
svi = SVI(hmm.model, guide, optimizer, elbo)
loss = svi.step(x)

However, the printout from inside the model shows that the z samples drawn in prior() differ across the batch dimension but are identical across the particle dimension.

I have not set the torch or pyro random seeds. If I manually add the vectorised particle plate and sample from the unconditioned model, the samples differ across both the particle and batch dimensions, as expected:

from pyro import poutine

max_plate_nesting = 2
with pyro.plate("num_particles_vectorized", num_particles, dim=-max_plate_nesting):
    z = poutine.uncondition(hmm.model)(x)

What am I missing here? Thank you in advance!

why do you expect the particles to be different? a delta function concentrates all its mass at a single point, so there's no possibility of particle/sample diversity
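To make this concrete, here is a small sketch building on the setup above (the AutoDelta.* parameter names assume the guide's default prefix): after svi.step(x), each latent site of the AutoDelta guide is backed by a single entry in the parameter store with no particle dimension, so the vectorized particle plate just broadcasts that one point.

# sketch: inspect the guide's parameters after svi.step(x)
for name, value in pyro.get_param_store().items():
    if name.startswith('AutoDelta'):
        # one (dim_z,)-shaped point per latent site, no particle dimension
        print(name, tuple(value.shape))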

Ah, I was thinking that AutoDelta converts the latent variables into parameters, and that having more than one particle would be equivalent to evaluating the model at multiple sets of latent-variable values.

What would be the appropriate way to achieve this in Pyro?

even if that were the case (which it’s not) then (up to optimization issues) all those particles would end up converging to the same value (since they’re targeting the same MAP objective function).

if you want a bag-of-particles estimate instead of a point estimate you can in principle use SVGD, although i should point out that in my experience it’s often pretty hard to get reasonable results from this class of algorithms (among other reasons probably because the multi-particle optimization becomes difficult).
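For reference, a minimal SVGD sketch (not part of the original setup; it reuses hmm.model and x from the snippets above, and the kernel choice, learning rate, particle count and step budget are placeholder assumptions):

from pyro.infer import SVGD, RBFSteinKernel
from pyro.optim import Adam

kernel = RBFSteinKernel()
svgd = SVGD(hmm.model, kernel, Adam({'lr': 0.01}), num_particles=10, max_plate_nesting=1)

for step in range(1000):
    svgd.step(x)

# dict mapping each latent site to a tensor with a leading particle dimension
particles = svgd.get_named_particles()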

Indeed, my main issue is optimization: in the real problem I have, there is a large, poor local minimum of the MAP objective.

I will have a look at that, thanks!

well if you’re having difficulty with MAP optimization i doubt SVGD will make your life easier. instead i’d spend time/energy on trying different initialization strategies etc
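As one illustration of that idea (a sketch, not from the thread; the restart count, learning rate and step budget are arbitrary), you can rerun the MAP fit from several random initializations and keep the parameter-store state with the lowest final loss:

from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoDelta, init_to_sample
from pyro.optim import Adam

best_loss, best_state = float('inf'), None
for restart in range(5):
    pyro.clear_param_store()  # start each restart from scratch
    # init_to_sample draws a fresh random starting point each time;
    # init_to_median or init_to_value(...) are other options worth trying
    guide = AutoDelta(hmm.model, init_loc_fn=init_to_sample)
    svi = SVI(hmm.model, guide, Adam({'lr': 0.01}), Trace_ELBO())
    for step in range(2000):
        loss = svi.step(x)
    if loss < best_loss:
        best_loss, best_state = loss, pyro.get_param_store().get_state()

# restore the best run afterwards with pyro.get_param_store().set_state(best_state)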
