Elegant way to have an empty guide with sample sites still in the model? (SVI)

megaloman · July 31, 2019, 1:20am

Ideally, in my understanding of Pyro, it would be something like:

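# import statements
import pyro
from pyro.distributions import Normal

p_z_x = ...  # fancy neural network that returns (mu, sigs)

def model(X, y):
    mu, sigs = p_z_x(X)  # parameters of p(z | x)
    with pyro.plate('data'):
        z = pyro.sample('z', Normal(mu, sigs).to_event())
        # other stuff like
        # outputs = p_x_z(z) or something

def guide(X, y):
    pass  # do nothing

# build your optimizer, svi etc etc.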

Now, I know the above does not work: I get a KeyError doing this silly billy business.
Instead, I tried the following, much more hacky, solution:

# import statements
import pyro
from pyro.distributions import Normal

p_z_x = ...  # fancy neural network that returns (mu, sigs)

def model(X, y):
    mu, sigs = p_z_x(X)
    with pyro.plate('data'):
        z = pyro.sample('z', Normal(mu, sigs))
        # other stuff like
        # outputs = p_x_z(z), etc etc.

def guide(X, y):
    # just produce parameters again, maybe detach this time or something
    mu, sigs = p_z_x(X)  # generate samples from my "variational" distr
    with pyro.plate('data'):
        z = pyro.sample('z', Normal(mu, sigs))

# build your optimizer, svi etc etc.

It seems to do the trick, but I’m wondering if there’s a better way to go about this. It may seem like a weird request, but to reimplement a paper the way they ACTUALLY do it, I need to be able to do something along these lines. I think I’m basically asking the same thing as the passing variables post, but it’d be good to clarify. Would I instead do my sampling in the guide and then use a delta function in the model?

gdalle · July 31, 2019, 10:40am

Hey! Could you be more precise regarding your end goal? If you have hidden variables in your model, why wouldn’t you put them in your guide?

jpchen · July 31, 2019, 5:52pm

Not exactly sure what you’re trying to do, but check out the autoguides library. They allow you to do things along the lines of what you’re asking, e.g. use an AutoDelta if you’re computing MAP estimates, etc.

megaloman · August 2, 2019, 4:16am

It’s a weird thing from a paper I’ve been working from (section 3.2.2). It’s super subtle, because if you look at the next section you’d think they were sampling from the variational distribution, which is what I kind of expected.

I could well be wrong on this next point (which would be good to have clarified!), but if you have a model like the one below (\int = integral):

log p(y | x) = log \int p(y | x, z) p(z | x) dz

you can treat it as an expectation over p(z | x), apply Jensen’s inequality to it, and approximate the resulting expectation of log p(y | x, z) with samples z ~ p(z | x), so it isn’t that weird of a design choice.
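
Spelled out, the argument I have in mind goes roughly like this, where S (the number of samples) and z_s (the s-th sample) are my own notation and not necessarily the paper's:

```latex
\begin{align*}
\log p(y \mid x)
  &= \log \int p(y \mid x, z)\, p(z \mid x)\, dz \\
  &= \log \mathbb{E}_{z \sim p(z \mid x)}\big[\, p(y \mid x, z) \,\big] \\
  &\ge \mathbb{E}_{z \sim p(z \mid x)}\big[\, \log p(y \mid x, z) \,\big]
     && \text{(Jensen's inequality, since $\log$ is concave)} \\
  &\approx \frac{1}{S} \sum_{s=1}^{S} \log p(y \mid x, z_s),
     && z_s \sim p(z \mid x).
\end{align*}
```

This is the usual ELBO with the variational distribution set equal to p(z | x), in which case the KL term between the two is zero and only the expected log-likelihood is left.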

not sure how clear any of this is…