Custom Guide for Dependent Parameters

Hello,

I want to define a custom guide (without EasyGuide) for my model.

Here is a brief description of my model:

  • I have two static parameters (“mu” and “phi”), which for now I assume to be unconstrained and Normal.
  • I also have latent variables “h_1” to “h_T” that follow a Markov sequence.
import torch
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(data):
    mu = pyro.sample('mu', dist.Normal(0.0, 0.1))
    phi = pyro.sample('phi', dist.Normal(0.0, 0.05))
    sigma = torch.tensor([0.05])

    T = len(data)
    N = 1
    with pyro.plate('Latent params', N):
        h_t = pyro.sample('h_0', dist.Normal(mu, sigma/(torch.sqrt(1-phi)**2)))
        for t in range(1, T):
            h_t = pyro.sample(f"h_{t}", dist.Normal(mu+phi*(h_t-mu), sigma))
            y_t = pyro.sample(f"y_{t}", dist.Normal(0.0, torch.exp(h_t/2)), obs=data[t])  # observations

In a previous step, I implemented the “AutoNormal” guide for my model and it worked fine. Now I want to write by hand the closest possible guide to “AutoNormal”. Following the tutorial, I wrote the guide below, but I receive an error when running SVI.
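
For reference, the AutoNormal setup looked roughly like this (a sketch; the optimizer settings and step count are placeholders, not my exact values):

from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoNormal
from pyro.optim import Adam

auto_guide = AutoNormal(model)
svi = SVI(model, auto_guide, Adam({"lr": 0.01}), loss=Trace_ELBO())
for step in range(2000):
    loss = svi.step(data)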

I assume the error is directly related to the parameter definitions, such as h_loc = pyro.param('AutoNormal.locs.h_0', mu_loc), where I initialize one parameter from another parameter, mu_loc. I do not know whether I am allowed to do that, or whether I should just put a constant tensor in place of mu_loc. The same concern applies to the guide for the loop section of my model, which ties the mean of each step to the mean of the previous step. Do I have the freedom to initialize their locs and scales as a sequence, initializing only the first of them, or should I put constant values for each of them?

I also tried constant values for initializing the “h”s, and it runs without error, but the parameters stay almost constant (at unreasonable values) across the convergence plots. Is there a methodology for setting the loc and scale values within reasonable bounds, and is that the procedure AutoNormal follows? (I read the source code, but got confused there, as I am no expert…)
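
From what I could tell, AutoNormal takes its initialization as arguments: an init_loc_fn that picks each loc from the model, and a constant init_scale (0.1 by default) for every scale, e.g. (a sketch of the call, not my exact code):

from pyro.infer.autoguide import AutoNormal, init_to_median

# every loc is initialized by init_loc_fn; every scale starts at init_scale
guide = AutoNormal(model, init_loc_fn=init_to_median, init_scale=0.1)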

def guide(data):
    mu_loc = pyro.param('AutoNormal.locs.mu', torch.zeros(1))
    mu_scale = pyro.param('AutoNormal.scales.mu', 0.1*torch.ones(1), constraint=constraints.positive)
    mu = pyro.sample('mu', dist.Normal(mu_loc, mu_scale))
    
    phi_loc = pyro.param('AutoNormal.locs.phi', torch.zeros(1))
    phi_scale = pyro.param('AutoNormal.scales.phi', 0.05*torch.ones(1), constraint=constraints.positive)
    phi = pyro.sample('phi', dist.Normal(phi_loc, phi_scale))
    
    sigma = torch.tensor([0.6])
    T = len(data)

    ## LATENTS ##
    h_loc = pyro.param('AutoNormal.locs.h_0', mu_loc)
    h_scale = pyro.param('AutoNormal.scales.h_0', sigma, constraint=constraints.positive)
    h_t = pyro.sample('h_0', dist.Normal(h_loc, h_scale))
    
    for t in range(1, T):
        h_loc = pyro.param(f"AutoNormal.locs.h_{t}", mu_loc+phi_loc*(h_loc-mu_loc))
        h_scale = pyro.param(f"AutoNormal.scales.h_{t}", sigma, constraint=constraints.positive)
        h_t = pyro.sample(f"h_{t}", dist.Normal(h_loc, h_scale))

Hi @Motahareh, it looks like your guide is sharing more parameters than AutoNormal would share. I’d recommend creating big parameter tensors that each have a time dimension, then indexing into those parameters at each time step. Here’s a rough sketch:

def guide(data):
      ...
+     N = len(data)
+     sigma = torch.full((N,), 0.6)
  
      ## LATENTS ##
-     h_loc = pyro.param('AutoNormal.locs.h_0', mu_loc)
-     h_scale = pyro.param('AutoNormal.scales.h_0', sigma, constraint=constraints.positive)
-     h_t = pyro.sample('h_0', dist.Normal(h_loc, h_scale))
+     h_loc = pyro.param('AutoNormal.locs.h', mu_loc.expand((N,)))
+     h_scale = pyro.param('AutoNormal.scales.h', sigma, constraint=constraints.positive)
+     h_t = pyro.sample('h_0', dist.Normal(h_loc[0], h_scale[0]))
      
      for t in range(1, N):
-         h_loc = pyro.param(f"AutoNormal.locs.h_{t}", mu_loc+phi_loc*(h_loc-mu_loc))
-         h_scale = pyro.param(f"AutoNormal.scales.h_{t}", sigma, constraint=constraints.positive)
+         h_t = pyro.sample(f"h_{t}", dist.Normal(h_loc[t], h_scale[t]))
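
Spelled out, the vectorized guide could look like the following (a sketch; the init values are placeholders, and since a pyro.param's init value is only used on the first call, initializing from mu_loc is fine):

def guide(data):
    N = len(data)
    mu_loc = pyro.param('AutoNormal.locs.mu', torch.zeros(1))
    mu_scale = pyro.param('AutoNormal.scales.mu', 0.1 * torch.ones(1),
                          constraint=constraints.positive)
    mu = pyro.sample('mu', dist.Normal(mu_loc, mu_scale))

    phi_loc = pyro.param('AutoNormal.locs.phi', torch.zeros(1))
    phi_scale = pyro.param('AutoNormal.scales.phi', 0.05 * torch.ones(1),
                           constraint=constraints.positive)
    phi = pyro.sample('phi', dist.Normal(phi_loc, phi_scale))

    # one (N,)-shaped loc and scale for the whole h sequence;
    # detach + clone so the init is an independent leaf tensor
    h_loc = pyro.param('AutoNormal.locs.h',
                       lambda: mu_loc.detach().expand(N).clone())
    h_scale = pyro.param('AutoNormal.scales.h', 0.6 * torch.ones(N),
                         constraint=constraints.positive)
    for t in range(N):
        pyro.sample(f"h_{t}", dist.Normal(h_loc[t], h_scale[t]))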

Thank you for the recommendation.

This change made me more confused than before: with the “AutoNormal” guide and my previous custom guide, I had parameters for the whole sequence of “h” (which is what I want from the parameter estimation):

dict_keys(['AutoNormal.locs.mu', 'AutoNormal.scales.mu', 'AutoNormal.locs.phi', 'AutoNormal.scales.phi', 'AutoNormal.locs.h_0', 'AutoNormal.scales.h_0', 'AutoNormal.locs.y_0', 'AutoNormal.scales.y_0', 'AutoNormal.locs.h_1', 'AutoNormal.scales.h_1', 'AutoNormal.locs.h_2', 'AutoNormal.scales.h_2', 'AutoNormal.locs.h_3', 'AutoNormal.scales.h_3', 'AutoNormal.locs.h_4', 'AutoNormal.scales.h_4', 'AutoNormal.locs.h_5', 'AutoNormal.scales.h_5', 'AutoNormal.locs.h_6', 'AutoNormal.scales.h_6', 'AutoNormal.locs.h_7', 'AutoNormal.scales.h_7', 'AutoNormal.locs.h_8', 'AutoNormal.scales.h_8', 'AutoNormal.locs.h_9', 'AutoNormal.scales.h_9', 'AutoNormal.locs.h_10', 'AutoNormal.scales.h_10', 'AutoNormal.locs.h_11', 'AutoNormal.scales.h_11', 'AutoNormal.locs.h_12', 'AutoNormal.scales.h_12', 'AutoNormal.locs.h_13', 'AutoNormal.scales.h_13', 'AutoNormal.locs.h_14', 'AutoNormal.scales.h_14', 'AutoNormal.locs.h_15', 'AutoNormal.scales.h_15', 'AutoNormal.locs.h_16', 'AutoNormal.scales.h_16', 'AutoNormal.locs.h_17', 'AutoNormal.scales.h_17', 'AutoNormal.locs.h_18', 'AutoNormal.scales.h_18', 'AutoNormal.locs.h_19', 'AutoNormal.scales.h_19'])

However, with this change I can no longer get estimates for the “h” sequence; I seem to have access to only a single “h” entry:

dict_keys(['AutoNormal.locs.mu', 'AutoNormal.scales.mu', 'AutoNormal.locs.phi', 'AutoNormal.scales.phi', 'AutoNormal.locs.h', 'AutoNormal.scales.h'])

This is not equivalent to the “AutoNormal” guide and solves a different problem, or am I missing something?

I’d like to point out that your model doesn’t appear to make sense, since the prior allows phi to be negative but you compute square roots with phi.


I can no longer get estimates for the “h” sequence; I seem to have access to only a single “h” entry

In the vectorized guide I suggested, the AutoNormal.scales.h parameter is a tensor with a time dimension. To access the h[t] value at a given time, you index that tensor, e.g.

pyro.get_param_store()["AutoNormal.scales.h"][t]
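
For example, to pull out the whole sequence after training (a sketch):

store = pyro.get_param_store()
h_locs = store["AutoNormal.locs.h"].detach()      # shape (N,): loc of every h_t
h_scales = store["AutoNormal.scales.h"].detach()  # shape (N,): scale of every h_t
print(h_locs[3], h_scales[3])                     # posterior loc and scale of h_3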

Completely right! That is my next problem. I was actually first trying to simplify my original problem a bit, so as not to get involved with transformed distributions. So instead of giving phi a Uniform(-1, 1) prior, I assigned it a Normal distribution (which is also what made this part wrong). I do not know which function I should use to transform it to real coordinates, and would be grateful if you could help with that problem too.

I found this recommendation in another post, but I am not sure whether I am using it in the right context. Based on my understanding, the code performs two consecutive transformations to bring the parameter into real coordinates; I should check which combination gives the transformation between the uniform interval and the real line. Besides this change in the code, should I change anything else in my guide when a distribution does not live in real coordinates, e.g. to map the results back to the constrained domain? Thank you very much for your help! 🙂

transforms = [AffineTransform(loc=phi_loc, scale=phi_scale), SigmoidTransform()]
response_dist = dist.TransformedDistribution(dist.Uniform(-1.0, 1.0), transforms)
phi = pyro.sample('phi', response_dist)
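
For comparison, my current understanding is that the guide could instead sample phi from an unconstrained Normal and push it through the bijection onto the interval, something like this (a sketch, assuming the model gives phi a Uniform(-1, 1) prior and using torch.distributions.biject_to):

from torch.distributions import biject_to, constraints

base = dist.Normal(phi_loc, phi_scale)                    # unconstrained Normal
to_interval = biject_to(constraints.interval(-1.0, 1.0))  # sigmoid then affine, onto (-1, 1)
phi = pyro.sample('phi', dist.TransformedDistribution(base, [to_interval]))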

Now I can understand what is going on.
It is much cleaner and clearer in the vectorized format. Thank you! 🙂

I suggest you use simple pre-built distributions like LogNormal to enforce positivity, unless there’s a particular reason you’d like to make another prior assumption.
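
For example (a sketch):

# a LogNormal prior keeps sigma positive without any manual transform
sigma = pyro.sample('sigma', dist.LogNormal(0.0, 1.0))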
