Initializing params ahead of AutoGuide

I stepped away from Pyro for a few months and now I’m coming back and have to port a few things. Truth be told, this is really trouble I asked for as I had ended up going off in the weeds a bit with my approach. I’ll briefly describe:

I have a model built upon the HMM in the examples. One of the main RVs is the transition matrix between states. I wasn’t happy with the transitions I was seeing by using the Dirichlet prior of the examples, so this is where I went off on my own. My main concern was that I wanted to have more control over how sticky the states were. As an experiment, I built my own prior for the transition matrix where I could specify the mean of the diagonal and off-diagonal terms using a bunch of Beta distributions, along with a simplex constraint along the rows. (Happy to post the code but omitting for brevity.)

Now my problem became: how do I make these priors replace the params made for the transition matrix by the AutoDelta guide? The approach I took was to declare the auto_transition parameter myself before calling AutoDelta. I declared it as this matrix of betas, along with the simplex constraint. Then, when I built the AutoGuide, it didn’t call the init function for that param because it was already there. (I had a hackish init function just for the transition param, making sure it wouldn’t replace my baby. This hack no longer works: apparently init is now called even though I already declared the param, which is now named AutoGuideList.0.transition.)

Maybe it’s time to ask what the best way to do this is. I’m trying to ramp up on a lot of concepts here, so the best path could be any number of places. Some key questions are:

  • Is it possible to replace existing parameters in the param store? My first thought was to let AutoDelta make the auto_transition parameters and then replace them with my concocted prior matrix of betas and my simplex constraints. Is this even possible? And how would one do this given the whole constrained/unconstrained thing going on in the param store? (I never have figured out how to handle the constrained/unconstrained at a lower level.)
  • Now that the old prefix naming system is gone (the param now ends up being named AutoGuideList.0.transition), is it still possible to declare the param ahead of time with the constraints, such that AutoDelta doesn’t try to reinitialize it? I’m not having luck at this point using that long name.
  • More generally, is there a better way I’m missing to exert pressure on the stickiness of the states? I’ve tried ramping up the diagonal of the Dirichlet in the example but find it is too weak.
  • It’s also a very real possibility that I’m not using the guide/model setup correctly… maybe the guide params are the wrong place to put the priors and the simplex constraint?

Many thanks!

Hi @chchchain,

Initializing params … of AutoGuide

The recommended and easiest way to initialize parameters in an autoguide is to pass an init_loc_fn argument with either one of the pre-defined initializers or with a custom initializer, e.g. in your case you might use

import torch
from pyro.infer.autoguide import AutoDelta, init_to_sample

def init_loc_fn(site):
    if site["name"] == "trans":
        # Initialize close to the identity matrix (each row sums to 1).
        dim = site["fn"].event_shape[-1]
        return 0.99 * torch.eye(dim) + 0.01 / dim
    # Fall back to a standard initializer.
    return init_to_sample(site)

guide = AutoDelta(model, init_loc_fn=init_loc_fn)

How to make these priors replace the params made for the transition matrix by the AutoDelta guide?

I’m not sure I understand here. If you’re using init_to_sample or init_to_median then these priors should be used to initialize AutoDelta parameters. They will always be used as priors in SVI.

Is it possible to replace existing parameters in the param store?

Yes, the ParamStore has a dict-like interface that respects constraints (automatically projects to the constrained subspace); you just need to make sure the store is initialized so it already knows about constraints. But this is seldom needed; the preferred way to initialize autoguides is with the init_loc_fn kwarg and initializers.

Now that the old prefix naming system is gone …, is it still possible to declare the param ahead of time …?

While that is still possible, it is no longer recommended, and init_loc_fn is much easier and more modular.

is there a better way I’m missing to exert pressure on the stickiness of the states?

Well, one issue is that SVI will try to simultaneously learn the observation distribution and the latent states, so if your observation function starts out as garbage, the states may tend towards garbage early in training. You might try initializing the transition matrix to be very sticky (say 95% sticky), and give the observation distribution a higher learning rate:

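from pyro.optim import ClippedAdam

def optim_config(module_name, param_name):
    # Per-parameter learning rates: faster for observations, slower for transitions.
    if param_name == "obs_matrix":
        return {"lr": 0.02}
    elif param_name == "trans_matrix":
        return {"lr": 0.002}
    return {"lr": 0.01}  # default

optim = ClippedAdam(optim_config)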
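
To get a feel for what "95% sticky" means, here is a small back-of-the-envelope sketch. It assumes a fixed self-transition probability p, in which case the number of consecutive steps spent in a state is geometric with mean 1 / (1 - p). The helper name expected_dwell_steps is hypothetical and not part of Pyro.

```python
def expected_dwell_steps(self_transition_prob: float) -> float:
    """Mean number of consecutive steps spent in a state, assuming a
    constant self-transition probability (geometric dwell time)."""
    if not 0.0 <= self_transition_prob < 1.0:
        raise ValueError("self_transition_prob must be in [0, 1)")
    return 1.0 / (1.0 - self_transition_prob)


for p in (0.5, 0.95, 0.99):
    print(f"stay prob {p}: about {expected_dwell_steps(p):.1f} steps per visit")
```

So a 95% diagonal corresponds to runs of about 20 steps on average, which can help when choosing an initial stickiness relative to the time scale of the data.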

It’s also a very real possibility that I’m not using the guide/model setup correctly… maybe the guide params are the wrong place to put the priors and the simplex constraint?

The guide params are one of the right places to put simplex constraints, but the wrong place to put priors. Again I’m confused. Maybe a code example would help.