MLE w/ numpyro: how to deal with scale

adk · May 10, 2022, 7:45pm

When using numpyro for Bayesian inference, I understand the importance of scaling sample sites to match the step size of the optimizer. I have used the reparameterization utilities provided by numpyro for this, e.g.

...
with handlers.reparam(config={"some_large_sample": LocScaleReparam(centered=0.0)}):
    with numpyro.plate("data", len(my_data)):
        some_large_sample = numpyro.sample("some_large_sample", dist.Normal(1e9, 1e7))
...

My question is: is there a similar reparameterization trick that works when the model itself contains parameters, as it does when performing MLE? Here is complete code to illustrate the problem I’d like to solve:

import numpyro.distributions.constraints as constraints
import numpyro
import numpy as np
import numpyro.distributions as dist
from numpyro.infer import SVI, Trace_ELBO
from jax import random

data = np.random.normal(loc=1e9,scale=1e7,size=1000)
def model_mle(data):

    loc = numpyro.param("latent_loc", np.array(0.0))
    # Constrain the scale to stay positive during optimization.
    scale = numpyro.param("latent_scale", np.array(1.0), constraint=constraints.positive)
    
    with numpyro.plate("data", len(data)):
        numpyro.sample("obs", dist.Normal(loc=loc,scale=scale), obs=data)

def guide_mle(data):
    pass

def train(model, guide, data, lr=0.1, n_steps=201):
    # With an empty guide there are no latent sites, so minimizing the
    # negative ELBO is the same as maximizing the likelihood.
    adam = numpyro.optim.Adam(step_size=lr)
    svi = SVI(model, guide, adam, loss=Trace_ELBO())
    svi_state = svi.init(random.PRNGKey(0), data)

    for step in range(n_steps):
        svi_state, loss = svi.update(svi_state, data)
        if step % 50 == 0:
            print("[iter {}]  loss: {:.4f}".format(step, loss))
    return svi.get_params(svi_state)

params = train(model_mle, guide_mle, data)
params

When I run the above program, the latent_loc and latent_scale params don’t converge to the maximum likelihood values: starting from 0 and 1, a step size of 0.1 over 201 steps is far too small to reach values of order 1e9 and 1e7. What I would like is a general-purpose solution, similar to LocScaleReparam in the code snippet above, that works when the distribution in the sample statement contains parameters. Is there a convenient way to deal with the scale issue for MLE, or is there no way around manually adjusting either the scale of the data or the optimizer step size to make this example work?
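For reference, the maximum likelihood values that the optimizer should approach have a closed form for a Normal likelihood: the sample mean and the (biased, ddof=0) sample standard deviation. A minimal NumPy sketch, assuming synthetic data generated the same way as above (the seed and the names loc_mle and scale_mle are illustrative only):

```python
import numpy as np

# Synthetic data with the same location and scale as in the question.
rng = np.random.default_rng(0)
data = rng.normal(loc=1e9, scale=1e7, size=1000)

# Closed-form maximum likelihood estimates for a Normal likelihood.
loc_mle = data.mean()
scale_mle = data.std()  # NumPy's default ddof=0 gives the MLE of the scale

print(loc_mle, scale_mle)
```

Any fitted latent_loc and latent_scale can be compared against these two numbers to judge whether the optimization has converged.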

martinjankowiak · May 10, 2022, 9:00pm

Scaling the data to be O(1) is probably the best solution, since it makes it most likely that other settings, such as initial values and the optimizer step size, will work in their sensible default ranges.
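The rescaling suggested here can be sketched as follows. This is only an illustration under stated assumptions: the constants center and spread are rough hand-picked guesses, and the helper fit_normal_gd is a hypothetical plain gradient-descent loop in NumPy standing in for the SVI/Adam fit, not part of numpyro. If z = (x - center) / spread is fitted with Normal(loc_z, scale_z), the parameters in the original units are loc = center + spread * loc_z and scale = spread * scale_z.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=1e9, scale=1e7, size=1000)

# Rough, hand-picked constants (assumptions): only the order of magnitude matters.
center, spread = 1e9, 1e7
z = (data - center) / spread  # rescaled data, now O(1)

def fit_normal_gd(z, lr=0.1, n_steps=201):
    """Gradient descent on the mean Normal negative log-likelihood.

    Hypothetical stand-in for the SVI + Adam loop in the question.
    """
    loc, log_scale = 0.0, 0.0  # same O(1) initial values as in the question
    for _ in range(n_steps):
        scale = np.exp(log_scale)
        resid = z - loc
        grad_loc = -resid.mean() / scale**2
        grad_log_scale = 1.0 - (resid**2).mean() / scale**2
        loc -= lr * grad_loc
        log_scale -= lr * grad_log_scale
    return loc, np.exp(log_scale)

loc_z, scale_z = fit_normal_gd(z)

# Map the fitted parameters back to the original units.
loc = center + spread * loc_z
scale = spread * scale_z
print(loc, scale)
```

With the data rescaled, the step size of 0.1 and the 201 steps from the question are enough for this toy fit, and the back-transformed values can be checked against data.mean() and data.std().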