Including a stochastic node with fixed parameters in a model

I am fitting a tree growth model in two stages, first by fitting a model that predicts potential/maximum growth, then by fitting a model that adds additional covariates to estimate the constraints on potential growth. I accomplish the first model by using quantile regression (using the AsymmetricLaplaceQuantile for likelihood). The posterior for the 3 coefficients of the potential growth model, b_potential are well-described by a multivariate normal (MVN) distribution. I would like to incorporate this potential growth model into the potential*modifier model, but prevent MCMC or SVI from treating the parameters for the potential growth components as tunable. Is there a way to do this?

In contrast to the potential growth model, which I fit with an Asymmetric Laplace likelihood to model the 95th percentile of growth, I am fitting the potential*modifier model with a Laplace likelihood to predict the mean growth response.

If I simply include the location and covariance_matrix for the MVN describing the coefficients for estimating potential growth in the potential*modifier model as a latent MVN, when I fit the model using SVI or MCMC, it modifies the parameters of the potential growth MVN in way that changes them significantly from their values I initialize them with (which were retrieved by fitting the potential growth model).

I tried defining the mean and covariance matrix of the MVN as observed parameters so that they wouldn’t be modified or sampled during MCMC, and feed these into a latent MVN distribution, but every step of the MCMC is divergent. This is what I tried:

mvn_mu = numpyro.sample("mvn_mu", dist.Uniform(low=pot_loc, high=pot_loc), obs=pot_loc)
mvn_sigma = numpyro.sample("mvn_sigma", dist.Uniform(low=pot_cov, high=pot_cov), obs=pot_cov)
mvn = numpyro.sample("mvn", dist.MultivariateNormal(mvn_mu, mvn_sigma))

I’ve also tried using an MVN without registering it as a sample site (e.g., dist.MultivariateNormal(loc, cov).sample(key), but have been unable to figure out how to ensure the key changes each run. It seems like handler.seed will not inject a seed into an object like this that is not a Pyro primitive, and if I need to instantiate the seed in the model, it’s not clear how to ensure that it changes in subsequent iterations of SVI or MCMC since it always gets pinned again to the original seed and split the same way each iteration. I tried something like this:

class fixed_mvn():
    def __init__(self, seed, loc, cov):
        self.loc = loc
        self.cov = cov
        self.key = random.PRNGKey(seed)

    def sample(self, sample_shape=()):
        self.key, sample_key = random.split(key)
        return dist.MultivariateNormal(self.loc, self.cov).sample(sample_key, sample_shape=sample_shape)

and then in the model, use it like this:

def model(X, mvn_loc, mvn_cov, y=None):
    mvn = fixed_mvn(42, mvn_loc, mvn_cov)
    b_potential = mvn.sample()
    potential = b_potential[0] + b_potential[1]*X[1] + b_potential[2]*X[1]**2
    ...

The seed is always reset to 42 and then split the same way each time the model is run through successive iterations of MCMC or SVI, so I don’t get any variability in the mvn samples…

I think a generalization of this issue is defining known and irreducible noise in a model. I’m currently just pinning the values of b_potential to constants (e.g., b_potential = [1.5, 2.5, -0.1]), but would love to generate estimates for b_potential in a stochastic way with an mvn whose loc and covariance_matrix parameters are not modified as the MCMC or SVI runs. Any ideas?

I’m not sure if I understand what you want. For stochastic node with fixed parameters, you can use e.g.

numpyro.sample("x", dist.Normal(0, 1))

Here 0 and 1 are the fixed parameters of the stochastic node “x” following Normal distribution.

As I understand it, 0 and 1 in this example are your priors for the loc and scale parameters of the normal, and when SVI or MCMC run, the values generated for x in each step evolve to represent the posterior distribution of x… when you look at the posterior distribution of parameter x, you don’t get mean of exactly zero and sd of exactly 1, right?

Similarly, in my example above, if I setup the loc and covariance_matrix of the mvn as objects to be sampled, they do not stay fixed over time. The mcmc.summary() shows significantly different values for the mean of b_potential than were provided in the input to the model as pot_loc

I’m saying that I want to generate values from an mvn with exactly pot_loc and pot_cov parameterization at every step of the MCMC or SVI.

If the parameters change at each step, they are no longer fixed. I guess you need to use some specific MCMC kernel for it because HMC kernels assume we fix the potential function.

For SVI, you just need to change the arguments of the model/guide at each step

def model(variable_inputs):
     ...

svi.update(..., variable_inputs)

I don’t understand what’s the role of posterior here. Probably i miss some important point.

Yeah, I didn’t elaborate very much, but I’m combining two models. One model is potential growth, the model I want posterior inference on in this case is a potential*modifier model that includes additional parameters that constrain growth. I’m wanting the posterior to inform the values of the constraining parameters on potential growth and keep the parameters that estimate potential growth fixed. I didn’t include those constraining parameters in the explanation above for brevity, but the model looks close to the following.

log_growth_potential = b_pot[0] + b_pot[1]*jnp.log(X[0]) + b_pot[2]*X[0]**2
log_growth_modifiers = b_mod[0]*X[1] + b_mod[1]*X[2] ...
growth = jnp.exp(log_growth_potential + log_growth_modifiers)

The log transformation treats the parameters as multiplicative, so you end up with a potential_growth * modifiers_of_growth behavior. I also define all the growth modifier parameters to have negative coefficients (using a negative LogNormal distribution) and set up the data such that all the growth modifier covariates are >= 0. This way the exponentiation of log_growth_modifiers always returns a value between 0 and 1 that will be multiplied by potential growth.

As I understood MCMC and SVI, they evolve the distributions of numpyro.sample objects towards the posterior such that the original location and covariance matrix inputs (in the case of MVN) are no longer the the only parameters governing draws from the MVN. I’m not asking how I can change inputs at each step, but the opposite. I’m asking how can I ensure that the samples of b_pot are drawn from an MVN with exactly mvn_loc, mvn_cov parameterization at every step while MCMC or SVI evolves all the b_mod parameters towards their posterior distributions.

I ran across a similar problem in trying to create a mixed effects model by adding random effects to a linear fixed effects model that may or may not help illustrate the point. If I add random effects at the group level that I want to be centered at zero, I’d intuitively do something like this:

with numpyro.plate(num_groups):
    e_group = numpyro.sample('e_group', dist.Normal(0, 1))

or

e_group = numpyro.sample('e_group', dist.Normal(jnp.zeros((num_groups), jnp.ones(num_groups)))

which could then be incorporated to the fixed effects model like:

pred = b0 + b1*x[0] + b2*x[1] + e_group[group_idx] 
e_individ = dist.HalfCauchy(0, 1)
obs = numpyro.sample('obs', dist.Normal(pred, e_individ ), obs=y)

However, the random effects specified in this way usually come out in the posterior with an overall mean (across all random effects) that is not zero. A good illustration of this issue and some options to work around them are presented in this article. There doesn’t seem to be a way to enforce, “this is a parameter that should not be tuned, but I still want it to be used to generate stochastic samples”. These authors point out that ensuring random effects sum to zero can’t be enforced in the definition of the distribution, but that you can use a workaround like creating a deterministic parameter for the last group’s random effect such that it is defined as the negative sum of all the random effects from other groups.

It would be useful to simplify the problem and set aside notions like random effects, growth potential, modifiers, etc. As I understood, you want to draw samples from p(x,z|y) where y is observed. Here p(x,z|y) is factored into p(x|y)p(z|x,y) where the posterior p(x|y) is known. If so, HMCGibbs might be what you want.

1 Like