Including a stochastic node with fixed parameters in a model

I am fitting a tree growth model in two stages: first by fitting a model that predicts potential/maximum growth, then by fitting a model that adds additional covariates to estimate the constraints on potential growth. I fit the first model by quantile regression (using AsymmetricLaplaceQuantile as the likelihood). The posterior for the 3 coefficients of the potential growth model, b_potential, is well described by a multivariate normal (MVN) distribution. I would like to incorporate this potential growth model into the potential*modifier model, but prevent MCMC or SVI from treating the parameters for the potential growth components as tunable. Is there a way to do this?

In contrast to the potential growth model, which I fit with an Asymmetric Laplace likelihood to model the 95th percentile of growth, I am fitting the potential*modifier model with a Laplace likelihood to predict the mean growth response.

If I simply include the location and covariance_matrix for the MVN describing the coefficients for estimating potential growth in the potential*modifier model as a latent MVN, then when I fit the model using SVI or MCMC, it modifies the parameters of the potential growth MVN in a way that changes them significantly from the values I initialized them with (which were retrieved by fitting the potential growth model).

I tried defining the mean and covariance matrix of the MVN as observed parameters so that they wouldn’t be modified or sampled during MCMC, and fed these into a latent MVN distribution, but every step of the MCMC is divergent. This is what I tried:

mvn_mu = numpyro.sample("mvn_mu", dist.Uniform(low=pot_loc, high=pot_loc), obs=pot_loc)
mvn_sigma = numpyro.sample("mvn_sigma", dist.Uniform(low=pot_cov, high=pot_cov), obs=pot_cov)
mvn = numpyro.sample("mvn", dist.MultivariateNormal(mvn_mu, mvn_sigma))

I’ve also tried using an MVN without registering it as a sample site (e.g., dist.MultivariateNormal(loc, cov).sample(key)), but have been unable to figure out how to ensure the key changes each run. It seems like handlers.seed will not inject a seed into an object like this that is not a NumPyro primitive, and if I need to instantiate the seed in the model, it’s not clear how to ensure that it changes in subsequent iterations of SVI or MCMC, since it always gets pinned to the original seed and split the same way each iteration. I tried something like this:

class fixed_mvn:
    def __init__(self, seed, loc, cov):
        self.loc = loc
        self.cov = cov
        self.key = random.PRNGKey(seed)

    def sample(self, sample_shape=()):
        # split off a fresh sample key and carry the new state forward,
        # so repeated calls on the same instance produce different draws
        self.key, sample_key = random.split(self.key)
        return dist.MultivariateNormal(self.loc, self.cov).sample(sample_key, sample_shape=sample_shape)

and then in the model, use it like this:

def model(X, mvn_loc, mvn_cov, y=None):
    mvn = fixed_mvn(42, mvn_loc, mvn_cov)
    b_potential = mvn.sample()
    potential = b_potential[0] + b_potential[1]*X[1] + b_potential[2]*X[1]**2
    ...

The seed is always reset to 42 and then split the same way each time the model is run through successive iterations of MCMC or SVI, so I don’t get any variability in the mvn samples…

I think a generalization of this issue is defining known and irreducible noise in a model. I’m currently just pinning the values of b_potential to constants (e.g., b_potential = [1.5, 2.5, -0.1]), but would love to generate estimates for b_potential in a stochastic way with an mvn whose loc and covariance_matrix parameters are not modified as the MCMC or SVI runs. Any ideas?


I’m not sure if I understand what you want. For a stochastic node with fixed parameters, you can use e.g.

numpyro.sample("x", dist.Normal(0, 1))

Here 0 and 1 are the fixed parameters of the stochastic node “x”, which follows a Normal distribution.

As I understand it, 0 and 1 in this example are your priors for the loc and scale parameters of the normal, and when SVI or MCMC run, the values generated for x at each step evolve to represent the posterior distribution of x… when you look at the posterior distribution of parameter x, you don’t get a mean of exactly zero and an sd of exactly 1, right?

Similarly, in my example above, if I set up the loc and covariance_matrix of the mvn as objects to be sampled, they do not stay fixed over time. The mcmc.summary() shows significantly different values for the mean of b_potential than were provided as input to the model as pot_loc…

I’m saying that I want to generate values from an mvn with exactly pot_loc and pot_cov parameterization at every step of the MCMC or SVI.

If the parameters change at each step, they are no longer fixed. I guess you need to use some specific MCMC kernel for it because HMC kernels assume we fix the potential function.

For SVI, you just need to change the arguments of the model/guide at each step

def model(variable_inputs):
     ...

svi.update(..., variable_inputs)

I don’t understand what the role of the posterior is here. Probably I’m missing some important point.

Yeah, I didn’t elaborate very much, but I’m combining two models. One model is potential growth; the model I want posterior inference on in this case is a potential*modifier model that includes additional parameters that constrain growth. I want the posterior to inform the values of the constraining parameters on potential growth while keeping the parameters that estimate potential growth fixed. I didn’t include those constraining parameters in the explanation above for brevity, but the model looks close to the following.

log_growth_potential = b_pot[0] + b_pot[1]*jnp.log(X[0]) + b_pot[2]*X[0]**2
log_growth_modifiers = b_mod[0]*X[1] + b_mod[1]*X[2] ...
growth = jnp.exp(log_growth_potential + log_growth_modifiers)

The log transformation treats the parameters as multiplicative, so you end up with a potential_growth * modifiers_of_growth behavior. I also define all the growth modifier parameters to have negative coefficients (using a negative LogNormal distribution) and set up the data such that all the growth modifier covariates are >= 0. This way the exponentiation of log_growth_modifiers always returns a value between 0 and 1 that will be multiplied by potential growth.
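
For concreteness, a minimal sketch of that constraint (the site name "b_mod_raw" and the count n_modifiers are illustrative, not from my actual model):

# Negating a LogNormal draw (which is strictly positive) forces each
# modifier coefficient to be negative.
b_mod = -numpyro.sample("b_mod_raw", dist.LogNormal(0.0, 1.0).expand([n_modifiers]))

# With all modifier covariates >= 0, every term of log_growth_modifiers is <= 0,
# so jnp.exp(log_growth_modifiers) always lands in (0, 1].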

As I understand MCMC and SVI, they evolve the distributions of numpyro.sample objects towards the posterior, such that the original location and covariance matrix inputs (in the case of an MVN) are no longer the only parameters governing draws from the MVN. I’m not asking how I can change inputs at each step, but the opposite: I’m asking how I can ensure that the samples of b_pot are drawn from an MVN with exactly the mvn_loc, mvn_cov parameterization at every step while MCMC or SVI evolves all the b_mod parameters towards their posterior distributions.

I ran across a similar problem in trying to create a mixed effects model by adding random effects to a linear fixed effects model that may or may not help illustrate the point. If I add random effects at the group level that I want to be centered at zero, I’d intuitively do something like this:

with numpyro.plate("groups", num_groups):
    e_group = numpyro.sample('e_group', dist.Normal(0, 1))

or

e_group = numpyro.sample('e_group', dist.Normal(jnp.zeros(num_groups), jnp.ones(num_groups)))

which could then be incorporated to the fixed effects model like:

pred = b0 + b1*x[0] + b2*x[1] + e_group[group_idx]
e_individ = numpyro.sample('e_individ', dist.HalfCauchy(1))
obs = numpyro.sample('obs', dist.Normal(pred, e_individ), obs=y)

However, the random effects specified in this way usually come out in the posterior with an overall mean (across all random effects) that is not zero. A good illustration of this issue and some options to work around it are presented in this article. There doesn’t seem to be a way to enforce, “this is a parameter that should not be tuned, but I still want it to be used to generate stochastic samples”. The authors point out that ensuring random effects sum to zero can’t be enforced in the definition of the distribution, but that you can use a workaround like creating a deterministic parameter for the last group’s random effect, defined as the negative sum of the random effects of all the other groups (sketched below).
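
A minimal sketch of that workaround (my paraphrase of the article’s idea, assuming num_groups groups):

# Sample only num_groups - 1 free random effects...
with numpyro.plate("free_groups", num_groups - 1):
    e_free = numpyro.sample("e_free", dist.Normal(0, 1))

# ...and define the last group's effect deterministically as the negative sum
# of the others, so the full set of effects sums to exactly zero.
e_last = numpyro.deterministic("e_last", -jnp.sum(e_free))
e_group = jnp.concatenate([e_free, e_last[None]])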


It would be useful to simplify the problem and set aside notions like random effects, growth potential, modifiers, etc. As I understand it, you want to draw samples from p(x,z|y) where y is observed. Here p(x,z|y) is factored into p(x|y)p(z|x,y), where the posterior p(x|y) is known. If so, HMCGibbs might be what you want.


@d-diaz If I understand correctly, I want to do the same as you. Were you ever able to make it work?

In my case, I want to add random noise to observations in a model and make the statistical properties of the noise something that isn’t part of the posterior calculation but stays constant during MCMC.

OK, this is really long after the original question, but maybe you can help answer one related question, @fehiepsi? Could this be accomplished with handlers.block()?

Following the original model, could something like this work?

key = random.PRNGKey(0)
with handlers.block(), handlers.seed(rng_seed=key):
    mvn = numpyro.sample("hidden_mvn", dist.MultivariateNormal(mvn_mu, mvn_sigma))

You can use HMCGibbs for that purpose. In each step, you can simply draw a random mvn sample from the gibbs function.
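
A rough sketch of that idea applied to the original question (assuming pot_loc and pot_cov hold the fixed first-stage estimates, and that the model declares a "b_potential" sample site):

from numpyro.infer import HMCGibbs, MCMC, NUTS

def gibbs_fn(rng_key, gibbs_sites, hmc_sites):
    # Redraw b_potential from the fixed MVN at every Gibbs step, ignoring the
    # current state, so its loc/covariance are never tuned by the HMC kernel.
    return {"b_potential": dist.MultivariateNormal(pot_loc, pot_cov).sample(rng_key)}

kernel = HMCGibbs(NUTS(model), gibbs_fn=gibbs_fn, gibbs_sites=["b_potential"])
mcmc = MCMC(kernel, num_warmup=500, num_samples=1000)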

Thank you! How about if I want to use nested sampling? I really need something that can calculate the evidence to do model comparison, and I have struggled massively with calculating it on my own after using HMC. Could this approach that uses handlers.block work?

I think you can simply call MVN(...).sample(...) instead of using the block handler. But I’m not sure what you want to achieve.

I want to add stochastic noise to my observations without it being used as a site that affects the posterior calculation. I’m not sure I have the right terminology to express myself, but this is unavoidable noise, and its distribution properties should stay constant during the entire Nested Sampling calculation.

Does this make sense? In other words: each of my observations represents the mean of a distribution, and each observation also has a different standard deviation that I know and don’t want MCMC to change as I run it. Ideally they would just add uncertainty to the parameters that I do want to fit.

Could you write prototype code for your generative model and what you want to infer from it? My interpretation is that you want to make an inference on something like

def model(mean, std):
    x = numpyro.sample("x", dist.Normal())
    y = numpyro.sample("y", dist.Normal(x, std), obs=mean)

Yes, of course! Thank you so much for your help! It looks like this (the comments explain the different parts):

def single_hem_single_fit_model(
    self,
    ref_lat: jnp.array,
    ref_time: jnp.array,
    ref_area: jnp.array,
    fit_lat: jnp.array,
    fit_time: jnp.array,
):
    """
    Numpyro model to fit a hemispheric cycle against another hemispheric cycle

    Parameters
    ----------
    ref_lat : jnp.array
        Reference latitude
    ref_time : jnp.array
        Reference time
    ref_area : jnp.array
        Reference area
    fit_lat : jnp.array
        Fit latitude
    fit_time : jnp.array
        Fit time
    """

    # Calculate the uncertainty associated with each data point
    # --------------------------------------------------------------------------------------------------
    ref_area_unc = jnp.abs(
        ref_area - jnp.pi * jnp.power(jnp.sqrt(ref_area / jnp.pi) + 0.5, 2) + 1
    )
    ref_area_unc = jnp.where(ref_area == self.fill_value, 1, ref_area_unc)
    ref_lat = jnp.where(ref_area < ref_area_unc, self.fill_value, ref_lat)
    ref_area = jnp.where(ref_area < ref_area_unc, self.fill_value, ref_area)

    # Define a time offset between solar cycles.  This is one of the quantities I want to learn
    offst = numpyro.sample(
        "offst", dist.Uniform(-self.offst_constrn, self.offst_constrn)
    )

    # Define an observational threshold.   This is the other quantity I want to learn    
    threshold = numpyro.sample("threshld", dist.Uniform(0, self.thres_constrn))

    # Define a Gaussian random noise site for each data point independently.
    # I don't want to ever change the statistical properties of these sites,
    # nor do I want them to feature in the posterior, except as increased uncertainty
    key = random.PRNGKey(0)
    with handlers.block(), handlers.seed(rng_seed=key):
        with numpyro.plate("data", ref_area.shape[0]):
            noise = numpyro.sample("noise", dist.Normal(0, self.area_error))

    # Add the noise to my area.  This will be compared against the "Threshold" sample site
    # I want my data to randomly have more or less area as Nested Sampling tries to integrate, 
    # adding uncertainty to the estimation of "threshold"
    ref_area_noise = ref_area + ref_area_unc * noise

    # Apply observational imprints (time and area threshold) and calculate variables and probabilities
    # This is a function that calculates log_prob to be added using factor below
    # --------------------------------------------------------------------------------------------------
    ref_fit_log_prob = self.calculate_empirical_log_prob(
        ref_lat, ref_time, ref_area_noise, fit_lat, fit_time, offst, threshold
    )

    numpyro.factor("fit_to_ref_bfly", ref_fit_log_prob)

Could you remove unnecessary computation and simplify it such that it only involves 1 or 2 variables?

Yes! Thank you very much for your patience. How about this?

Suppose you want to approximate the distribution of heights in a population, but you have a faulty measuring device whose error scales with the measurement. You know this for an absolute fact: you cannot make your measurement better, and you know exactly what percentage of error you have. Would this model work?

def model(height_measurements, percentage_error):
    # I know height must be between 100 and 200cm
    height_mu = numpyro.sample("height_mu", dist.Uniform(100, 200))
    # I know standard deviation should be between 0 and 100cm
    height_sigma = numpyro.sample("height_sigma", dist.Uniform(0, 100))

    # I add the error I know each measurement has and I want it to vary 
    # but I don't want it to be modified as part of the posterior
    key = random.PRNGKey(0)
    with handlers.block(), handlers.seed(rng_seed=key):
        with numpyro.plate("data", height_measurements.shape[0]):
            noise = numpyro.sample("noise", dist.Normal(0, 1))
    
    # I add the random realization of noise to my data
    height_measurements_with_noise = height_measurements*(1+percentage_error*noise)

    # I define my observed distribution
    numpyro.sample("height_distribution", dist.Normal(height_mu, height_sigma), obs=height_measurements_with_noise)   

If I understand block correctly, this would add noise to the nested sampling run without it making it into the posterior. My expectation is that the larger the percentage_error, the broader my posterior distribution will be, but I know what this distribution is and I don’t want it to ever change during the process. Does this make more sense?

Following your suggestion, would this work/be better?

def model(height_measurements, percentage_error):
    # I know height must be between 100 and 200cm
    height_mu = numpyro.sample("height_mu", dist.Uniform(100, 200))
    # I know standard deviation should be between 0 and 100cm
    height_sigma = numpyro.sample("height_sigma", dist.Uniform(0, 100))

    # I add the error I know each measurement has and I want it to vary 
    # but I don't want it to be modified as part of the posterior
    noise = dist.Normal(0, 1).expand(height_measurements.shape).sample()
    
    # I add the random realization of noise to my data
    height_measurements_with_noise = height_measurements*(1+percentage_error*noise)

    # I define my observed distribution
    numpyro.sample("height_distribution", dist.Normal(height_mu, height_sigma), obs=height_measurements_with_noise) 

Once again, thank you so much for your kindness and patience!

I think you can provide numpyro.prng_key() to the .sample() call. There is no need to use block. Such random noise will be different every time the model is run under a new seed. However, I suspect that nested sampling only supports a static (not stochastic) potential function, so such an inference algorithm might not work. Similarly, most MCMC algorithms in numpyro only support static potential functions.
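
For example, applied to the model above:

# numpyro.prng_key() returns a fresh key when the model runs under a seed
# handler (which MCMC, SVI, and Predictive provide), so the noise differs
# each time the model is traced under a new seed.
noise = dist.Normal(0, 1).expand(height_measurements.shape).sample(numpyro.prng_key())
height_measurements_with_noise = height_measurements * (1 + percentage_error * noise)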

In your model, you can marginalize out the noise with a bit of algebra. I think that’s a better solution here.
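
For instance, something like the following sketch: height_measurements*(1 + percentage_error*noise) with noise ~ Normal(0, 1) is itself Gaussian, so the known measurement noise can be folded into the likelihood’s scale.

def model(height_measurements, percentage_error):
    height_mu = numpyro.sample("height_mu", dist.Uniform(100, 200))
    height_sigma = numpyro.sample("height_sigma", dist.Uniform(0, 100))

    # Known per-observation measurement sd, combined in quadrature with the
    # population sd; no stochastic noise site is needed.
    meas_sd = percentage_error * height_measurements
    total_sd = jnp.sqrt(height_sigma**2 + meas_sd**2)

    numpyro.sample("height_distribution", dist.Normal(height_mu, total_sd), obs=height_measurements)

Because nothing stochastic remains outside the sample sites, the potential function stays static, which should also work for nested sampling and HMC.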