Why is there a discrepancy between the inferred results and the expected outcomes?

Hello, I’m new to Bayesian inference. After briefly looking at the page Bayesian Regression Using NumPyro — NumPyro documentation, I tried to conduct Bayesian inference using NumPyro. However, the inference results are different from my expected results. Could you please tell me why such differences occur?

Here is my model code:

import jax.numpy as jnp
import numpyro
import numpyro.distributions as dist


def model(mask_matrix, obs):
    m, n = mask_matrix.shape

    with numpyro.plate("Z", n):
        z = numpyro.sample("z", dist.Uniform(0.0, 1.0))

    mu = jnp.ones(m)
    # Iterative computation of mu; is there an efficient vectorized form available?
    for i in range(m):
        for j in range(n):
            if mask_matrix[i][j] == 1:
                mu = mu.at[i].set(mu[i] * z[j])
    numpyro.sample("obs", dist.Normal(mu, 1), obs=obs)

In my model, the probability of each data point y_j given the set of random variables Z is the product of the elements of Z involved in its computation, where N_j is the set of indices of the variables in Z that enter the computation of y_j:

P(y_j |Z) = \prod_{i \in N_j } z_i
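As an aside on the vectorization question in the code comment: this per-row product can be computed without the nested loops, for example with jnp.where and jnp.prod (a sketch under the assumption that mask_matrix is a 0/1 array that broadcasts against z):

def model(mask_matrix, obs):
    n = mask_matrix.shape[1]

    with numpyro.plate("Z", n):
        z = numpyro.sample("z", dist.Uniform(0.0, 1.0))

    # For each row i, multiply the z[j] where mask_matrix[i, j] == 1;
    # masked-out positions contribute a neutral factor of 1.
    mu = jnp.prod(jnp.where(mask_matrix == 1, z, 1.0), axis=1)
    numpyro.sample("obs", dist.Normal(mu, 1), obs=obs)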

To test my model, I generated some synthetic data and performed inference using MCMC, but the results did not match my expectations. Why might this be the case?

Here is how I generated the synthetic data:

import itertools

import pandas as pd

total_cnt = 5
# For simplicity, assume the first expected_0_cnt entries of the expected Z are 0 and the rest are 1.
expected_0_cnt = 2
expected_1_cnt = total_cnt - expected_0_cnt
expected_result = [0.0] * expected_0_cnt + [1.0] * expected_1_cnt

idx_list = list(range(total_cnt))
all_idx_combinations = []

# All non-empty index combinations
for i in range(1, total_cnt + 1):
    all_idx_combinations.append(list(itertools.combinations(idx_list, i)))

mask_matrix = []
obs = []
for combinations in all_idx_combinations:
    for combination in combinations:
        # False = index not chosen, True = index chosen
        tmp = [False for _ in range(total_cnt)]
        product = 1.0
        for idx in combination:
            tmp[idx] = True
            product *= expected_result[idx]
        mask_matrix.append(tmp)
        obs.append(product)

mask_matrix = pd.DataFrame(mask_matrix)
obs = pd.DataFrame(obs)
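Inference itself could then be run with NUTS along these lines (a minimal sketch; the array conversion, sample counts, and random seed are my assumptions, not code from the original post):

import jax
import jax.numpy as jnp
from numpyro.infer import MCMC, NUTS

# Convert the pandas objects to plain JAX arrays before passing them to the model
mask_arr = jnp.asarray(mask_matrix.values, dtype=jnp.float32)
obs_arr = jnp.asarray(obs.values.ravel(), dtype=jnp.float32)

mcmc = MCMC(NUTS(model), num_warmup=1000, num_samples=2000)
mcmc.run(jax.random.PRNGKey(0), mask_arr, obs_arr)
mcmc.print_summary()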

The output I expected is Z = (0, 0, 1, 1, 1), but the inference result is Z = (0.75, 0.75, 0.75, 0.75, 0.75).

Thank you in advance for your guidance.

Could you clarify your settings? For example, why is the expected output (0, 0, 1, 1, 1)? What is your synthetic data, in particular obs and mask_matrix? Did you check the likelihood of your expected output against the likelihood of the MCMC samples?


Yes, here are my settings.

First, I generate my synthetic data. I set Z_{exp} = (0, 0, 1, 1, 1).

Second, I generate my mask_matrix as follows:

mask\_matrix = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & 1 & 0 & 0 & 0 \\ 1 & 0 & 1 & 0 & 0 \\ \cdots & \cdots & \cdots & \cdots & \cdots \\ 1 & 1 & 1 & 1 & 1 \end{bmatrix}

For each row of the matrix, I compute obs in the following manner:

obs[i] = \prod \limits_{j \in N_i} z[j], \quad N_i = \{j \mid mask\_matrix[i][j] = 1\}

For example, obs[0] = z[0] = 0, obs[1] = z[1] = 0, and obs[2] = z[2] = 1.
If mask\_matrix[i] = [1, 0, 1, 0, 0], then obs[i] = z[0] \cdot z[2] = 0 \cdot 1 = 0.

Finally, I pass mask_matrix and obs as arguments to my model function, where Z is defined to follow a uniform distribution.

So, in my view, the posterior should exhibit the following trend:
\text{the mean of } \{z[0], z[1]\} \rightarrow 0 \quad \text{and} \quad \text{the mean of } \{z[2], z[3], z[4]\} \rightarrow 1

I don’t fully understand the reasoning behind your expectation, but I did a quick check and found that the likelihood of your expected value was smaller:

    # added to the model: compare the log likelihood of the MCMC samples
    # with the log likelihood of the expected solution Z = (0, 0, 1, 1, 1)
    z1 = jnp.array([0, 0, 1, 1, 1])
    mu1 = jnp.prod(jnp.where(mask_matrix == 1, z1, 1), axis=1)
    numpyro.deterministic("loglik", dist.Normal(mu, 1).log_prob(obs).sum())
    numpyro.deterministic("loglik1", dist.Normal(mu1, 1).log_prob(obs).sum())
...
mcmc.print_summary(exclude_deterministic=False)

gave me

                mean       std    median      5.0%     95.0%     n_eff     r_hat
    loglik  -1015.75      1.67  -1015.41  -1017.93  -1013.56    157.27      1.02
   loglik1  -1119.10      0.00  -1119.10  -1119.10  -1119.10      1.00       nan
      z[0]      0.75      0.08      0.75      0.63      0.89    610.66      1.00
      z[1]      0.75      0.08      0.75      0.62      0.88    236.18      1.00
      z[2]      0.75      0.08      0.75      0.62      0.88    306.03      1.00
      z[3]      0.76      0.09      0.76      0.60      0.89    380.01      1.00
      z[4]      0.75      0.08      0.74      0.64      0.90    477.93      1.01

Thank you very much for your help. I realized my mistake; I got my likelihood function wrong. After the following modifications, my model successfully produced the results I wanted.

def model(mask_matrix, obs):
    m, n = mask_matrix.shape

    with numpyro.plate("Z", n):
        z = numpyro.sample("z", dist.Uniform(0.0, 1.0))

    mu = jnp.ones(m)
    # Iterative computation of mu; is there an efficient vectorized form available?
    for i in range(m):
        for j in range(n):
            if mask_matrix[i][j] == 1:
                mu = mu.at[i].set(mu[i] * z[j])
        # Flip the mean for observations equal to 1 (corrected likelihood)
        if obs[i] == 1:
            mu = mu.at[i].set(1 - mu[i])

    numpyro.sample("obs", dist.Normal(mu, 1), obs=obs)

With the correction, my likelihood function should be:
P(y_i = 0 \mid Z) = \prod \limits_{j \in N_i} z[j], \quad N_i = \{j \mid mask\_matrix[i][j] = 1\}
P(y_i = 1 \mid Z) = 1 - \prod \limits_{j \in N_i} z[j]
and it gave me

sample: 100%|██████████| 3000/3000 [00:03<00:00, 955.12it/s, 7 steps of size 4.62e-01. acc. prob=0.89]  

                mean       std    median      5.0%     95.0%     n_eff     r_hat
      z[0]      0.33      0.08      0.33      0.20      0.45   1497.29      1.00
      z[1]      0.33      0.07      0.33      0.21      0.45   1460.37      1.00
      z[2]      0.85      0.09      0.86      0.73      0.99   1480.85      1.00
      z[3]      0.86      0.08      0.87      0.74      1.00   1135.68      1.00
      z[4]      0.85      0.09      0.86      0.73      1.00   1887.58      1.00

Number of divergences: 0
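
As a final note on the vectorization question in the code comments, the corrected model can also be written without the Python loops by combining the jnp.where / jnp.prod pattern from the reply above with a second jnp.where for the flip (a sketch, assuming mask_matrix and obs are JAX arrays):

def model(mask_matrix, obs):
    n = mask_matrix.shape[1]

    with numpyro.plate("Z", n):
        z = numpyro.sample("z", dist.Uniform(0.0, 1.0))

    # Product of the z[j] selected by each row of the mask
    prod = jnp.prod(jnp.where(mask_matrix == 1, z, 1.0), axis=1)
    # Use 1 - product as the mean for observations equal to 1 (corrected likelihood)
    mu = jnp.where(obs == 1, 1.0 - prod, prod)
    numpyro.sample("obs", dist.Normal(mu, 1), obs=obs)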