# How to estimate and include an unknown variable in a regression model?

I’m working on extending the following regression model to include an additional variable that is unknown: the mean of the unknown scores of a given observation’s relationships. In other words, `y = intercept + b_x1 * x1 + b_x2 * x2 + b_x3 * x3 + b_x4 * x4`, where `x4` is the mean of a subset of unknown values.
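In symbols, the intended model can be sketched roughly as follows. The notation is introduced here only for illustration: $N(i)$ stands for the set of person $i$'s connections and $z_j$ for person $j$'s unknown score. The Normal(0, 1) prior on the scores is the one used in the snippets in this thread.

```latex
\begin{aligned}
y_i &\sim \mathrm{Normal}(\mu_i, \sigma) \\
\mu_i &= \text{intercept} + b_{x1}\, x_{1,i} + b_{x2}\, x_{2,i} + b_{x3}\, x_{3,i} + b_{x4}\, x_{4,i} \\
x_{4,i} &= \frac{1}{|N(i)|} \sum_{j \in N(i)} z_j, \qquad z_j \sim \mathrm{Normal}(0, 1)
\end{aligned}
```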

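``````
# Pyro model (linear regression)
import pyro
import pyro.distributions as dist


def model(x1, x2, x3, y=None):
    """
    x1 - x3: numeric input variables (one value per person)
    y: numeric target (None when sampling predictions)
    """
    # Coefficients
    intercept = pyro.sample("intercept", dist.Normal(0., 1.0))
    b_x1 = pyro.sample("b_x1", dist.Normal(0., 1.0))
    b_x2 = pyro.sample("b_x2", dist.Normal(0., 1.0))
    b_x3 = pyro.sample("b_x3", dist.Normal(0., 1.0))
    sigma = pyro.sample("sigma", dist.Uniform(0., 1.0))

    # Linear predictor, one entry per row of the data
    mean = (
        intercept
        + b_x1 * x1
        + b_x2 * x2
        + b_x3 * x3
    )

    # Observations are conditionally independent given the coefficients
    with pyro.plate("data", len(x1)):
        return pyro.sample("y", dist.Normal(mean, sigma), obs=y)
``````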

It’s not correctly coded, but I’d like to add something like the following in order to estimate the values for another variable, x4:

``````
# Unknown variable for each person, to be estimated.
latent_scores = [pyro.sample(f"latent_score_{i}", dist.Normal(0., 1.0)) for i in range(len(x1))]

# For each person i, calculate the unknown x4 value, where x4[i] is the mean
# of the unknown values of that person's connections (i.e. network edges).
# j and k are placeholders for the indexes of person i's connections.
x4[i] = (latent_scores[j] + latent_scores[k]) / 2
``````

Basically, I want to estimate the unknown values for each person and the model coefficients at the same time. I’m new to Pyro, so if there is a more appropriate modeling approach, please let me know!

This sounds like some sort of hierarchical or mixed-effects model. Could you provide a little more information:

• What’s the relationship between `data` and people?
• Where do edges and vertices and network models fit into your `model`?
• What are the values of a person’s connection? Are they some of the `y`s? How do you know only some are observed?

• What’s the relationship between data and people?
  • Each row in the data corresponds to one person, and the columns are attributes.

• Where do edges and vertices and network models fit into your model?
  • The vertices in the network are the people, and the edges are the inputs to the average being calculated, i.e. `x4[i] = (latent_scores[j] + latent_scores[k]) / 2`, where the placeholder indexes `j` and `k` are the edges of the given person (i.e. row) in the network.

• What are the values of a person’s connection? Are they some of the `y`s? How do you know only some are observed?
  • The values of a person’s connections are assumed to follow a normal distribution with mean 0 and sigma 1, and they are included in the regression model as an input to predict one of the attributes for the people.

If I need to clarify any of the details, please let me know!

Here is an updated example model to show what I am attempting to do with the edges.

``````
import torch
import pyro
import pyro.distributions as dist


def model(person_idx, edges_df, x1, x2, x3, y=None):
    """
    person_idx: person index (i.e. unique identifier for each person)
    edges_df: Pandas dataframe with "source" and "target" columns. The values in
        the "target" column are lists of edges, e.g. [2, 17, 39]. Edge values
        are based on person_idx.
    x1 - x3: numeric input variables
    y: numeric target
    """
    # Coefficients
    intercept = pyro.sample("intercept", dist.Normal(0., 1.0))
    b_x1 = pyro.sample("b_x1", dist.Normal(0., 1.0))
    b_x2 = pyro.sample("b_x2", dist.Normal(0., 1.0))
    b_x3 = pyro.sample("b_x3", dist.Normal(0., 1.0))
    b_emls = pyro.sample("b_emls", dist.Normal(0., 1.0))
    sigma = pyro.sample("sigma", dist.Uniform(0., 1.0))

    # A unique latent score for each person
    mu_ls = pyro.sample("mu_ls", dist.Normal(0.0, 1.0))
    sigma_ls = pyro.sample("sigma_ls", dist.HalfNormal(1.0))
    n_people = len(person_idx)
    with pyro.plate("plate_ls", n_people):
        latent_score = pyro.sample("latent_score", dist.Normal(mu_ls, sigma_ls))

    # Mean latent score of edges
    edges_mean_latent_score = torch.empty_like(x1)
    for p in person_idx:
        p = int(p)  # assumes person_idx holds row positions 0..n_people-1
        # Flatten the edge list(s) stored in the "target" column for person p
        p_targets = edges_df.loc[edges_df['source'] == p, 'target']
        p_edges = [e for targets in p_targets for e in targets]
        num_p_edges = len(p_edges)
        if num_p_edges > 0:
            p_edges_mean_latent_score = sum(latent_score[e] for e in p_edges) / num_p_edges
            edges_mean_latent_score[p] = p_edges_mean_latent_score
        else:
            # Set value to 0 (i.e. mean) for those with no edges
            edges_mean_latent_score[p] = 0

    mean = (
        intercept
        + b_x1 * x1
        + b_x2 * x2
        + b_x3 * x3
        + b_emls * edges_mean_latent_score
    )

    with pyro.plate("data", len(x1)):
        return pyro.sample("y_score", dist.Normal(mean, sigma), obs=y)
``````
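One possible alternative to the per-person loop over edges is to precompute a row-normalized adjacency matrix once, outside the model, so that the edge means become a single matrix-vector product. The sketch below is only an illustration under assumptions: `build_edge_mean_weights` and the `edges` dict are hypothetical stand-ins for the thread's `edges_df`, person indexes are assumed to be row positions `0..n_people-1`, and a fixed tensor stands in for the sampled `latent_score`.

```python
import torch


def build_edge_mean_weights(edges, n_people):
    """Return an (n_people, n_people) matrix whose row i averages i's connections.

    `edges` is a hypothetical dict mapping a person index to a list of
    connected person indexes (a stand-in for the thread's edges_df).
    """
    weights = torch.zeros(n_people, n_people)
    for source, targets in edges.items():
        for target in targets:
            weights[source, target] += 1.0
    # Divide each row by its edge count; rows without edges stay all zero,
    # which matches the "0 for people with no edges" rule in the model above.
    counts = weights.sum(dim=1, keepdim=True).clamp(min=1.0)
    return weights / counts


# Toy network: person 0 is connected to 1 and 2, person 1 to 2, person 2 to nobody.
edges = {0: [1, 2], 1: [2], 2: []}
W = build_edge_mean_weights(edges, n_people=3)

# Stand-in for the latent_score vector that would be sampled inside the model.
latent_score = torch.tensor([0.5, -1.0, 2.0])
edges_mean_latent_score = W @ latent_score
```

Inside the model, `W` could be passed in as an argument and `W @ latent_score` used in place of the loop. Because the product is an ordinary differentiable tensor operation, this should avoid in-place writes into a preallocated tensor. For large, sparse networks a dense matrix may be wasteful, so this is a starting point rather than a recommendation.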