SVI with aggregated observations

I need to do SVI, as in the tutorial, but my data is aggregated. For concreteness, let's look at the tutorial itself. The observations there are not aggregated:

# create some data with 6 observed heads and 4 observed tails
data = []
for _ in range(6):
    data.append(torch.tensor(1.0))
for _ in range(4):
    data.append(torch.tensor(0.0))

but clearly, such an array of observations carries the same information as its aggregated version, which could come, for instance, in the following format:

data_frequencies = torch.tensor([4,6])
data_values = torch.tensor([0,1])

Furthermore, in my case the data actually arrives aggregated.
Aggregated data can sometimes be disaggregated in a canonical way. Not so with my data: we weight past observations less than recent ones, so instead of data_frequencies we have data_weights, which need not be integers.

And even if it is possible to disaggregate the data, it seems that it would be more efficient to keep it aggregated and write the log posterior with the data aggregated.

Is there a way to pose this problem without disaggregating the data?

I made the following change to the model:

def model(agg_data):
    # define the hyperparameters that control the Beta prior
    alpha0 = torch.tensor(10.0)
    beta0 = torch.tensor(10.0)
    # sample f from the Beta prior
    f = pyro.sample("latent_fairness", dist.Beta(alpha0, beta0))
    # loop over the observed data
    counter = 0
    for row in agg_data:
        freq, value = row
        # observe datapoint i using the Bernoulli
        # likelihood Bernoulli(f)
        for _ in range(int(freq)):
            pyro.sample("obs_{}".format(counter), dist.Bernoulli(f), obs=value)
            counter += 1

and the corresponding change to the training data:

data = torch.tensor([[4., 0.],
                     [6., 1.]])

I get the same solution. Any thoughts? Do you think this code can be improved?

Please refer to this tutorial: Tensor shapes in Pyro — Pyro Tutorials 1.9.0 documentation
