Inference from weighted data (i.e. a coreset)

I am interested in estimating the posterior of a model from a weighted data set.

Every data point has an associated weight. For a data point with weight W, I want the inference algorithm to update the posterior as if there were W copies of that data point in the observed data set (even if W is not an integer).

Is there a way to weight data points during inference in Pyro, and if not, how difficult would it be to extend Pyro to do this?

I believe this functionality should be a priority for Pyro going forward, because weighted coreset methods allow extending Bayesian approaches to much larger datasets. See Automated Scalable Bayesian Inference via Hilbert Coresets (arXiv:1710.05053).


You should be able to use pyro.poutine.scale, which rescales log-probabilities by a given factor (a scalar or a tensor of per-data-point weights), for that:

def model(data):
    ...
    latent = pyro.sample("latent", ...)
    ...
    with pyro.plate("data", N), pyro.poutine.scale(scale=weights_tensor):
        ...
        pyro.sample("observed", ..., obs=data)
    ...
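
For concreteness, here is a minimal runnable sketch of that pattern. The Beta-Bernoulli model, the guide, and the toy data and weights below are illustrative assumptions, not part of the original question; the point is that each observation's log-likelihood is multiplied by its weight, so a weight of 2.5 contributes to the ELBO as if that point appeared 2.5 times (non-integer weights included).

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.optim import Adam

data = torch.tensor([1., 0., 1., 1.])
weights = torch.tensor([2.5, 0.5, 1.0, 3.0])  # one weight per data point
N = len(data)

def model(data, weights):
    theta = pyro.sample("theta", dist.Beta(1., 1.))
    with pyro.plate("data", N), pyro.poutine.scale(scale=weights):
        # each log-likelihood term is multiplied elementwise by its weight
        pyro.sample("obs", dist.Bernoulli(theta), obs=data)

def guide(data, weights):
    alpha_q = pyro.param("alpha_q", torch.tensor(1.),
                         constraint=dist.constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(1.),
                        constraint=dist.constraints.positive)
    pyro.sample("theta", dist.Beta(alpha_q, beta_q))

svi = SVI(model, guide, Adam({"lr": 0.05}), loss=Trace_ELBO())
for step in range(200):
    svi.step(data, weights)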

Is there currently a way (or workaround) to do sample-wise weighting with TraceEnum_ELBO for discrete latent variables?

Hi @dschneider, I’m not sure what you mean exactly by “sample-wise weighting” but the code above should work correctly for both discrete and continuous random variables and be compatible with TraceEnum_ELBO.

Hi, sorry for being unspecific. Using poutine.scale(scale=weights), where weights is a tensor with a different weight for each sample in the batch, in conjunction with TraceEnum_ELBO currently (v1.8.1) gives

ValueError: enumeration only supports scalar poutine.scale

I found a similar question here:
https://github.com/pyro-ppl/pyro/issues/1897
Is there another way to achieve the same functionality?

Here is a short example of my motivation:
Let's say I have n uncertain observations, e.g. with a different discrete distribution P(X_i = a) = p_i, P(X_i = b) = 1 - p_i for each i < n. How can I use these observations to train a model (a Bayesian network)? As I understand it, I could sample k data points from each of those distributions, which would bloat the dataset to size n * k and only be sufficiently accurate for k >> 1. Or I could be exact and split each sample i into two weighted samples (X_i1 = a with weight p_i and X_i2 = b with weight 1 - p_i), which bloats the dataset only to size 2n. Or is there a better way to use discrete distributions as observations?

Yes, you’re right; I’d forgotten we hadn’t implemented support for that in TraceEnum_ELBO, though it is mathematically valid.

In your particular case, it sounds like what you actually want is something like the following, which is compatible with enumeration:

xi_dist = ...  # generative model's conditional distribution of X_i
# latent indicator of which value was observed; TraceEnum_ELBO can sum it out
xi_obs_is_a = pyro.sample("xi_obs_is_a", pyro.distributions.Bernoulli(p_i)) == 1
pyro.sample("xi", xi_dist, obs=torch.where(xi_obs_is_a, a, b))  # a, b are tensors

See also this old issue for further discussion of distribution-valued observations.
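
To illustrate, here is one way that snippet might be fleshed out into a runnable example under TraceEnum_ELBO. The Beta-Bernoulli model, the site names theta / obs_is_a / x, and the guide are illustrative assumptions, not code from the thread; enumeration sums out obs_is_a exactly, so each data point contributes the marginal likelihood p_i * p(x_i = a | theta) + (1 - p_i) * p(x_i = b | theta).

import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, TraceEnum_ELBO, config_enumerate
from pyro.optim import Adam

p = torch.tensor([0.9, 0.2, 0.7])  # P(X_i was observed as a) for each i
a = torch.ones(3)                  # observed value of X_i if it was "a"
b = torch.zeros(3)                 # observed value of X_i if it was "b"

@config_enumerate  # enumerate the discrete "obs_is_a" sites in parallel
def model(p, a, b):
    theta = pyro.sample("theta", dist.Beta(1., 1.))
    with pyro.plate("data", len(p)):
        # latent indicator of which value was actually observed
        obs_is_a = pyro.sample("obs_is_a", dist.Bernoulli(p)) == 1.
        pyro.sample("x", dist.Bernoulli(theta),
                    obs=torch.where(obs_is_a, a, b))

def guide(p, a, b):
    alpha_q = pyro.param("alpha_q", torch.tensor(1.),
                         constraint=dist.constraints.positive)
    beta_q = pyro.param("beta_q", torch.tensor(1.),
                        constraint=dist.constraints.positive)
    pyro.sample("theta", dist.Beta(alpha_q, beta_q))

svi = SVI(model, guide, Adam({"lr": 0.05}),
          loss=TraceEnum_ELBO(max_plate_nesting=1))
for step in range(200):
    svi.step(p, a, b)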
