Custom distribution for SVI and sampling

physicswizard · October 22, 2020, 5:38pm

I am trying to model some events that are asymptotically Poisson/Laplace distributed (i.e. the pmf/pdf has an exponential tail), but for short time periods has an interesting weekly periodic structure:

I have had difficulty modeling this using standard distributions because of its unique shape, and so I was considering implementing a custom distribution that could capture these peaks on short time scales, while also fitting the long tail.

I have a number of different ideas for how to formulate a pdf/pmf that might do the trick (exponential baseline with finite number of Gaussian bumps, or categorical distribution below some threshold and Poisson above it, etc.), and they are all relatively easy to implement log_prob for, but then I realized that I might also need to implement a sampler, which for a distribution of this complexity is probably very involved and nontrivial! I know that for HMC the sample method for a Distribution is only called to initialize the first point so it is not needed for inference, but I want to use SVI because my dataset has millions of examples and will be very slow otherwise.

So my question is: do I need to implement a sampler to do inference with SVI? On the face of things I would expect the answer to be no because this site will be observed, so I think I would only need to implement log_prob, but I’m not 100% sure. Also, does the log_prob need to be normalized, or can I used an unnormalized distribution?

Bonus question: is there any way I can use standard torch/pyro distributions to fit this distribution? Perhaps some kind of mixture model?

martinjankowiak · October 22, 2020, 6:13pm

if you’re using a distribution in SVI on the model side all that’s really required is the log_prob since all samples come from the guide. the log_prob doesn’t need to be normalized but it does need to reflect any dependence on the parameters in the log normalizer. e.g. the two pi in the log_prob of a normal distribution doesn’t matter for the purposes of ELBO training but the log scale certainly does.

physicswizard · October 22, 2020, 6:33pm

Thanks for the quick response! So if I understand correctly, if I wanted to use this distribution in the guide, I would need to implement sample, however since it is an observed site it doesn’t need to be present in the guide, and therefore sample is not needed, only log_prob. And that makes sense about the normalization, thank you. I’m pretty sure my normalization “constant” will end up having parameters in it.

fritzo · October 23, 2020, 1:39pm

Hi @physicswizard, one way you could model this distribution is with a more complex probabilistic model. Here’s an example that splits days into week and day-of-week components.

def model(days):
    # Split data into weekly and day-of-week.
    assert days.dtype == torch.float
    weeks = days // 7
    day_of_week = days % 7

    # Model these two as a conditional distribution.
    prob = pyro.param("prob", torch.tensor(1.),
                      constraint=constraints.unit_interval)
    pyro.sample("weeks", dist.Geometric(prob))  # or maybe something richer

    probs0 = pyro.param("probs0", torch.ones(7) / 7,
                        constraint=constraints.simplex)
    probs1 = pyro.param("probs1", torch.ones(7) / 7,
                        constraint=constraints.simplex)
    timescale = pyro.param("timescale", torch.tensor(7.),
                           constraint=constraints.positive)
    probs = probs0 + (probs1 - probs0) * (days / timescale).neg().exp()
    pyro.sample("day_of_week", dist.Categorical(probs),
                obs=day_of_week)

physicswizard · October 23, 2020, 2:14pm

@fritzo that is a great idea. I have found in my discussions with others that these events do tend to occur on specific days of the week for different subgroups, so this sounds like a good fit.