I’m trying to replicate an example from SAS in Python where I fit a distribution from summary statistics. The summary statistics available to me are the total count, min, max, std, p50, p75, p85, p95, p98, p99, and p99.9. The measurements are coming from a distributed network of machines and consist of either latency or size distributions. The goal is to re-construct the mixture from each machine, and then combine those distributions to estimate the distribution of the entire network and do this on a regular interval in a streaming fashion.

I’m looking through the documentation and get the general gist of mixture models, but the thing that I don’t understand is how to setup the initial parameters for each distribution, which one to use given the data available to me, or how to shift each distribution to the corresponding quantile to construct the overall distribution.

Hi @1011111011101111, you can indeed fit quantiles with PyTorch and or Pyro. To fit distributions that implement .cdf() you can minimize cdf loss as in your linked SAS example. This actually does not work for Gamma because PyTorch does not implement Gamma.cdf; some other distributions do.

Here is a direct translation of that SAS script to PyTorch, but it does not run with Gamma:

import torch
import torch.distributions as dist
from torch.optim import Adam
# initial guess for finding root
alpha = torch.tensor(1., requires_grad=True)
beta = torch.tensor(1., requires_grad=True)
p1=0.5; X1 = 4 # eqn for 1st quantile: F(X1; alpha, beta) = p1
p2=0.9; X2 = 8 # eqn for 2nd quantile: F(X2; alpha, beta) = p2
def loss_fn():
d = dist.Gamma(alpha, beta)
# FIXME Gamma.cdf() is undefined
loss = (p1 - d.cdf(X1)) ** 2 + (p2 - d.cdf(X2)) ** 2
return loss
optim = Adam([alpha, beta], lr=0.01)
for step in range(1000):
optim.zero_grad()
loss = loss_fn()
loss.backward()
optim.step()
print("alpha = {}".format(alpha.item()))
print("beta = {}".format(beta.item()))

You could try another distribution, e.g. LogNormal.

@fritzo I wanted to followup and say thank you for your example. It both helped unblock me and unlocked some understanding of the API, distributions and terminology. My cleaned up example is below.

import torch
import torch.distributions as dist
from torch.optim import Adam
from typing import List, Tuple
def find_cauchy_params(quantiles: List[Tuple[float, float]]):
alpha = torch.tensor(1.0, requires_grad=True)
beta = torch.tensor(1.0, requires_grad=True)
quantile_tensors = [
(quantile, torch.tensor(quantile_value))
for quantile, quantile_value in quantiles
]
def loss_fn():
loss = 0.0
d = dist.Cauchy(alpha, beta)
for quantile, quantile_value in quantile_tensors:
loss += (quantile - d.cdf(quantile_value)) ** 2
return loss
optim = Adam([alpha, beta], lr=0.01)
for step in range(1000):
optim.zero_grad()
loss = loss_fn()
print("loss", loss)
loss.backward()
optim.step()
print("alpha = {}".format(alpha.item()))
print("beta = {}".format(beta.item()))
find_cauchy_params(
[(0.5, 0.0), (0.75, 0.0), (0.95, 1.0), (0.98, 1.0), (0.99, 8.0), (0.999, 11.0)]
)

Truncated Output:

...
loss tensor(0.0317, grad_fn=<AddBackward0>)
loss tensor(0.0317, grad_fn=<AddBackward0>)
loss tensor(0.0317, grad_fn=<AddBackward0>)
loss tensor(0.0317, grad_fn=<AddBackward0>)
alpha = -0.04828706011176109
beta = 0.11657208949327469

I do have a followup question: assuming I learn a series of weighted distributions, one for each machine being monitored to create the ensemble mixture representing the population, does Pyro have an easy method to get the CDF for the mixture?