Fit a distribution from quantiles

I’m trying to replicate an example from SAS in Python where I fit a distribution from summary statistics. The summary statistics available to me are the total count, min, max, std, p50, p75, p85, p95, p98, p99, and p99.9. The measurements are coming from a distributed network of machines and consist of either latency or size distributions. The goal is to re-construct the mixture from each machine, and then combine those distributions to estimate the distribution of the entire network and do this on a regular interval in a streaming fashion.

I’m looking through the documentation and get the general gist of mixture models, but the thing that I don’t understand is how to setup the initial parameters for each distribution, which one to use given the data available to me, or how to shift each distribution to the corresponding quantile to construct the overall distribution.

Is this possible given Pyro?

Hi @1011111011101111, you can indeed fit quantiles with PyTorch and or Pyro. To fit distributions that implement `.cdf()` you can minimize cdf loss as in your linked SAS example. This actually does not work for `Gamma` because PyTorch does not implement `Gamma.cdf`; some other distributions do.

Here is a direct translation of that SAS script to PyTorch, but it does not run with `Gamma`:

``````import torch
import torch.distributions as dist

# initial guess for finding root

p1=0.5;  X1 = 4  # eqn for 1st quantile: F(X1; alpha, beta) = p1
p2=0.9; X2 =  8  # eqn for 2nd quantile: F(X2; alpha, beta) = p2

def loss_fn():
d = dist.Gamma(alpha, beta)
# FIXME Gamma.cdf() is undefined
loss = (p1 - d.cdf(X1)) ** 2 + (p2 - d.cdf(X2)) ** 2
return loss

for step in range(1000):
loss = loss_fn()
loss.backward()
optim.step()
print("alpha = {}".format(alpha.item()))
print("beta = {}".format(beta.item()))
``````

You could try another distribution, e.g. `LogNormal`.

@fritzo I wanted to followup and say thank you for your example. It both helped unblock me and unlocked some understanding of the API, distributions and terminology. My cleaned up example is below.

``````import torch
import torch.distributions as dist

from typing import List, Tuple

def find_cauchy_params(quantiles: List[Tuple[float, float]]):

quantile_tensors = [
(quantile, torch.tensor(quantile_value))
for quantile, quantile_value in quantiles
]

def loss_fn():
loss = 0.0
d = dist.Cauchy(alpha, beta)
for quantile, quantile_value in quantile_tensors:
loss += (quantile - d.cdf(quantile_value)) ** 2

return loss

for step in range(1000):
loss = loss_fn()
print("loss", loss)
loss.backward()
optim.step()

print("alpha = {}".format(alpha.item()))
print("beta = {}".format(beta.item()))

find_cauchy_params(
[(0.5, 0.0), (0.75, 0.0), (0.95, 1.0), (0.98, 1.0), (0.99, 8.0), (0.999, 11.0)]
)
``````

Truncated Output:

``````...