Why does my inferenced probability larger than 1 in categorical distribution?

I am creating my first pyro program.
The task is I have a couple of categorical data and want to estimate the distribution on them.
There are four categories A, B, C, D. I assume the categorical distribution with the prior distribution:
P(A) = 0.25, P(B) =0.25, P(C ) = 0.25, P(D) = 0.25.

I define the model as:

    def data_model(prior_prob):
        obs = np.array([0 if random.random() >0.2 else 1 for i in range(1000)]) # This is the synthesized data for test purpose
        obs = torch.from_numpy(obs)
        data = pyro.sample("data",dist.Categorical(probs=prior_prob),obs=obs)
        return data

I define the guide function:

def param_guide(prior_prob):
    posterior = pyro.param("probs", torch.tensor(prior_prob),constraint=constraints.positive)
    data = pyro.sample("data", dist.Categorical(posterior))
    return data

And I use SVI to inference the posterior probability distribution:

    prior_prob = np.array([0.25, 0.25,0.25,0.25])
    prior_prob = torch.from_numpy(prior_prob) 

svi = pyro.infer.SVI(model=data_model,
                     optim=pyro.optim.SGD({"lr": 0.001, "momentum":0.1}),

losses, probs  = [], []
num_steps = 2500
 for t in range(num_steps):

Finally, I found my probs = [7.5012e-03, 3.6106e-02, 1.0009e-02, 1.4410e+03]
Two problems here:

  1. The sum of the probabilities is larger than 1.
  2. I generate the data by the probabilities [0.8, 0.2, 0, 0], why do not the estimated value approach it?

there are many issues here. i suggest you read more of the tutorials, especially svi part i and parts ii and iii.

for example (there may be other issues):

  • you can’t put numpy generated randomness in your model like this. you’re going to resample the data every time the model is run. instead generate your data once, outside the model.

obs = np.array([0 if random.random() >0.2 else 1 for i in range(1000)]) # This is the synthesized data for test purpose

  • your probs need a simplex constraint not a positive constraint

  • you probably don’t want to use SGD for SVI. try Adam

Thank you for your reply.
I saw an example of feeding multiple observations in svi part i:

 for i in range(len(data)):
    # observe datapoint i using the bernoulli
    # likelihood Bernoulli(f)
    pyro.sample("obs_{}".format(i), dist.Bernoulli(f), obs=data[i])

What’s the difference between using a for loop and feeding observations one time as a vector:?

pyro.sample("obs", dist.Bernoulli(f), obs=data)

the two are equivalent but the vectorized version will generally be (much) faster. the for loop construct is provided for extra flexibility (e.g. for use if something isn’t easily vectorizable)