I'm trying to directly implement LDA, not worried about performance at this point. With some extra prints for debugging, my model is:
```python
import torch as t
import pyro
import pyro.distributions as dist
from torch.distributions import constraints

def model(data):
    α = pyro.param("α", t.tensor(0.1), constraint=constraints.positive)
    β = [pyro.param(f"β_{z}", t.ones(V) / V, constraint=constraints.simplex)
         for z in range(K)]
    print(f"α={α}")
    for d in pyro.irange("documents", D):
        print(f"d={d}")
        θ = pyro.sample(f"θ_{d}", dist.Dirichlet(α * t.ones(K)))
        print(f"θ={θ}")
        data_d = data[d]
        for n in pyro.irange(f"loop_{d}", len(data_d)):
            print(f"n={n}")
            z = pyro.sample(f"z_{d},{n}", dist.Categorical(θ))
            print(f"z={z}")
            print(f"θ[z]={θ[z]}")
            pyro.sample(f"w_{d},{n}", dist.Categorical(β[z]), obs=data_d[n])
```
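For context, `K`, `V`, `D`, and `data` are globals; a minimal stand-in for them looks like this (the sizes and the toy corpus here are placeholders, not my real data; `K = 5` matches the length-5 θ in the output below):

```python
import torch as t

K = 5   # number of topics (hypothetical value)
V = 20  # vocabulary size (hypothetical value)
D = 3   # number of documents (hypothetical value)
# each document is a tensor of word indices in [0, V)
data = [t.randint(0, V, (8,)) for _ in range(D)]
```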
In the output, the Categorical seems to be sampling well outside the bounds:

```
α=0.09999999403953552
d=0
θ=tensor([ 0.4130, 0.2750, 0.1093, 0.1633, 0.0394])
n=0
z=4
θ[z]=0.03943357244133949
n=1
z=10
```
Is this a bug, or am I missing something?
EDIT: I guess it's not clear why this is out of bounds. In the sample `z ~ Categorical(θ)`, θ has length 5. The first time through the `n` loop we get `z=4`, which is fine. But the next time through we get `z=10`, even though θ has not changed. The subsequent call to `print(f"θ[z]={θ[z]}")` then throws an out-of-bounds indexing error.
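For what it's worth, sampling directly from a plain torch `Categorical` over this θ stays within bounds, so my guess (unconfirmed) is that the out-of-range `z` is being injected from somewhere else, e.g. by the guide when the model is replayed during inference, rather than produced by the base distribution:

```python
import torch as t

# The θ printed above; Categorical normalizes it internally.
θ = t.tensor([0.4130, 0.2750, 0.1093, 0.1633, 0.0394])
samples = t.distributions.Categorical(θ).sample((10000,))
print(samples.min().item(), samples.max().item())  # always within [0, 4]
```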