Hi, pyro.param is for things you want to optimize, typically with PyTorch’s gradient-based optimizers. You can’t directly perform gradient-based optimization on discrete parameters, and there’s no smooth transformation from continuous to discrete values, hence the error. Do you really need to optimize over discrete parameters? If so, see the Bayesian optimization tutorial. Otherwise, there’s no need to use pyro.param at all.
I’d suggest starting with a MaskedMixture distribution rather than hand-coding the masking logic. I think your two component distributions would be a MultivariateNormal and a Delta(...).to_event(1). See these tests for example usage.
I’m not sure about the model you shared, but observing a Delta distribution usually doesn’t work: Delta assigns -inf log-probability to any observed value that doesn’t exactly equal its point mass, which typically shows up as a NaN loss.