Hi @Jingwen, no worries, you’re not disturbing anyone.

If there is some parameter in your model that you want to learn, you basically have two options: a) you specify it as a param() and learn a *point estimate* of it, or b) you specify it as a sample(), define its prior distribution, and estimate its full *posterior distribution* (not just a point estimate).

Say you have an observation x, assume it is Gaussian distributed, and you would like to identify the parameters of that Gaussian. Option a) would be to specify mean and variance as param()s, whereas option b) would be to assume a prior distribution over both and use sample() statements. Option a) corresponds to maximum likelihood estimation and only gives you point estimates; option b) is full posterior inference and additionally gives you uncertainty quantification. (You might want to read up on hierarchical Bayesian modeling in this context.)
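To make the point-estimate vs. posterior distinction concrete without any Pyro machinery, here is a minimal NumPy sketch of the Gaussian-mean example. It assumes a known observation variance and a Normal prior on the mean so the posterior has a closed form; all the specific numbers (prior, variance, data) are made up for illustration. In Pyro, option a) would use param()s optimized by SVI, and option b) would use sample() statements with a guide, but the conceptual output is the same as below: a single number vs. a full distribution.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=1.0, size=50)  # simulated observations

# Option a) maximum likelihood: point estimates only
mu_mle = x.mean()
var_mle = x.var()

# Option b) Bayesian inference over the mean (known variance assumed here
# purely so the posterior is available in closed form)
sigma2 = 1.0           # assumed known observation variance
mu0, tau2 = 0.0, 10.0  # Normal prior on the mean: N(mu0, tau2)

n = len(x)
post_var = 1.0 / (1.0 / tau2 + n / sigma2)
post_mean = post_var * (mu0 / tau2 + x.sum() / sigma2)

# The Bayesian answer is a whole distribution, not just a number:
print(f"MLE point estimate: mu = {mu_mle:.3f}")
print(f"Posterior:          mu ~ N({post_mean:.3f}, {post_var:.4f})")
```

With 50 observations and a weak prior, the posterior mean lands close to the MLE, but you also get a posterior variance that quantifies how uncertain that estimate is, which is exactly what option a) cannot give you.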

Regarding your last point, I just added a simple purely discrete example to my notebook (same link as above). Notice that if everything is discrete, variational inference doesn’t make a lot of sense as far as I know: you can just sum variables out exactly, and everything can be represented in probability tables. To be honest, I don’t know whether Pyro would be the preferred tool for that. I would imagine there are more specialized and maybe easier-to-use tools available, but I don’t really know any specific ones. You might want to have a look at, e.g., https://pgmpy.org/.
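To illustrate what "sum over variables in probability tables" means in the fully discrete case, here is a tiny two-variable example in plain NumPy (the numbers are made up, and this is hand-rolled exact enumeration, not what my notebook or pgmpy does internally):

```python
import numpy as np

# Two binary variables with a hypothetical model P(A) and P(B | A)
p_a = np.array([0.7, 0.3])            # P(A=0), P(A=1)
p_b_given_a = np.array([[0.9, 0.1],   # row A=0: P(B=0|A=0), P(B=1|A=0)
                        [0.2, 0.8]])  # row A=1: P(B=0|A=1), P(B=1|A=1)

# The full joint is just a table: P(A, B) = P(A) * P(B | A)
joint = p_a[:, None] * p_b_given_a

# Marginal P(B): sum out A (one axis of the table)
p_b = joint.sum(axis=0)

# Posterior P(A | B=1): take the B=1 column and renormalize (Bayes' rule)
p_a_given_b1 = joint[:, 1] / p_b[1]

print("P(B)       =", p_b)
print("P(A | B=1) =", p_a_given_b1)
```

Every inference you might want (marginals, conditionals) is a sum over axes of the joint table, so there is nothing for a variational approximation to approximate; libraries like pgmpy automate exactly this kind of exact enumeration for larger discrete networks.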