Is there any literature on best practices for defining priors for neural network weights? All I’ve seen are examples using a normal distribution centered at 0 with a standard deviation of 1. Is there a reason for this?

Well, BNNs don’t work very well even with a modest number of parameters, and this is an area of active research. Radford Neal’s thesis (Bayesian Learning for Neural Networks) gives a nice overview of techniques for BNNs.

Re Gaussians: this is generally because Gaussian distributions have nice properties that make inference more tractable, e.g. analytic solutions, local reparameterization, etc.
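
To make the "nice properties" a bit more concrete, here is a rough sketch (not taken from any particular library) of two of them for a single weight: the KL divergence between a Gaussian approximate posterior and a Gaussian prior has a closed form, and a sample can be written as a deterministic function of the mean, the standard deviation, and standard normal noise. This shows the plain reparameterization of a weight, not the local variant that samples activations. The function names `gaussian_kl` and `sample_weight` and the numbers used are made up for illustration.

```python
import math
import random


def gaussian_kl(mu_q, sigma_q, mu_p=0.0, sigma_p=1.0):
    """Closed-form KL(q || p) for univariate Gaussians.

    q = N(mu_q, sigma_q^2) is the approximate posterior over one weight,
    p = N(mu_p, sigma_p^2) is the prior (defaults to the N(0, 1) prior).
    """
    return (
        math.log(sigma_p / sigma_q)
        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
        - 0.5
    )


def sample_weight(mu, sigma, rng):
    """Reparameterized draw: w = mu + sigma * eps, with eps ~ N(0, 1).

    The randomness lives only in eps, so w is a differentiable
    function of mu and sigma.
    """
    eps = rng.gauss(0.0, 1.0)
    return mu + sigma * eps


rng = random.Random(0)  # fixed seed, purely for repeatability
w = sample_weight(0.3, 0.1, rng)  # one draw from the posterior N(0.3, 0.1^2)
kl = gaussian_kl(0.3, 0.1)        # penalty relative to the N(0, 1) prior
print(f"sampled weight: {w:.4f}, KL to prior: {kl:.4f}")
```

With a non-Gaussian prior the KL term usually has no closed form and has to be estimated by sampling, which is part of why the N(0, 1) default is so common.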