Conventional loss with ELBO loss

udion · October 1, 2018, 6:32pm

Hi,

I want to put priors on some of the layers in my network and learn the parameters, which Pyro allows. However, I would like to train the network in the following manner. I am wondering whether this is possible and whether the training strategy has a flaw that I am overlooking (I think it might help me train faster):

Say, for 5 iterations, I sample a set of parameters and optimize the ELBO using the guide and model functions;

then, for the next 2 iterations, I sample a set of parameters and optimize MSE, cross-entropy, etc.
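A minimal sketch of the alternating schedule described above, in plain Python. The helper name `phase_for_step` and the phase labels are made up for illustration; the training loop would pick which loss to optimize based on the returned label.

```python
def phase_for_step(step, elbo_steps=5, aux_steps=2):
    """Return which loss to optimize at a given (0-based) training step.

    The schedule repeats in cycles: `elbo_steps` ELBO iterations followed by
    `aux_steps` iterations of the auxiliary loss (MSE, cross-entropy, ...).
    """
    cycle = elbo_steps + aux_steps
    return "elbo" if step % cycle < elbo_steps else "aux"


# Two full cycles of the 5 + 2 schedule.
schedule = [phase_for_step(s) for s in range(14)]
```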

Edit:

I realize this might be more useful than the above: can I optimize ELBO + K*MSE (etc.), where K is some hyperparameter?

Any hints will be appreciated.

Thanks!

lmao · October 2, 2018, 11:49am

It seems that what you want to optimize is the ELBO of a Bayesian network with loss function loss + K*MSE, where loss is the loss function of your original network.

Thus it should be OK if loss + K*MSE makes sense for an ordinary network.

martinjankowiak · October 5, 2018, 2:52am

The easiest way to do this would be to add an additional observe statement (a `pyro.sample` call with `obs=`) to your model, using a normal distribution. Note that the log prob of a normal distribution gives you a quadratic term; the constant K would then effectively be determined by the variance of the normal distribution.
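To make the connection between the normal log prob and K concrete, here is a small plain-Python check. It assumes a single scalar observation y with predicted mean mu and standard deviation sigma (all values below are made up): the negative log density equals K * (y - mu)^2 plus a constant that does not depend on mu, with K = 1 / (2 * sigma^2).

```python
import math


def normal_log_prob(y, mu, sigma):
    """Log density of Normal(mu, sigma) evaluated at y."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (y - mu) ** 2 / (2 * sigma ** 2)


def implied_k(sigma):
    """Weight on the squared error implied by a normal observation with std sigma."""
    return 1.0 / (2.0 * sigma ** 2)


# Illustrative values only.
y, mu, sigma = 1.5, 0.25, 0.5

neg_log_prob = -normal_log_prob(y, mu, sigma)
penalty = implied_k(sigma) * (y - mu) ** 2          # the quadratic (squared-error) part
const = 0.5 * math.log(2 * math.pi * sigma ** 2)    # independent of mu
```

So a smaller sigma corresponds to a larger K. Note that this is per squared-error term; if MSE is averaged over N data points, the effective weight on the mean also picks up a factor of N.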

udion · October 6, 2018, 5:56am

I didn’t get you.

ELBO itself is a loss term, right? I want to add some other terms to that loss function.

udion · October 6, 2018, 5:57am

Thank you for the suggestion. I was able to make it work using this:

github.com/pyro-ppl/pyro

Add custom loss (issue opened by jthsieh, 11:09PM - 22 Apr 18 UTC; closed 03:19AM - 23 Apr 18 UTC; label: question)

I've been playing around with Pyro and it has worked pretty well so far. Thanks …for providing the tutorials! My question is that right now, the loss is just ELBO. How do I add additional loss terms besides ELBO? For example, I might want to add an L2 loss term like `loss += lambda * F.mse_loss(x, y)`. I guess I can probably do the normal procedure: `optimizer.zero_grad()` `loss.backward()` `optimizer.step()` in addition to `svi.step()`, but I'm not sure whether this is appropriate and what `svi.step()` does exactly.