When multiple observations are present, one dominates the other

Hello,

I have two observations (matrices X and Y) in my model, both generated from an underlying latent variable; let's call it sigma. The model looks something like this:

def model(X,Y):
    sigma = pyro.sample('sigma',some_dist(...))
    with pyro.plate('data_X'):
        X = pyro.sample('X',some_dist(somehow related to sigma))
    with pyro.plate('data_Y'):
        Y = pyro.sample('Y',some_dist(somehow related to sigma))

When I train the model using only X or only Y, the inferred sigma makes sense in both cases, which I think suggests the way I define the model is reasonable. However, when I use both X and Y, the inferred sigma is dominated by X, and the observation Y seems to have little effect on the inference.

Looking further at the scale of the loss function, I found that the loss for observation X is about 3 orders of magnitude larger than the loss for observation Y.

So I rescaled my observation Y by multiplying it by a factor of 10,000. Now the effect of Y becomes visible and I am happy with the results. But I think choosing an arbitrary 10,000 is a bit too ad hoc, so I wonder whether there are any recommendations for assigning weights to different observations in a more principled (less subjective) way?

I understand there's a poutine.scale function, but it seems to operate on the whole model. Would it be possible to apply it to certain sample sites instead?

Thanks a lot,
Frank

To answer my own question: the way I found is to use poutine.scale as a context manager:

def model(X,Y):
    sigma = pyro.sample('sigma',some_dist(...))
    with pyro.plate('data_X'), poutine.scale(scale=10):
        X = pyro.sample('X',some_dist(somehow related to sigma))
    with pyro.plate('data_Y'), poutine.scale(scale=1):
        Y = pyro.sample('Y',some_dist(somehow related to sigma))

A more low-level solution would be to use poutine.trace to modify the scale attribute of each node specifically, but I haven't tried that yet.

Ideally these models would be scale independent, or have a natural scale for each observation. If you provide more model details, perhaps someone could suggest how to make the scales more natural.