I’m seeking advice on building a mixed effects model that includes multiple sparse, latent, partially observed Gaussian Processes (GPs). To fit the model, I’d like to do variational inference with mini-batches. Since I’m relatively new to all of this, I’ll spell out why I think I need each of the above characteristics, in case I’m misusing any of the terms.
By “mixed effects” (I may be abusing the term here), I just mean that the GPs are only part of the equation that predicts the observable: the sampled GP values are combined with the values of other random variables, drawn from other distributions, to predict the actual observable.
By “multiple” I simply mean that I need to have more than one GP. These could potentially be combined into a single “multitask” GP, but I think that it is fine to keep them independent from each other for my present purposes.
By “sparse” I mean that I want to save computational cost by sampling the GPs at a pre-defined set of inducing points. In fact, in my application, observations recur at a discrete set of points that saturates as the dataset grows, so there’s really no need to have a new `X` for every datapoint.
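To make the “discrete set of points” idea concrete, here is a minimal NumPy sketch of how I picture mapping observations onto a small set of shared input locations. The arrays `x_obs` and `f_at_unique` are made-up placeholders, not real data or a real GP draw.

```python
import numpy as np

# Hypothetical raw inputs: many observations, few distinct locations.
x_obs = np.array([0.5, 1.0, 0.5, 2.0, 1.0, 0.5, 2.0])

# x_unique plays the role of the fixed set of GP input points;
# idx[i] says which of those points observation i sits on.
x_unique, idx = np.unique(x_obs, return_inverse=True)

# Placeholder for one draw of the GP at the unique points
# (hard-coded here, not an actual GP sample).
f_at_unique = np.array([10.0, 20.0, 30.0])

# Gather one GP value per observation by indexing.
f_per_obs = f_at_unique[idx]
```

With this layout the GP only ever needs to be evaluated at `x_unique`, and each observation just looks up its value by index.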
By “latent” I mean that I do not observe the values of the GP directly. Instead, I will be using the sampled values in an equation to predict the observable, as described above.
By “partially observed” I mean that, for each GP, I know what the value of the GP should be at a particular `X = X_reference` (one of the inducing points). Setting these values (one per GP) is necessary to make the model identifiable.
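One way I can imagine enforcing this is to shift each sampled function so that it passes through the known value. Below is a rough NumPy sketch of that idea; the RBF kernel, the grid `x_points`, `ref_index`, and `ref_value` are all illustrative assumptions rather than anything from my actual model.

```python
import numpy as np

def rbf_kernel(x1, x2, variance=1.0, lengthscale=1.0):
    """Squared-exponential kernel, used here only as an example."""
    d = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

rng = np.random.default_rng(0)
x_points = np.linspace(0.0, 4.0, 5)  # hypothetical inducing points
ref_index = 2                        # position of X_reference in x_points
ref_value = 0.0                      # known GP value at X_reference

# Draw one sample from the zero-mean GP prior at x_points.
K = rbf_kernel(x_points, x_points) + 1e-6 * np.eye(len(x_points))
L = np.linalg.cholesky(K)
f_raw = L @ rng.standard_normal(len(x_points))

# Shift the whole sample so it hits ref_value at X_reference.
f_pinned = f_raw - f_raw[ref_index] + ref_value
```

I’m aware that this shift is not the same as conditioning the GP on a noise-free observation at `X_reference`: the shifted process has covariance k(x, x') - k(x, x_ref) - k(x_ref, x') + k(x_ref, x_ref) rather than the conditional GP covariance, so I’m not sure whether it is an acceptable substitute.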
Since my datasets are large, I want to perform variational inference and, ideally, support mini-batch inferencing.
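My understanding of what mini-batching requires (please correct me if this is wrong) is that the likelihood term computed on a batch gets rescaled by N / batch size, so that it estimates the full-data term without bias. I believe `pyro.plate` applies this rescaling when given `subsample_size`. A toy NumPy check of the rescaling, with made-up per-observation log-likelihood values:

```python
import numpy as np

rng = np.random.default_rng(0)
n_total, batch_size = 1000, 100

# Hypothetical per-observation log-likelihood terms (random placeholders).
log_lik = rng.normal(size=n_total)

# Split a random permutation of the data into disjoint mini-batches.
batches = rng.permutation(n_total).reshape(-1, batch_size)

# Each batch gives a rescaled estimate of the full-data sum.
estimates = (n_total / batch_size) * log_lik[batches].sum(axis=1)

# Averaged over one full pass, the estimates recover the exact sum.
full_sum = log_lik.sum()
mean_estimate = estimates.mean()
```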
With all of that being said, I have questions/am seeking input on the following points:
- At a high level, is there reason to believe that the `pyro.contrib.gp` module would be preferable to the Pyro/GPyTorch integration for building this model, or vice versa?
- To perform the partial observation, would it be sufficient to first sample from the GP and then subtract off the difference between the originally sampled value at `X_reference` and the desired reference value before using it to predict the actual observable?
- Is it still true that `pyro.contrib.gp` doesn’t support mini-batch inferencing? Some of the older forum posts indicate that this is the case, but some of the documentation seems to indicate otherwise.
- The code example that I think comes closest to what I want to do is this: Latent Function Inference with Pyro + GPyTorch (Low-Level Interface) — GPyTorch 1.9.1 documentation. Do you know of any other examples that are even closer to what I’ve described above?
- Based on your experience and my description above, what traps/complications might be waiting for me that I haven’t mentioned yet in this post?
As always, I appreciate any and all advice that you can take the time to share.