Hello, I have a pretty high-level question related to my research using Astronomical Stellar Spectra.
Background
I am involved with an open-source project to do inference of spectroscopic data from stars. Spectra have a lot of information baked-in but statistical fitting is tough because the only source of comparison that we have are from spectral models. These models require intensive simulation and therefore are only created on a certain grid of intrinsic stellar parameters. In our research, these parameters are (T, logg, Z).
Because we want to fit our data using points that exist in-between the model grid points, we need a way of interpolating, but in our specific example, we also want to forward-propagate our errors for use in the inference. We also want to dimensionally reduce our model spectra for more efficient fitting.
what we’ve currently been doing is taking our model spectra and performing PCA decomposition on them to create “eigenspectra” and corresponding “weights” (eigenvalues) to recreate the original model grid. Currently, this looks a bit like this-
ncomps = 7
X = fluxes # Shape (Ngrid, Npix), usually like (100, 30000)
Xcentered = X - X.mean(0)
Xwhitened = Xcentered / Xcentered.std(0)
U, S, V = torch.svd(Xwhitened)
components = V.mm(torch.diag(S))[:, :ncomps] # Shape (Npix, ncomps)
weights = U[:, :ncomps] # Shape (Ngrid, ncomps)
Using this decomposition, we can recreate any of our models on the grid with
reconstructed = weights[0].matmul(components.t())
The problem
So, given this dimensionality reduction, we need to find a function to map our stellar parameters (T, logg, Z) to the weights used in reconstruction. We’ve been previously using a Gaussian process to sample functions for this mapping. More importantly, we want to fit this function using a loss metric that accurately portrays how well the function maps the stellar parameters to the original grid fluxes (rather than Gaussian likelihood that measure how well the parameters recreate our PCA weights). I’ve done some reading around and it seems like GPLVM is supposed to be the solution to this problem.
So, basically, I’m looking for a way to use GPLVM to map the stellar parameters to reconstructed model spectra using dimensionally reduced latent “eigenspectra” where I can get both the reconstructed spectra as well as the covariance from this reconstruction. The covariance matrix we’ve previously used looks something like
c = Kernel(X, X) # shape (M, M)
# create block diagonal of a batch of the above kernels
covs = batch_kernel(X, X) # shape(ncomps, M, M)
C = block_diag(covs) # shape (ncomps*M, ncomps*M)
# we typically don't evaluate this covariance directly because of its large size
cov = torch.chain_matmul(components.t(), C, components) # shape (Npix, Npix)
What I want
My ideal workflow would look something like this
X = grid_points # shape (M, 3)
Y = fluxes # shape(M, Npix) (Npix is ~ 30000)
pca = PCA(ncomps=7)
comps, weights = pca(Y)
comps.shape # (Npix, ncomps)
weights.shape # (M, ncomps)
kernel = gp.kernels.RBF(input_dim=3, lengthscale=torch.tensor([300, 30, 30]))
gpr = gp.model.GPRegression(X, weights, kernel)
gplvm = gp.model.GPLVM(gpr, comps, Y)
gp.util.train(gplvm)
# sample a model
fluxes, cov = gplvm(torch.tensor([[7230, 4.32, 0.0]]), diag=False)
fluxes.shape # (1, Npix)
cov.shape # (1, Npix, Npix)
where cov
is the forward-propagated covariance from the interpolation and reconstruction using the reduced dimensions (not just the self-covariance of the fluxes)
Does this exist in the pyro framework? Can we make it exist in the pyro framework?