Nesting plates with 2D data

Hi,

Is it possible to nest plates while maintaining a 2D dataframe for the data? As I am bringing my training data set to scale, it is too large to bring into memory as a 5 dimensional array (store ID, state, store cluster, date, and data). I am able to build a state x store cluster variable to create a plate for conditional independence, but I would ideally like to have a hierarchy with clusters first and then states nested within that. Do I need to reduce the size of the data to accomplish this, or is it possible to nest plates without increasing the dimensionality of the data?

Thanks in advance!

Hi @tjm, by nesting plates, I guess you meant

with plate("A", size_a, dim=-1):
    with plate("B", size_b, dim=-2):
        x = sample("x", dist.Normal(...))

? If you are using SVI, then for large data, it is better to feed data in batches, like

jit_update = jax.jit(svi.update)
for batch in data_loader:
    state, loss = jit_update(state, batch)

Yes, that is what I meant by nesting plates. With that code, the data would have shape (obs, size_b, size_a)? Or to apply to my own case, (n_stores, n_days, n_states, n_clusters, n_features), with plates of dim=-2 and dim=-3? Right now I’ve definitely cut a corner although the output indicates to me that this is working correctly:

def model(dat, obs=observed_values):
    with numpyro.plate("plate_state_clusters", n_stateCluster):
        x = numpyro.sample('x', dist.Normal(...))
        y = numpyro.sample('y', dist.Normal(...))
        
    estimate = x[np.array(dat["state_cluster"])] * np.array(dat["x"]) + y[np.array(dat["state_cluster"])] * np.array(dat["y"])
        
    with numpyro.plate("data", len( np.array(dat['x']))):
        numpyro.sample("obs", dist.Normal(estimate, 1.), obs=observed_values)

Batching will help in any case so I will definitely implement that.

Thanks for your help as always.

Is there a good tutorial for reshaping data to work efficiently with numpyro and nested plates?

Sorry to spam with questions, but in my case I want to have individual store level coefficients (i.e. use a plate with length n_store), and I also want to have state x cluster level coefficients for each of the features. This issue is that a store is obviously only part of 1 state and 1 cluster, so if I reshape my data to be n_store x n_state x n_cluster, there are tons of null values. Is there a more appropriate way to shape data to accomplish this?

Hi @tjm, it seems that you’ll need to reshape your data if that’s how the model specified. Previously, when working on the M5 competition, I wrote some functions to do the preprocessing tasks (because there are many products, we have to subsample over it.) I assume that your model is

y ~ c1 @ x + c2 @ x

where c1 are “store” coefs and c2 are “state+cluster” coefs. I think you can preprocess x so that it has two forms x1 and x2 where x1 is compatible to c1 and x2 is compatible to c2. So the model becomes

y ~ c1 @ x1 + c2 @ x2

This way, you won’t need to reshape the data x to be n_store x n_state x n_cluster.

Maybe I misunderstand your question. Could you let me know what are the shapes (in simplified form) of c1, c2, x, y? Do they follow the linear relationship as above?

X is shaped (70000, 104, 51, 12, 40), with the following dimensions:

n_stores = 70000
n_weeks = 2 years of weekly data = 104
n_states = 51
n_clusters = 12 mutually exclusive store labels
n_features = 40 unique store/week attributes

Each store has 104 weeks of data, but can only be assigned 1 state and 1 cluster. Because there are then 50 states and 11 clusters that do not match the store, right now when I reshape the data, (70000 x 51 x 11) + 50 out of the 70000 x 51 x 12 combinations are null.

Y is just a single value for each store/week.

The linear relationship is correct, so the fully specified model would be:

y ~ c1 @ x + c2 @ x

Where c1 are store level coefs (70000 coefs) and c2 are state+cluster coefs for each feature (51x12x40 = 24480 coefs). I see how x1 can be created as just a list of store IDs the length of the data, but I am not sure how x2 can be reshaped without the stores because the data is at the store/week level. Are you saying x2 should be shaped (51 x 12 x 40 x m) where m is an array of all store weeks in that given state/cluster/feature? That would make the dimensions variable length, since there are not an equal number of stores in each state/cluster combination. I do see how that would eliminate the nulls, but not necessarily how the model would be able to map store IDs to the data getting passed to x2. Or maybe it would be an m-length array where each element of the array has a 51-length dummy for state, a 12-length dummy for cluster, and a 40-length array for the feature values? Apologies if I am not being clear or if I am misunderstanding your recommendation, and thanks again for taking the time to help.

If I understand correctly, then each store belongs to a state (I skip cluster or other unrelated dimensions because I feel that the argument should be the same - pls correct me if I’m wrong). So I assume you have 1 vector state_indices with length num_stores. Now, your data x has the shape num_stores and some coefficients c with shape num_states. Then the linear relationship of y and x can be encoded as

y ~ x * c[state_indices]

Does that sound reasonable to you?

That is how I have a successful model specified, with a unique state-cluster index and

y ~ x * c[state_cluster_indices]

In this instance, all state-cluster coefs use their own constants for the parameters of their sample distributions, or they all use a single mu and sigma for a global distribution as the parameters for their sample distributions. I would like to compare to a model with cluster-level distributions that feed the state-cluster distributions, i.e.

with numpyro.plate("clusters", n_clusters):
        cluster_mus = numpyro.sample('cluster_mus ', dist.Normal(loc=np.array([0] * n_clusters),scale=np.ones(n_clusters)))
        cluster_sigmas = numpyro.sample('cluster_sigmas', dist.HalfNormal(np.ones(n_clusters)))
    
        with numpyro.plate("states", n_states):
            state_cluster_coefs = numpyro.sample('state_cluster_coefs', dist.Normal(cluster_mus, cluster_sigmas))

...

y_hat =np.array(X).T * state_cluster_coefs [np.array(stateClusterList)].T

with numpyro.plate("data", len(X)):
        numpyro.sample("estimates", dist.Normal(y_hat , 1.), obs=np.array(y))

Where stateClusterList is an array of tuples the length of X with (state, cluster). If I don’t do the transposes, I am getting an error that the broadcast shapes are incorrect, but that seems odd to me given that

y_hat =state_cluster_coefs [np.array(state_cluster_indices)] * np.array(X)

has worked for me where state_cluster_indicesis an array of single values in this case (custom index of state/cluster combination). The broadcast error says the Incompatible shapes for broadcasting: ((len(X), 2, n_clusters), (1, 1, len(X))). The transposes feel out of place, but can you explain why I would need the transposes? Trying

y_hat =np.array(X) * state_cluster_coefs [np.array(stateClusterList)]

or

y_hat =state_cluster_coefs [np.array(stateClusterList)]*np.array(X)

did not solve the issue, while I would have thought at least one of them would broadcast appropriately.

Hi @tjm, I think with numpy, you can do

import numpy as np

coefs = np.random.randn(4, 3)
state_indices = np.array([0, 1, 2, 3, 1])
cluster_indices = np.array([0, 1, 2, 2, 2])
assert coefs[state_indices, cluster_indices].shape == (5,)

(If numpy indexing does not work in your case, using jax you can take advantage of vmap to not worry about batching dimensions, e.g.

b = vmap(lambda i, j: a[i, j])(first_indices, second_indices)

or you can use tensor indicing - but I don’t think that you need to use those more advanced tools)