Nesting plates with 2D data

tjm · May 25, 2021, 4:15pm

Hi,

Is it possible to nest plates while maintaining a 2D dataframe for the data? As I am bringing my training data set to scale, it is too large to bring into memory as a 5 dimensional array (store ID, state, store cluster, date, and data). I am able to build a state x store cluster variable to create a plate for conditional independence, but I would ideally like to have a hierarchy with clusters first and then states nested within that. Do I need to reduce the size of the data to accomplish this, or is it possible to nest plates without increasing the dimensionality of the data?

Thanks in advance!

fehiepsi · May 26, 2021, 4:55am

Hi @tjm, by nesting plates, I guess you meant

with plate("A", size_a, dim=-1):
    with plate("B", size_b, dim=-2):
        x = sample("x", dist.Normal(...))

? If you are using SVI, then for large data, it is better to feed data in batches, like

jit_update = jax.jit(svi.update)
for batch in data_loader:
    state, loss = jit_update(state, batch)

tjm · May 26, 2021, 4:09pm

Yes, that is what I meant by nesting plates. With that code, the data would have shape (obs, size_b, size_a)? Or to apply to my own case, (n_stores, n_days, n_states, n_clusters, n_features), with plates of dim=-2 and dim=-3? Right now I’ve definitely cut a corner although the output indicates to me that this is working correctly:

def model(dat, obs=observed_values):
    with numpyro.plate("plate_state_clusters", n_stateCluster):
        x = numpyro.sample('x', dist.Normal(...))
        y = numpyro.sample('y', dist.Normal(...))
        
    estimate = x[np.array(dat["state_cluster"])] * np.array(dat["x"]) + y[np.array(dat["state_cluster"])] * np.array(dat["y"])
        
    with numpyro.plate("data", len( np.array(dat['x']))):
        numpyro.sample("obs", dist.Normal(estimate, 1.), obs=observed_values)

Batching will help in any case so I will definitely implement that.

Thanks for your help as always.

tjm · May 26, 2021, 6:41pm

Is there a good tutorial for reshaping data to work efficiently with numpyro and nested plates?

tjm · May 26, 2021, 9:14pm

Sorry to spam with questions, but in my case I want to have individual store level coefficients (i.e. use a plate with length n_store), and I also want to have state x cluster level coefficients for each of the features. This issue is that a store is obviously only part of 1 state and 1 cluster, so if I reshape my data to be n_store x n_state x n_cluster, there are tons of null values. Is there a more appropriate way to shape data to accomplish this?

fehiepsi · May 28, 2021, 12:58pm

Hi @tjm, it seems that you’ll need to reshape your data if that’s how the model specified. Previously, when working on the M5 competition, I wrote some functions to do the preprocessing tasks (because there are many products, we have to subsample over it.) I assume that your model is

y ~ c1 @ x + c2 @ x

where c1 are “store” coefs and c2 are “state+cluster” coefs. I think you can preprocess x so that it has two forms x1 and x2 where x1 is compatible to c1 and x2 is compatible to c2. So the model becomes

y ~ c1 @ x1 + c2 @ x2

This way, you won’t need to reshape the data x to be n_store x n_state x n_cluster.

Maybe I misunderstand your question. Could you let me know what are the shapes (in simplified form) of c1, c2, x, y? Do they follow the linear relationship as above?

tjm · May 28, 2021, 5:35pm

X is shaped (70000, 104, 51, 12, 40), with the following dimensions:

n_stores = 70000
n_weeks = 2 years of weekly data = 104
n_states = 51
n_clusters = 12 mutually exclusive store labels
n_features = 40 unique store/week attributes

Each store has 104 weeks of data, but can only be assigned 1 state and 1 cluster. Because there are then 50 states and 11 clusters that do not match the store, right now when I reshape the data, (70000 x 51 x 11) + 50 out of the 70000 x 51 x 12 combinations are null.

Y is just a single value for each store/week.

The linear relationship is correct, so the fully specified model would be:

y ~ c1 @ x + c2 @ x

Where c1 are store level coefs (70000 coefs) and c2 are state+cluster coefs for each feature (51x12x40 = 24480 coefs). I see how x1 can be created as just a list of store IDs the length of the data, but I am not sure how x2 can be reshaped without the stores because the data is at the store/week level. Are you saying x2 should be shaped (51 x 12 x 40 x m) where m is an array of all store weeks in that given state/cluster/feature? That would make the dimensions variable length, since there are not an equal number of stores in each state/cluster combination. I do see how that would eliminate the nulls, but not necessarily how the model would be able to map store IDs to the data getting passed to x2. Or maybe it would be an m-length array where each element of the array has a 51-length dummy for state, a 12-length dummy for cluster, and a 40-length array for the feature values? Apologies if I am not being clear or if I am misunderstanding your recommendation, and thanks again for taking the time to help.

fehiepsi · May 29, 2021, 2:42pm

If I understand correctly, then each store belongs to a state (I skip cluster or other unrelated dimensions because I feel that the argument should be the same - pls correct me if I’m wrong). So I assume you have 1 vector state_indices with length num_stores. Now, your data x has the shape num_stores and some coefficients c with shape num_states. Then the linear relationship of y and x can be encoded as

y ~ x * c[state_indices]

Does that sound reasonable to you?

tjm · June 1, 2021, 9:55pm

That is how I have a successful model specified, with a unique state-cluster index and

y ~ x * c[state_cluster_indices]

In this instance, all state-cluster coefs use their own constants for the parameters of their sample distributions, or they all use a single mu and sigma for a global distribution as the parameters for their sample distributions. I would like to compare to a model with cluster-level distributions that feed the state-cluster distributions, i.e.

with numpyro.plate("clusters", n_clusters):
        cluster_mus = numpyro.sample('cluster_mus ', dist.Normal(loc=np.array([0] * n_clusters),scale=np.ones(n_clusters)))
        cluster_sigmas = numpyro.sample('cluster_sigmas', dist.HalfNormal(np.ones(n_clusters)))
    
        with numpyro.plate("states", n_states):
            state_cluster_coefs = numpyro.sample('state_cluster_coefs', dist.Normal(cluster_mus, cluster_sigmas))

...

y_hat =np.array(X).T * state_cluster_coefs [np.array(stateClusterList)].T

with numpyro.plate("data", len(X)):
        numpyro.sample("estimates", dist.Normal(y_hat , 1.), obs=np.array(y))

Where stateClusterList is an array of tuples the length of X with (state, cluster). If I don’t do the transposes, I am getting an error that the broadcast shapes are incorrect, but that seems odd to me given that

y_hat =state_cluster_coefs [np.array(state_cluster_indices)] * np.array(X)

has worked for me where state_cluster_indicesis an array of single values in this case (custom index of state/cluster combination). The broadcast error says the Incompatible shapes for broadcasting: ((len(X), 2, n_clusters), (1, 1, len(X))). The transposes feel out of place, but can you explain why I would need the transposes? Trying

y_hat =np.array(X) * state_cluster_coefs [np.array(stateClusterList)]

or

y_hat =state_cluster_coefs [np.array(stateClusterList)]*np.array(X)

did not solve the issue, while I would have thought at least one of them would broadcast appropriately.

fehiepsi · June 2, 2021, 5:04am

Hi @tjm, I think with numpy, you can do

import numpy as np

coefs = np.random.randn(4, 3)
state_indices = np.array([0, 1, 2, 3, 1])
cluster_indices = np.array([0, 1, 2, 2, 2])
assert coefs[state_indices, cluster_indices].shape == (5,)

(If numpy indexing does not work in your case, using jax you can take advantage of vmap to not worry about batching dimensions, e.g.

b = vmap(lambda i, j: a[i, j])(first_indices, second_indices)

or you can use tensor indicing - but I don’t think that you need to use those more advanced tools)

marx · July 19, 2021, 10:12pm

Hi @fehiepsi, thanks for the answers you’ve provided already.

I’m new to Numpyro/Pyro eco system and I’m trying to wrap my mind around dimensions in the context of Numpyro models. Since the issue I’m trying to solve is rather similar with the one asked by @tjm, I thought I could post this on this thread.

For example, say I have a dataset with 1000 unique stores. These stores are further categorise into 10 groups and in 30 cities. Also, let’s assume we have some additional features (10 of them) based on the store and category of the store. Finally, assume we have observations for 30 consecutive days.

I want to model say daily demand for a given store, of a certain category within a city. I figure out that I can create an array with shape (1000, 10, 30, 10). I want to build a hierarchical model with nested levels/hierarchies: Stores|Categories|Cities. In addition, I want to estimate the coefficients for each of the levels drawn from the features.

Now, my confusion is with the dimensions.

def model(X):
    n_stores, n_categories, n_cities, n_days, n_features = X.shape
    n_features -= 1
    
    plate_features = numpyro.plate('plate_features', n_features, dim=-1)
    plate_stores = numpyro.plate('plate_stores', n_stores, dim=-4)
    plate_categories = numpyro.plate('plate_categories', n_categories, dim=-3)
    plate_cities = numpyro.plate('plate_cities', n_cities, dim=-2)
    plate_days = numpyro.plate('plate_days', n_days, dim=-1)
    
     ........
     with plate_features: 
             with plate_stores:
                     with plate_categories:
                             with plate_cities:
                                  thetas = .....

Firs, is the specification of this dimensions reasonable? If yes, is the hierarchical setup using plates as shown above the correct way of doing this? I already encountered some shape mismatch when I tried the first time but I’m wondering what my root cause is…perhaps it’s because of the model specification or the dimensions.

Sorry if I am asking something that"s already answered in the documentations or posts here. I will appreciate if someone could point me to it. Thank you.

fehiepsi · July 20, 2021, 12:45am

Hi @marx, I’m not good at modeling so would need additional information to understand the context. With given information, I can’t say if it is reasonable. Could you draw a graphical diagram for your model?

marx · July 21, 2021, 7:12pm

Hi @fehiepsi Thanks for the response! Sure, I will draw the model (or at least how I think the model should be). I haven’t drawn one before…but I came across this nice python package called Daft. Will explore it a bit, but if there are other better tools, I’m interested in exploring such a tool/package!