Non-standard mixture models with shared mixtures

Federico_V · August 8, 2024, 5:18pm

Hi everyone,

I’m trying to build a non-standard mixture model.

Simplifying slightly, my data looks like this:

obs, sample_id, replicate_id
0.1, A, 1
0.05, A, 1
0.15, A, 1
1.3, A, 2
1.2, A, 2
1.1, A, 2
15, B, 1
15.1, B, 1
15.1, B, 2
15.1, B, 2
14.9, B, 3
14.9, B, 3

And I want my model to look like this:

assigned_cluster # depends on sample_id AND replicate_id
locs # depends on sample_id alone
y_i ~ locs[sample_id_i, assigned_cluster_{ij}] # this isn't a proper notation

The challenge I’m dealing with is that different sample_ids can have different maximum numbers of clusters, so expressing this in plate notation is complicated.

In Python pseudocode:

n_max_replicates = data.groupby("sample_id")["replicate_id"].max().to_dict()

# assignments and locs stand for the model's latent variables (not defined here)
for _, row in data.iterrows():
  sample_id = row["sample_id"]
  replicate_id = row["replicate_id"]

  cluster_probas = assignments[sample_id][replicate_id]  # length n_max_replicates[sample_id]
  cluster_means = locs[sample_id]  # length n_max_replicates[sample_id]
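
The indexing implied by the pseudocode can be sketched in plain Python. This is only an illustration: the rows are copied from the example table above, the names `sample_index`, `n_clusters`, `sample_idx` and `replicate_idx` are hypothetical, and it assumes that the number of clusters for a sample equals its largest replicate_id.

```python
# Rows copied from the example table: (obs, sample_id, replicate_id).
rows = [
    (0.1, "A", 1), (0.05, "A", 1), (0.15, "A", 1),
    (1.3, "A", 2), (1.2, "A", 2), (1.1, "A", 2),
    (15.0, "B", 1), (15.1, "B", 1), (15.1, "B", 2),
    (15.1, "B", 2), (14.9, "B", 3), (14.9, "B", 3),
]

# Map each sample_id to a 0-based integer index.
sample_ids = sorted({s for _, s, _ in rows})
sample_index = {s: i for i, s in enumerate(sample_ids)}

# Assumption: a sample's cluster count equals its largest replicate_id.
n_clusters = {s: 0 for s in sample_ids}
for _, s, r in rows:
    n_clusters[s] = max(n_clusters[s], r)

# Size of the padded cluster axis shared by all samples.
max_clusters = max(n_clusters.values())

# Per-observation values and 0-based integer indices.
obs = [y for y, _, _ in rows]
sample_idx = [sample_index[s] for _, s, _ in rows]
replicate_idx = [r - 1 for _, _, r in rows]
```

With integer indices like these, each observation can look up its row of locs and its assignment probabilities by array indexing rather than by a Python loop over rows.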

I had a few ideas on how to proceed:

1. The maximum number of clusters is not that high (it’s 4). Maybe I can break my likelihood into 4 blocks, grouping observations by their number of clusters?
2. Generate locs as an array of shape (n_sample_id, max_cluster_size), and then ignore the unused elements of locs?
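
The padded-array idea can be sketched with NumPy alone, independent of any modelling library. This is a minimal sketch under several assumptions that are not in the post: a Gaussian likelihood with a fixed hypothetical `scale`, uniform mixture weights per sample (the full model would add a replicate axis to the weights), and made-up values for `locs`. Padded slots get a log-weight of -inf, so they contribute nothing to the mixture likelihood.

```python
import numpy as np

# Hypothetical sizes: sample 0 has 2 clusters, sample 1 has 3.
n_clusters = np.array([2, 3])
max_clusters = int(n_clusters.max())

# valid[s, k] is True when cluster k exists for sample s.
valid = np.arange(max_clusters)[None, :] < n_clusters[:, None]

# Padded locations (made-up values); locs[0, 2] is an unused slot.
locs = np.array([[0.1, 1.2, 0.0],
                 [15.0, 15.1, 14.9]])

# Uniform log-weights over valid clusters, -inf on padded slots.
log_w = np.where(valid, -np.log(n_clusters)[:, None], -np.inf)


def mixture_log_lik(y, sample_idx, locs, log_w, scale=0.1):
    """Per-observation log-likelihood of a Gaussian mixture with padded components.

    Assumption: Gaussian components with a shared, fixed scale.
    """
    mu = locs[sample_idx]                          # (n_obs, max_clusters)
    z = (y[:, None] - mu) / scale
    log_norm = -0.5 * z**2 - np.log(scale) - 0.5 * np.log(2 * np.pi)
    a = log_w[sample_idx] + log_norm               # -inf on padded slots
    m = a.max(axis=1, keepdims=True)               # finite: every sample has a valid cluster
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


y = np.array([0.1, 1.3, 15.0])
sample_idx = np.array([0, 0, 1])
ll = mixture_log_lik(y, sample_idx, locs, log_w)

# Changing a padded location leaves the likelihood unchanged.
locs_changed = locs.copy()
locs_changed[0, 2] = 999.0
ll_changed = mixture_log_lik(y, sample_idx, locs_changed, log_w)
```

The design choice here is to mask through the weights rather than through the locations: the unused elements of locs still exist as array entries, but they cannot influence the likelihood.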

Thanks for all the work on this great library!

Federico