What you are doing is very similar to what I’m trying to do, except that I have a discrete mixture model and a logistic regression response rather than your GMM and linear regression. Coming from a frequentist statistics background, the way I’m thinking about these models is that they are essentially multilevel models where you don’t know the levels in advance.
From your implementation, it looks like you want separate intercepts and gradients for each cluster found by the mixture model. Do I have that correct? The response part of my model has global gradients but an intercept per cluster, and it should generalise easily to a gradient per cluster as well.
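As a rough illustration of that structure, here is a plain-Python sketch. It is not the model code from the thread linked below, and every name in it (`response_prob`, `marginal_prob`, `intercepts`, `slopes`, `cluster_slopes`, `weights`) is hypothetical. It assumes the cluster membership probabilities for an observation are already available from the mixture part of the model.

```python
import math


def sigmoid(t):
    """Inverse logit: maps a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def response_prob(x, cluster, intercepts, slopes, cluster_slopes=None):
    """P(y = 1 | x, cluster) for a logistic regression response.

    intercepts[k] is the intercept for cluster k (one per cluster).
    slopes are the global gradients shared by every cluster.
    If cluster_slopes is given, cluster_slopes[k] is used instead,
    which is the "gradient per cluster" generalisation.
    """
    beta = slopes if cluster_slopes is None else cluster_slopes[cluster]
    eta = intercepts[cluster] + sum(b * xi for b, xi in zip(beta, x))
    return sigmoid(eta)


def marginal_prob(x, weights, intercepts, slopes, cluster_slopes=None):
    """P(y = 1 | x) when the cluster is unknown.

    weights[k] is the (assumed given) probability that this observation
    belongs to cluster k; the response is averaged over clusters.
    """
    return sum(
        w * response_prob(x, k, intercepts, slopes, cluster_slopes)
        for k, w in enumerate(weights)
    )


# Two clusters, two covariates, global gradients.
intercepts = [0.0, 1.0]
slopes = [0.5, -0.5]
p_known = response_prob([1.0, 1.0], 1, intercepts, slopes)
p_unknown = marginal_prob([1.0, 1.0], [0.5, 0.5], intercepts, slopes)
```

With a known cluster this is an ordinary varying-intercept logistic regression; the mixture only enters through the weights, which is the sense in which the levels are not known in advance.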
I posted my model code in this thread, which you may find helpful: Model with a joint posterior distribution