Running MCMC in parallel

Hi, I am trying to implement a hierarchical Dirichlet process hidden Markov model (HDP-HMM). In theory, the number of hidden states is infinite. For a computer implementation, I truncate it to a large finite value.
Under each hidden state, a multivariate Gaussian distribution models the observation vector. I use MCMC to draw samples from the posterior distribution (using observations from a subset of the entire dataset) to estimate the parameters (mean and covariance matrix). Given the hidden-state assignments, the parameters of different states are conditionally independent. Therefore, drawing parameter samples for each hidden state is an embarrassingly parallel problem. I have read about running MCMC with multiple chains, but I don't think that applies here.
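To make the per-state independence concrete, here is a rough sketch of the kind of batched update I have in mind. It is only an illustration under assumptions that are not part of my actual model: a conjugate Gaussian prior on each mean (the hypothetical `m0`, `s0`), covariances held fixed at their current values, hard state assignments `z`, and plain NumPy on the CPU. The function name `sample_state_means` is made up. The point is that all K states are updated with stacked array operations and no Python loop over states, which is the shape of computation I would hope to move to a GPU.

```python
import numpy as np

def sample_state_means(x, z, sigma, m0, s0, rng):
    """One Gibbs draw of every state's mean, batched over the K states.

    x: (N, D) observations; z: (N,) integer state labels in [0, K);
    sigma: (K, D, D) current per-state covariances (held fixed here);
    m0: (D,), s0: (D, D) Gaussian prior on each mean (an assumption).
    """
    K, D = sigma.shape[0], x.shape[1]
    onehot = np.eye(K)[z]                     # (N, K) assignment indicators
    counts = onehot.sum(axis=0)               # (K,) observations per state
    sums = onehot.T @ x                       # (K, D) per-state sums of x

    prior_prec = np.linalg.inv(s0)            # (D, D)
    lik_prec = np.linalg.inv(sigma)           # (K, D, D), batched inverse
    post_prec = prior_prec + counts[:, None, None] * lik_prec
    post_cov = np.linalg.inv(post_prec)       # (K, D, D)

    # Posterior mean: post_cov @ (prior_prec @ m0 + lik_prec @ sums), per state.
    rhs = prior_prec @ m0 + np.einsum("kij,kj->ki", lik_prec, sums)
    post_mean = np.einsum("kij,kj->ki", post_cov, rhs)

    # Draw from N(post_mean, post_cov) for all states at once via Cholesky.
    chol = np.linalg.cholesky(post_cov)       # (K, D, D)
    eps = rng.standard_normal((K, D))
    return post_mean + np.einsum("kij,kj->ki", chol, eps)

# Toy usage: 3 states, 2-dimensional observations, 200 points per state.
rng = np.random.default_rng(0)
true_means = np.array([[0.0, 0.0], [5.0, 5.0], [-5.0, 5.0]])
z = np.repeat(np.arange(3), 200)
x = true_means[z] + rng.standard_normal((600, 2))
sigma = np.tile(np.eye(2), (3, 1, 1))
draw = sample_state_means(x, z, sigma, np.zeros(2), 100.0 * np.eye(2), rng)
print(draw.shape)
```

States with no assigned observations simply fall back to a draw from the prior, so the truncation level does not need special handling in this sketch.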
My question is whether it is possible to run MCMC for each hidden state in parallel on a GPU.
I would really appreciate it if anyone could share some ideas!