In case anybody else runs into this problem: @fehiepsi was correct that some chains were slower than others. One chain had a larger tree depth than the rest, so it was bottlenecking the whole computation.
I resolved this by increasing the number of warmup samples, so that all of the chains adapted similarly.
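
For anyone who wants to check for the same symptom, here is a rough sketch of how the slow chain could be spotted. It assumes NumPyro's NUTS, where the per-sample leapfrog counts can be collected by passing `extra_fields=("num_steps",)` to `mcmc.run(...)` and reading them back with `mcmc.get_extra_fields(group_by_chain=True)["num_steps"]`. The helper names, the `tolerance` threshold, and the numbers in the example are made up for illustration, and the tree depth is only approximated from the step count as `ceil(log2(num_steps + 1))`.

```python
import math


def mean_tree_depth_per_chain(num_steps_by_chain):
    """Approximate the mean tree depth of each chain from its leapfrog counts.

    num_steps_by_chain: one sequence of per-sample leapfrog step counts per
    chain, e.g. mcmc.get_extra_fields(group_by_chain=True)["num_steps"].
    Assumption: a tree of depth d takes at most 2**d - 1 leapfrog steps,
    so depth is estimated as ceil(log2(num_steps + 1)).
    """
    means = []
    for chain_steps in num_steps_by_chain:
        depths = [math.ceil(math.log2(n + 1)) for n in chain_steps]
        means.append(sum(depths) / len(depths))
    return means


def find_slow_chains(num_steps_by_chain, tolerance=1.0):
    """Return indices of chains whose mean depth exceeds the shallowest
    chain's mean depth by more than `tolerance` (a hypothetical threshold)."""
    means = mean_tree_depth_per_chain(num_steps_by_chain)
    baseline = min(means)
    return [i for i, d in enumerate(means) if d - baseline > tolerance]


# Made-up step counts for three chains; the third one is much deeper.
example = [[7, 7, 15, 7], [7, 15, 7, 7], [1023, 511, 1023, 1023]]
print(find_slow_chains(example))  # prints [2]
```

If one chain stands out like this, raising `num_warmup` in the `MCMC(...)` constructor gives that chain more time to adapt its step size, which is the change that worked here.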