Training networks in parallel on multiple GPUs

It sounds like your task requires no communication or synchronization between the individual training runs. In that case I would simply parallel-map over your training tasks, where each task trains one network on one GPU. One simple way to do this is to split your models into groups and run each group on its own GPU:

$ CUDA_VISIBLE_DEVICES=0 python train_some_models.py &
$ CUDA_VISIBLE_DEVICES=1 python train_other_models.py &
$ CUDA_VISIBLE_DEVICES=2 python train_more_models.py &
...
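
If you would rather launch everything from a single Python script instead of several background shell jobs, the same idea can be expressed with multiprocessing: one worker process per GPU, each training its group of models sequentially. This is a minimal sketch; train_model and model_configs are placeholders for your own training code and model list, and NUM_GPUS is an assumption you would set to your actual GPU count.

import os
from multiprocessing import get_context

NUM_GPUS = 3  # adjust to the number of GPUs available

def train_group(gpu_id, configs):
    # Pin this process to one GPU before any CUDA context is created.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    for config in configs:
        train_model(config)

def train_model(config):
    # Placeholder: build and train one network here.
    print(f"training {config} on GPU {os.environ['CUDA_VISIBLE_DEVICES']}")

if __name__ == "__main__":
    model_configs = list(range(12))  # placeholder: one entry per network
    # Split the models round-robin into one group per GPU.
    groups = [model_configs[i::NUM_GPUS] for i in range(NUM_GPUS)]
    ctx = get_context("spawn")  # fresh child processes, no inherited CUDA state
    procs = [ctx.Process(target=train_group, args=(gpu, grp))
             for gpu, grp in enumerate(groups)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()

Setting CUDA_VISIBLE_DEVICES inside each worker before any CUDA initialization has the same effect as the shell commands above: each process only ever sees its assigned GPU.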