Training networks in parallel on multiple GPUs

I’m trying to train multiple Pyro networks with the same architecture. How can I train them in parallel on multiple GPUs?

It sounds like your task requires no communication or synchronization between tasks. In that case I would simply parallel-map over your training tasks, where each task trains one network on one GPU. One simple way to do this is to split your models into groups and run each group in its own process (a single-script sketch follows the commands below):

$ CUDA_VISIBLE_DEVICES=0 python train_some_models.py &
$ CUDA_VISIBLE_DEVICES=1 python train_other_models.py &
$ CUDA_VISIBLE_DEVICES=2 python train_more_models.py &
...
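If you’d rather keep everything in one script, here is a minimal sketch of the same idea using torch.multiprocessing. The list of model ids and the train_single_network helper are hypothetical placeholders for your own training code:

import torch
import torch.multiprocessing as mp

def train_worker(rank, groups):
    # Each worker process pins itself to one GPU and trains its group of models.
    device = torch.device(f"cuda:{rank}")
    for model_id in groups[rank]:
        train_single_network(model_id, device)  # hypothetical per-model training function

if __name__ == "__main__":
    num_gpus = torch.cuda.device_count()
    all_ids = list(range(12))  # hypothetical list of model ids
    # Round-robin split of the models into one group per GPU.
    groups = [all_ids[i::num_gpus] for i in range(num_gpus)]
    mp.spawn(train_worker, args=(groups,), nprocs=num_gpus)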

I need to average the network parameters after each epoch (Federated Learning). I tried parallel for loops using a Python library, but the param store doesn’t seem to get updated globally. Any help would be highly appreciated. Thanks!

Hi @adnan1306, it’s hard for me to suggest a change without more details of your problem, but you might try using Horovod as in a recent Pyro example. That example uses Pyro’s HorovodOptimizer (available in the dev branch, not yet released), which trains a single model replicated across all workers by averaging gradients. If you instead want to perform gradient updates locally and only occasionally average model parameters across workers, you could modify that example to drop the HorovodOptimizer and instead periodically synchronize parameters using horovod.torch.allreduce_(), for example:

import horovod.torch as hvd
...
for step, ... in my_training_loop:
    # Train independently.
    svi.step(...)

    # Synchronize every 10 steps.
    if step % 10 == 0:
        for m in (model, guide):
            for name, param in sorted(m.named_parameters()):
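                # allreduce_ averages in place by default; sorted() keeps the call order identical on every worker.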
                hvd.allreduce_(param, name=name)

That last line averages model parameters across workers.
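For completeness, and as an assumption about your setup rather than anything specific to that example: each worker process needs to initialize Horovod and pin itself to its GPU at startup, and the script is typically launched with horovodrun (the script name below is hypothetical):

import horovod.torch as hvd
import torch

hvd.init()
torch.cuda.set_device(hvd.local_rank())  # one process per GPU

$ horovodrun -np 4 python train_federated.py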


Thanks for the help. I will explore this!