#horovod

For those doing #HPC #DL model training, I need some suggestions. I want to use #horovod in a multi-GPU, multi-node setting, but inside #Apptainer (or #Docker) containers, due to cluster policy. Any references to share?

Issue here: github.com/horovod/horo...

mpi-operator/ADOPTERS.md at master · kubeflow/mpi-operator Kubernetes Operator for MPI-based applications (distributed training, HPC, etc.) - kubeflow/mpi-operator

Check out this growing list of adopters for @kubeflow MPI Operator - allreduce-style distributed training on @kubernetesio! If your company would like to be included here, please send us a pull request! http://bit.ly/2u7z7TZ #horovod @ApacheMXNet @TensorFlow @PyTorch
