Abstract: It remains challenging to train billion-scale DNN models on a single modern multi-GPU server due to the GPU memory wall. Unfortunately, existing memory-saving techniques such as GPU-CPU swap ...
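For readers unfamiliar with the technique named above, the sketch below illustrates the basic idea of GPU-CPU swap in PyTorch: offloading a tensor from GPU memory to pinned host memory and bringing it back later. This is a generic illustration with assumed tensor names and sizes, not the paper's implementation.

```python
import torch

# A minimal sketch of GPU-CPU swap (tensor offloading), assuming PyTorch
# and an available CUDA device; names and sizes are illustrative only.
activation = torch.randn(1024, 1024, device="cuda")

# Swap out: copy the tensor into pinned host memory so the transfer can
# overlap with compute, then free the GPU copy.
host_buffer = torch.empty_like(activation, device="cpu").pin_memory()
host_buffer.copy_(activation, non_blocking=True)
torch.cuda.synchronize()  # ensure the copy finished before dropping the GPU tensor
del activation
torch.cuda.empty_cache()

# Swap in: move the tensor back to the GPU when it is needed again.
activation = host_buffer.to("cuda", non_blocking=True)
```

The abstract's point is that techniques of this kind save memory at the cost of extra PCIe traffic and synchronization, which is what makes billion-scale training on a single multi-GPU server challenging despite them.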