We now natively support the Muon optimizer at all ZeRO stages (1/2/3), providing superior performance on LLM pre-training and post-training. Feel free to try it out. Kudos to @PKUWZP, Guokai Ma, Peng Du, and Chi for the contribution!
DeepSpeed now supports the Muon Optimizer.
Optimized specifically for internal 2D weights within neural networks, Muon is gaining traction for its significant memory savings and strong convergence metrics during LLM training.
In our latest blog post, the DeepSpeed team shares a
Introducing DeepSpeed-FastGen 🚀
Serve LLMs and generative AI models with
- 2.3x higher throughput
- 2x lower average latency
- 4x lower tail latency
w. Dynamic SplitFuse batching
Auto TP, load balancing w. perfect linear scaling, plus easy-to-use API
github.com/microsoft/Deep…
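A toy sketch of the idea behind Dynamic SplitFuse batching, assuming the scheduling rule is roughly "give every generating sequence one token, then fill the rest of a fixed token budget with prompt chunks, splitting long prompts across forward passes". The `Request` class, `plan_step` function, and the 512-token budget are hypothetical names and values chosen for illustration; this is not the DeepSpeed-FastGen implementation or its API.

```python
from dataclasses import dataclass


@dataclass
class Request:
    name: str
    prompt_left: int        # prompt tokens not yet processed
    decoding: bool = False  # True once the whole prompt has been processed


def plan_step(requests, token_budget):
    """Plan one forward pass: map request name -> tokens to process.

    Hypothetical scheduler for illustration only. The total number of
    tokens planned never exceeds token_budget.
    """
    plan = {}
    remaining = token_budget

    # Sequences that are already generating get one token each.
    for r in requests:
        if r.decoding and remaining > 0:
            plan[r.name] = 1
            remaining -= 1

    # Fill what is left with prompt chunks; a long prompt is split
    # and continues in later forward passes.
    for r in requests:
        if not r.decoding and r.prompt_left > 0 and remaining > 0:
            chunk = min(r.prompt_left, remaining)
            plan[r.name] = chunk
            r.prompt_left -= chunk
            remaining -= chunk
            if r.prompt_left == 0:
                r.decoding = True  # starts generating on the next pass

    return plan


if __name__ == "__main__":
    reqs = [
        Request("A", prompt_left=0, decoding=True),
        Request("B", prompt_left=700),
        Request("C", prompt_left=100),
    ]
    for step in range(3):
        print(step, plan_step(reqs, token_budget=512))
```

In this toy run the 700-token prompt is split over two passes, and the short prompt is fused into the second pass alongside the ongoing generation, so no single pass exceeds the budget.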
Want to train 10B+ ChatGPT-style models on a single GPU and 100B+ on multi-GPU systems? Introducing DeepSpeed-Chat, an easy (single script), fast, and low-cost solution for training high-quality ChatGPT-style models with RLHF, 15x faster than SoTA.
Blog: github.com/microsoft/Deep…
Introducing Mixtral, Phi2, Falcon, and Qwen support in #DeepSpeed-FastGen!
- Up to 2.5x faster LLM inference
- Optimized SplitFuse and token sampling
- Exciting new features like RESTful API and more!
For more details: github.com/microsoft/Deep… #DeepSpeed #AI
DeepSpeed-Chat aims to provide a highly efficient pipeline to help you explore RLHF training. Towards this aim we are releasing training logs and our experiences in a new tutorial:
github.com/microsoft/Deep…
(🧵 thread 1/3)
🚀 Announcing DeepSpeed ZeRO-Offload++
- 6x higher training throughput via collaborative CPU/GPU Twin-Flow 🔥
- Systematic optimizations with no loss of data precision
- Performance gains hold in both single-node and multi-node cases
github.com/microsoft/Deep…
We recently finished a long-awaited sync between microsoft/Megatron-DeepSpeed and NVIDIA/Megatron-LM 🚀🚀🚀
This resulted in a ~10% throughput gain, together with support for FlashAttention (both 1 and 2) and Rotary Positional Embedding (RoPE)!
Details:
🚀Exciting new updates on #DeepSpeed ZeRO-Inference with 20X faster generation!
- 4x less memory usage through 4-bit weight quantization with no code change needed.
- 4x larger batch sizes through KV cache offloading.
Available in DeepSpeed v0.10.3: aka.ms/z3-inference
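The 4x memory figure is consistent with simple arithmetic if the baseline is 16-bit weights, which is an assumption here since the post does not name the baseline. A rough sketch, using a hypothetical 7-billion-parameter model and ignoring the small overhead of quantization scales and zero points:

```python
def weight_memory_gib(num_params: int, bits_per_weight: int) -> float:
    """Approximate weight storage in GiB (weights only, no quantization metadata)."""
    return num_params * bits_per_weight / 8 / 2**30


NUM_PARAMS = 7_000_000_000  # hypothetical model size, for illustration only

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)  # assumed 16-bit baseline
int4_gib = weight_memory_gib(NUM_PARAMS, 4)   # 4-bit quantized weights
ratio = fp16_gib / int4_gib

print(f"16-bit: {fp16_gib:.1f} GiB, 4-bit: {int4_gib:.1f} GiB, ratio: {ratio:.1f}x")
```

Real savings will be slightly below 4x once per-group scales are stored alongside the quantized weights.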
Want to train with 1-million-token context lengths (all 7 of the Harry Potter books! 📚) on a GPT-like model w. 64 GPUs?
Announcing DeepSpeed-Ulysses🚀
This release enables highly efficient and scalable LLM training with extremely long sequence lengths🤯
github.com/microsoft/Deep…