# DeepSpeed Ulysses: System Optimizations for Training Long-Sequence Transformer Models

> Clean Markdown view of GeekNews topic #10642. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=10642](https://news.hada.io/topic?id=10642)
- GeekNews Markdown: [https://news.hada.io/topic/10642.md](https://news.hada.io/topic/10642.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2023-08-31T11:03:01+09:00
- Updated: 2023-08-31T11:03:01+09:00
- Original source: [github.com/microsoft](https://github.com/microsoft/DeepSpeed/tree/master/blogs/deepspeed-ulysses)
- Points: 5
- Comments: 0

## Topic Body

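- Supports sequence lengths 4x longer than existing systems, enabling training on sequences of more than one million tokens
- Cuts communication by more than 10x, improving throughput by up to 2.5x, with sustained throughput above 175 TFlops/GPU
- Fully general, implementation-agnostic attention (works with implementations such as FlashAttention 2)
- Supports large-scale model training: works together with ZeRO-3 to handle both long sequences and large model sizes
- Easy to use and portable, requiring minimal changes to existing frameworks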
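
The summary above does not spell out the mechanism. As described in the linked DeepSpeed blog, the approach shards inputs along the sequence dimension and uses all-to-all communication around attention, so that each device temporarily holds the full sequence for a subset of attention heads. The following is a minimal single-process sketch of that re-sharding idea in plain Python. All function names, the toy sizes, and the list-of-lists layout are assumptions made for illustration; this is not the DeepSpeed API and performs no real communication.

```python
# Toy, single-process simulation of sequence-to-head re-sharding.
# Every name here is hypothetical; no DeepSpeed or torch.distributed call is used.

def shard_by_sequence(tokens, world_size):
    """Split a [seq_len][num_heads] table into contiguous sequence shards, one per rank."""
    seq_len = len(tokens)
    assert seq_len % world_size == 0, "seq_len must divide evenly across ranks"
    chunk = seq_len // world_size
    return [tokens[r * chunk:(r + 1) * chunk] for r in range(world_size)]


def all_to_all_seq_to_head(seq_shards):
    """Simulated all-to-all: each rank ends with the full sequence for a slice of heads."""
    world_size = len(seq_shards)
    num_heads = len(seq_shards[0][0])
    assert num_heads % world_size == 0, "num_heads must divide evenly across ranks"
    heads_per_rank = num_heads // world_size
    head_shards = []
    for dst in range(world_size):
        lo, hi = dst * heads_per_rank, (dst + 1) * heads_per_rank
        # Rank `dst` receives its head slice from every source rank, in sequence order.
        rows = [row[lo:hi] for src in range(world_size) for row in seq_shards[src]]
        head_shards.append(rows)
    return head_shards


def all_to_all_head_to_seq(head_shards):
    """Inverse step: return to sequence shards in which every rank holds all heads."""
    world_size = len(head_shards)
    seq_len = len(head_shards[0])
    chunk = seq_len // world_size
    seq_shards = []
    for dst in range(world_size):
        rows = []
        for i in range(dst * chunk, (dst + 1) * chunk):
            row = []
            for src in range(world_size):
                row.extend(head_shards[src][i])  # concatenate head slices back together
            rows.append(row)
        seq_shards.append(rows)
    return seq_shards


# Each entry is a (token_index, head_index) label standing in for a real activation.
seq_len, num_heads, world_size = 8, 4, 2
tokens = [[(t, h) for h in range(num_heads)] for t in range(seq_len)]

seq_shards = shard_by_sequence(tokens, world_size)   # per rank: 4 tokens x 4 heads
head_shards = all_to_all_seq_to_head(seq_shards)     # per rank: 8 tokens x 2 heads
restored = all_to_all_head_to_seq(head_shards)       # per rank: 4 tokens x 4 heads again
```

Because each simulated rank sees every token for its own heads between the two re-sharding steps, any per-head attention kernel could run there unchanged, which is one way to read the "implementation-agnostic" claim in the list above.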

## Comments


_No public comments on this page._