# XGen-7B - A 7B LLM trained on 1.5T tokens with sequence lengths up to 8K

> Clean Markdown view of GeekNews topic #9565. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=9565](https://news.hada.io/topic?id=9565)
- GeekNews Markdown: [https://news.hada.io/topic/9565.md](https://news.hada.io/topic/9565.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2023-07-01T10:02:01+09:00
- Updated: 2023-07-01T10:02:01+09:00
- Original source: [blog.salesforceairesearch.com](https://blog.salesforceairesearch.com/xgen/)
- Points: 4
- Comments: 0

## Topic Body

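- As LLMs see wider use, applying them to long sequences is becoming important: document summarization, code writing, protein sequence prediction, and so on
- However, most open-source LLMs (LLaMA, MPT, Falcon, etc.) are trained with a maximum sequence length of 2K tokens
- XGen-7B is trained on 1.5T tokens with sequence lengths of up to 8K
- On standard NLP benchmarks, it performs on par with or better than same-size models: MPT, Falcon, LLaMA, Redpajama, and OpenLLaMA
- Strong results on both text (MMLU, QA) and code (HumanEval) tasks
- Training on 1T tokens with TPU-v4 costs about $150K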
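
To put the quoted training cost in perspective, the figure of about $150K per 1T tokens can be turned into a per-token rate. The sketch below is illustrative arithmetic only: the function names are hypothetical, and the extrapolation to 1.5T tokens assumes cost scales linearly with token count, which the source does not state.

```python
# Back-of-the-envelope arithmetic based on the quoted figure:
# about $150K to train on 1T tokens with TPU-v4.
# Assumption (not from the source): cost scales linearly with token count.

REFERENCE_COST_USD = 150_000.0   # quoted cost
REFERENCE_TOKENS = 1e12          # 1T tokens


def cost_per_token(total_cost_usd: float, tokens: float) -> float:
    """Average training cost per token in USD."""
    return total_cost_usd / tokens


def estimate_cost(tokens: float) -> float:
    """Linearly extrapolated training cost in USD for a given token count."""
    return cost_per_token(REFERENCE_COST_USD, REFERENCE_TOKENS) * tokens


if __name__ == "__main__":
    per_million = cost_per_token(REFERENCE_COST_USD, REFERENCE_TOKENS) * 1e6
    print(f"About ${per_million:.2f} per million training tokens")
    print(f"About ${estimate_cost(1.5e12):,.0f} for 1.5T tokens (linear assumption)")
```

Real costs depend on hardware utilization, sequence length, and pricing terms, so the extrapolated number should be read as a rough order of magnitude, not a reported result.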

## Comments


_No public comments on this page._