# Microsoft LLMLingua - Compressing Prompts to Accelerate Inference and Cut Costs

> Clean Markdown view of GeekNews topic #12470. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=12470](https://news.hada.io/topic?id=12470)
- GeekNews Markdown: [https://news.hada.io/topic/12470.md](https://news.hada.io/topic/12470.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2023-12-22T10:02:02+09:00
- Updated: 2023-12-22T10:02:02+09:00
- Original source: [github.com/microsoft](https://github.com/microsoft/LLMLingua)
- Points: 10
- Comments: 0

## Topic Body

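- Uses a small, well-trained, aligned language model such as GPT2-small or LLaMA-7B to perform the compression
- Detects unimportant tokens in the prompt and lets a black-box LLM run inference on the compressed prompt
  - Compresses the prompt and KV-Cache to speed up LLM inference and improve the LLM's perception of key information
  - Achieves up to 20x compression with minimal performance loss
- Cuts cost by shortening both the prompt and the generated context
- Supports longer contexts by raising the density of important information within the prompt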
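
The token-pruning idea in the list above can be illustrated with a toy sketch. This is not LLMLingua's algorithm or API: the function name `toy_compress` and the `rate` parameter are hypothetical, and a simple unigram frequency count stands in for the small language model that LLMLingua uses to judge token importance. The sketch only shows the general shape of the approach: score each token, then drop the lowest-scoring ones while preserving order.

```python
import math
from collections import Counter


def toy_compress(prompt: str, rate: float = 0.5) -> str:
    """Keep about `rate` of the whitespace-separated tokens.

    Assumption: token importance is approximated by self-information under
    a unigram model built from the prompt itself. A real compressor would
    score tokens with a small language model instead.
    """
    tokens = prompt.split()
    if not tokens:
        return ""

    counts = Counter(t.lower() for t in tokens)
    total = len(tokens)
    # Rare tokens get a high score; frequent filler words get a low one.
    scores = [-math.log(counts[t.lower()] / total) for t in tokens]

    keep = max(1, int(len(tokens) * rate))
    # Rank by score (highest first), breaking ties by original position.
    ranked = sorted(range(len(tokens)), key=lambda i: (-scores[i], i))
    kept = sorted(ranked[:keep])  # restore the original token order
    return " ".join(tokens[i] for i in kept)


if __name__ == "__main__":
    text = "the cat and the dog and the bird saw the fox"
    print(toy_compress(text, rate=0.5))  # prints "cat dog bird saw fox"
```

Scoring within the prompt alone is a deliberate simplification; it keeps the example dependency-free, but it cannot capture context the way a language model's token probabilities do.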

## Comments


_No public comments on this page._