# vLLM: Easy, Fast, and Cheap LLM Serving with PagedAttention

> Clean Markdown view of GeekNews topic #9464. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=9464](https://news.hada.io/topic?id=9464)
- GeekNews Markdown: [https://news.hada.io/topic/9464.md](https://news.hada.io/topic/9464.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2023-06-23T10:32:02+09:00
- Updated: 2023-06-23T10:32:02+09:00
- Original source: [vllm.ai](https://vllm.ai/)
- Points: 8
- Comments: 0

## Topic Body

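- An open-source library for fast LLM inference and serving
- Manages attention keys and values efficiently with the PagedAttention algorithm
  - Up to 24x higher throughput than HuggingFace Transformers, with no changes to the model architecture
  - Logically contiguous keys and values can be stored in non-contiguous memory
- Successfully in use at LMSYS for Vicuna and Chatbot Arena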
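
The idea of keeping logically contiguous keys and values in non-contiguous memory can be illustrated with a toy block table, similar in spirit to virtual-memory paging. This is a minimal sketch, not vLLM's actual implementation or API: `BLOCK_SIZE`, `BlockAllocator`, and `Sequence` are hypothetical names chosen for illustration, and the "cache" here only tracks block ids rather than real tensors.

```python
BLOCK_SIZE = 4  # tokens per KV block (hypothetical value for illustration)


class BlockAllocator:
    """Hands out fixed-size physical blocks from a shared pool."""

    def __init__(self, num_blocks: int) -> None:
        self.free = list(range(num_blocks))

    def allocate(self) -> int:
        if not self.free:
            raise MemoryError("out of KV cache blocks")
        return self.free.pop()

    def release(self, block_id: int) -> None:
        self.free.append(block_id)


class Sequence:
    """Tracks one request's KV cache as a list of physical block ids."""

    def __init__(self, allocator: BlockAllocator) -> None:
        self.allocator = allocator
        self.block_table: list[int] = []  # logical block index -> physical block id
        self.num_tokens = 0

    def append_token(self) -> None:
        # A new physical block is claimed only when the last one is full,
        # so at most BLOCK_SIZE - 1 slots per sequence sit unused.
        if self.num_tokens % BLOCK_SIZE == 0:
            self.block_table.append(self.allocator.allocate())
        self.num_tokens += 1

    def physical_slot(self, token_index: int) -> tuple[int, int]:
        """Translate a logical token position to (physical block, offset)."""
        if not 0 <= token_index < self.num_tokens:
            raise IndexError(token_index)
        return self.block_table[token_index // BLOCK_SIZE], token_index % BLOCK_SIZE

    def free_all(self) -> None:
        for block_id in self.block_table:
            self.allocator.release(block_id)
        self.block_table.clear()
        self.num_tokens = 0


if __name__ == "__main__":
    pool = BlockAllocator(num_blocks=8)
    a, b = Sequence(pool), Sequence(pool)
    for _ in range(5):
        a.append_token()
    for _ in range(3):
        b.append_token()  # b claims a block in between a's allocations
    for _ in range(4):
        a.append_token()
    print("a block table:", a.block_table)
    print("a token 8 lives at:", a.physical_slot(8))
```

Because each sequence grows one block at a time and blocks return to the shared pool when a request finishes, memory does not have to be reserved up front for the maximum sequence length. That is the kind of saving the bullet points above attribute to PagedAttention.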

## Comments


_No public comments on this page._