# Llama.cpp speeds up model weight loading by 10 to 100 times

> Clean Markdown view of GeekNews topic #8859. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=8859](https://news.hada.io/topic?id=8859)
- GeekNews Markdown: [https://news.hada.io/topic/8859.md](https://news.hada.io/topic/8859.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2023-04-03T10:03:01+09:00
- Updated: 2023-04-03T10:03:01+09:00
- Original source: [github.com/ggerganov](https://github.com/ggerganov/llama.cpp/pull/613)
- Points: 13
- Comments: 1

## Topic Body

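- A change to the file format lets the weights be mapped with mmap() instead of copied in with read(), making weight loading 10 to 100 times faster
- Both single-file models such as 7B and multi-file models such as 13B are supported, and the loading code is much simpler
- The change also aligns tensors on 32-byte boundaries, which may bring additional performance gains on certain processors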
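The two ideas in the summary, mapping the weights file with mmap() rather than reading it into a buffer, and storing tensors at 32-byte-aligned offsets, can be sketched in POSIX C as below. This is a minimal illustration under assumptions, not llama.cpp's actual loader: `map_weights`, `tensor_at`, `align_up` and `TENSOR_ALIGNMENT` are hypothetical names, and the sketch assumes a file layout where each tensor's raw data starts at an offset padded to a multiple of 32.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical: tensor data offsets are padded to this boundary. */
#define TENSOR_ALIGNMENT 32

/* Round a file offset up to the next multiple of `align` (must be a power of two).
   A writer would pad with zero bytes up to this offset before each tensor. */
static size_t align_up(size_t offset, size_t align) {
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}

/* Map a whole weights file read-only instead of read()-ing it into a heap buffer.
   Returns NULL on failure; on success stores the mapping size in *out_size. */
static void *map_weights(const char *path, size_t *out_size) {
    int fd = open(path, O_RDONLY);
    if (fd < 0) return NULL;

    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }

    void *addr = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd); /* the mapping remains valid after the descriptor is closed */
    if (addr == MAP_FAILED) return NULL;

    *out_size = (size_t)st.st_size;
    return addr;
}

/* Return a pointer to tensor data inside the mapping, without copying.
   Rejects offsets that are not aligned or ranges that run past the end. */
static const float *tensor_at(const void *base, size_t size,
                              size_t offset, size_t nbytes) {
    if (offset % TENSOR_ALIGNMENT != 0) return NULL;
    if (offset > size || nbytes > size - offset) return NULL;
    return (const float *)((const char *)base + offset);
}
```

Because mmap() returns a page-aligned address, any 32-byte-aligned offset in the file becomes a 32-byte-aligned pointer in memory, which is what lets aligned SIMD loads work directly on the mapped data. Pages are read from disk only when first touched, and the OS page cache can serve them again on the next run. The mapping should be released with `munmap(addr, size)` when the model is unloaded.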

## Comments


### Comment 15499

- Author: xguru
- Created: 2023-04-03T10:04:02+09:00
- Points: 1

[LLaMA - a 65B-parameter LLM released by Meta](https://news.hada.io/topic?id=8578)  
[llama.cpp - Running inference on Facebook's LLaMA model in pure C/C++](https://news.hada.io/topic?id=8682)