# SantaCoder - a code generation model with 1.1 billion (1.1B) parameters

> Clean Markdown view of GeekNews topic #8130. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=8130](https://news.hada.io/topic?id=8130)
- GeekNews Markdown: [https://news.hada.io/topic/8130.md](https://news.hada.io/topic/8130.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2022-12-29T09:56:15+09:00
- Updated: 2022-12-29T09:56:15+09:00
- Original source: [huggingface.co](https://huggingface.co/bigcode/santacoder)
- Points: 6
- Comments: 2

## Topic Body

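- A multilingual language model trained on Python, Java, and JavaScript code
- Reportedly outperforms Facebook's InCoder (6.7B) and Salesforce's CodeGen-Multi (2.7B) on left-to-right (LTR) generation and infilling
- Trained on a subset of The Stack v1.1 (6TB), the dataset previously released by BigCode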
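
To make the difference between left-to-right generation and infilling concrete, here is a minimal sketch of how the two kinds of prompt could be assembled. The sentinel strings `<fim-prefix>`, `<fim-suffix>`, and `<fim-middle>` are an assumption based on the SantaCoder model card and should be checked against the linked source. The helper names `build_fim_prompt` and `build_ltr_prompt` are hypothetical and not part of any library.

```python
# Sentinel strings assumed from the SantaCoder model card; verify before use.
FIM_PREFIX = "<fim-prefix>"
FIM_SUFFIX = "<fim-suffix>"
FIM_MIDDLE = "<fim-middle>"


def build_ltr_prompt(prefix: str) -> str:
    """Left-to-right generation: the model simply continues the given text."""
    return prefix


def build_fim_prompt(prefix: str, suffix: str) -> str:
    """Infilling: give the code before and after a gap, then ask for the gap.

    The model is expected to generate the missing middle after the final
    sentinel (hypothetical helper, illustrative only).
    """
    return f"{FIM_PREFIX}{prefix}{FIM_SUFFIX}{suffix}{FIM_MIDDLE}"


# Example: ask the model to fill in the body line between a signature
# and a trailing statement.
ltr_prompt = build_ltr_prompt("def print_hello_world():\n    ")
fim_prompt = build_fim_prompt(
    prefix="def print_hello_world():\n    ",
    suffix="\n    print('Hello world!')",
)
```

Either string would then be tokenized and passed to the model for generation. In the infilling case, the text produced after the last sentinel is the candidate for the gap.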

## Comments


### Comment 13917

- Author: siabard
- Created: 2022-12-29T14:07:58+09:00
- Points: 1

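Right now AI projects are pushing hard on sheer parameter count, but I wonder what price point they would need to be profitable. Can $10 a month, as with Copilot, really cover the costs... (I know there's no point in worrying about big corporations, though...)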

### Comment 13907

- Author: xguru
- Created: 2022-12-29T09:56:30+09:00
- Points: 1

GitHub's Copilot is 12B, and reportedly there is a noticeable gap in generated code quality compared with it.