# LlamaGym - Fine-Tuning LLM Agents with Online Reinforcement Learning

> Clean Markdown view of GeekNews topic #13938. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=13938](https://news.hada.io/topic?id=13938)
- GeekNews Markdown: [https://news.hada.io/topic/13938.md](https://news.hada.io/topic/13938.md)
- Type: news
- Author: [xguru](https://news.hada.io/@xguru)
- Published: 2024-03-22T10:16:01+09:00
- Updated: 2024-03-22T10:16:01+09:00
- Original source: [github.com/KhoomeiK](https://github.com/KhoomeiK/LlamaGym)
- Points: 9
- Comments: 0

## Topic Body

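- Simplifies fine-tuning LLM-based agents with reinforcement learning (RL)
- LlamaGym currently provides a single abstract `Agent` class that lets you iterate and experiment quickly on agent prompting and hyperparameters in Gym environments
- You define your own LLM-based agent by implementing the three abstract methods of the `Agent` class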
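The sketch below shows the three-abstract-method pattern in a self-contained form. It does not import LlamaGym. The `Agent` base class here is a hypothetical stand-in that only shows how a subclass supplies a system prompt, formats an observation, and parses the model's reply into an action. It performs no training. The method names `get_system_prompt`, `format_observation`, and `extract_action` are assumptions, so check the repository for the actual abstract methods. `toy_llm` is a placeholder for a real model. The observation layout `(player_sum, dealer_card, usable_ace)` and the action encoding (0 = stick, 1 = hit) follow Gymnasium's `Blackjack-v1`.

```python
from abc import ABC, abstractmethod
from typing import Callable, List, Tuple


# Hypothetical stand-in for LlamaGym's `Agent` base class (method names assumed).
# It only illustrates "implement three abstract methods"; there is no RL training here.
class Agent(ABC):
    def __init__(self, llm: Callable[[str], str]):
        self.llm = llm                # any callable mapping a prompt string to a reply string
        self.history: List[str] = []  # prompts sent to the LLM so far

    @abstractmethod
    def get_system_prompt(self) -> str:
        """Instruction text placed before every observation."""

    @abstractmethod
    def format_observation(self, observation) -> str:
        """Turn a raw environment observation into text for the LLM."""

    @abstractmethod
    def extract_action(self, response: str) -> int:
        """Parse the LLM's text reply into an environment action."""

    def act(self, observation) -> int:
        # Build the prompt from the two text hooks, query the LLM, then parse its reply.
        prompt = self.get_system_prompt() + "\n" + self.format_observation(observation)
        self.history.append(prompt)
        return self.extract_action(self.llm(prompt))


class BlackjackAgent(Agent):
    def get_system_prompt(self) -> str:
        return "You are an expert blackjack player. Reply with 'stay' or 'hit'."

    def format_observation(self, observation: Tuple[int, int, int]) -> str:
        player_sum, dealer_card, usable_ace = observation
        ace = "have" if usable_ace else "do not have"
        return (f"Your current total is {player_sum}, the dealer is showing "
                f"{dealer_card}, and you {ace} a usable ace.")

    def extract_action(self, response: str) -> int:
        # Blackjack-v1 action encoding: 0 = stick (stay), 1 = hit.
        return 0 if "stay" in response.lower() else 1


# Placeholder "LLM": stays on 17 or more. A real agent would call a language model here.
def toy_llm(prompt: str) -> str:
    total = int(prompt.split("total is ")[1].split(",")[0])
    return "stay" if total >= 17 else "hit"


agent = BlackjackAgent(toy_llm)
print(agent.act((18, 10, 0)))  # 0 (stay)
print(agent.act((12, 10, 0)))  # 1 (hit)
```

Prompt construction, observation formatting, and action parsing each live in their own method. That separation keeps prompting experiments cheap: changing how observations are phrased means overriding one method, and the loop that calls `act` stays untouched.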
  
### Usage
  
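- After installing LlamaGym, build a blackjack-playing agent by implementing the three abstract methods of the `Agent` class.
- Define the base LLM and instantiate the agent. Then write the RL loop in which the agent acts, receives rewards, and ends episodes.
- Online learning through RL is tricky, so expect to tune hyperparameters. A supervised fine-tuning stage may also help.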
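The following sketch shows the shape of the RL loop described above: act, collect a reward, and close out each episode. It is not LlamaGym code, and every name in it is hypothetical. `ToyBlackjackEnv` is a heavily simplified stand-in for Gymnasium's `Blackjack-v1`, with no usable-ace logic and a dealer who draws to 17. It mimics Gymnasium's `reset()`/`step()` return signatures. `RecordingAgent` uses a fixed rule in place of an LLM. The method names `act`, `assign_reward`, and `terminate_episode` are assumptions about the loop's structure. `terminate_episode` marks the point where a real implementation would run the RL update on the model.

```python
import random
from typing import Callable, Dict, List, Tuple

Obs = Tuple[int, int, int]  # (player_sum, dealer_card, usable_ace), as in Blackjack-v1


class ToyBlackjackEnv:
    """Hypothetical, simplified Blackjack env (aces count as 1, dealer draws to 17)."""

    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)

    def _draw(self) -> int:
        return min(self.rng.randint(1, 13), 10)  # face cards count as 10

    def reset(self) -> Tuple[Obs, Dict]:
        self.player = self._draw() + self._draw()
        self.dealer = self._draw()
        return (self.player, self.dealer, 0), {}

    def step(self, action: int) -> Tuple[Obs, float, bool, bool, Dict]:
        obs: Obs = (self.player, self.dealer, 0)
        if action == 1:  # hit
            self.player += self._draw()
            obs = (self.player, self.dealer, 0)
            busted = self.player > 21
            return obs, (-1.0 if busted else 0.0), busted, False, {}
        dealer = self.dealer  # stick: dealer draws until reaching 17
        while dealer < 17:
            dealer += self._draw()
        if dealer > 21 or self.player > dealer:
            reward = 1.0
        elif self.player == dealer:
            reward = 0.0
        else:
            reward = -1.0
        return obs, reward, True, False, {}


class RecordingAgent:
    """Hypothetical agent: records rewards per episode; no model update is performed."""

    def __init__(self, policy: Callable[[Obs], int]):
        self.policy = policy  # a real agent would query the LLM here
        self.rewards: List[float] = []
        self.steps = 0

    def act(self, observation: Obs) -> int:
        self.steps += 1
        return self.policy(observation)

    def assign_reward(self, reward: float) -> None:
        self.rewards.append(reward)

    def terminate_episode(self) -> Dict[str, float]:
        # A real implementation would compute returns/advantages and update the LLM here.
        stats = {"return": sum(self.rewards), "steps": self.steps}
        self.rewards, self.steps = [], 0
        return stats


def run(num_episodes: int = 1000, seed: int = 0) -> List[Dict[str, float]]:
    env = ToyBlackjackEnv(seed)
    agent = RecordingAgent(lambda obs: 0 if obs[0] >= 17 else 1)
    all_stats = []
    for _ in range(num_episodes):
        observation, info = env.reset()
        done = False
        while not done:
            action = agent.act(observation)
            observation, reward, terminated, truncated, info = env.step(action)
            agent.assign_reward(reward)
            done = terminated or truncated
        all_stats.append(agent.terminate_episode())
    return all_stats


stats = run(num_episodes=1000)
mean_return = sum(s["return"] for s in stats) / len(stats)
print(f"mean return over {len(stats)} episodes: {mean_return:.3f}")
```

Rewards are assigned after every step, but the per-episode hook runs only once an episode ends. With this split, an episodic learner can see the whole trajectory before updating, which suits a game like blackjack, where the only nonzero reward arrives at the end.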
