[2024/04/22 ~ 04/28] Top ML Papers of the Week

(discuss.pytorch.kr)

The original post is an automated Korean translation of the ML paper roundup that DAIR.AI publishes every week.
Most of the papers featured this week focus on large language models (LLMs) and their applications. In particular, "Make Your LLM Fully Utilize the Context", "Graph Machine Learning in the Era of LLMs", "Self-Evolution of LLMs", and "Naturalized Execution Tuning (NExT)" cover ways to improve and optimize LLMs as well as new application areas. This reflects the growing importance and applicability of LLMs in AI.
Progress in LLMs matters because these models perform well not only in natural language processing (NLP) but also on a range of multimodal tasks. For example, "Make Your LLM Fully Utilize the Context" explores how an LLM can make full use of the context it is given in order to extract and interpret information more accurately. "Graph Machine Learning in the Era of LLMs" surveys how learning on graph-structured data can be improved with LLMs, which helps in understanding complex relationships and patterns.
This trend suggests that the role of LLMs in AI is no longer limited to language understanding and generation and is expanding to broader problem solving and applications. It can be seen as part of researchers' efforts to explore different aspects of AI and, in particular, to build models that understand and use human language better. It also supports the expectation that LLMs will be used in a growing number of fields.

[Image: Top ML Papers of the Week, 2024/04/22 ~ 04/28]

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper introduction

phi-3-mini is a new 3.8B-parameter language model trained on 3.3 trillion tokens and reported to rival Mixtral 8x7B and GPT-3.5. Its default context length is 4K, and a version extended to 128K (phi-3-mini-128K) is also included. The 3.8B models are trained on a combination of heavily filtered web data and synthetic data. The report also gives results for 7B and 14B models trained on 4.8T tokens (phi-3-small and phi-3-medium).

Abstract

We introduce phi-3-mini, a 3.8 billion parameter language model trained on 3.3 trillion tokens, whose overall performance, as measured by both academic benchmarks and internal testing, rivals that of models such as Mixtral 8x7B and GPT-3.5 (e.g., phi-3-mini achieves 69% on MMLU and 8.38 on MT-bench), despite being small enough to be deployed on a phone. The innovation lies entirely in our dataset for training, a scaled-up version of the one used for phi-2, composed of heavily filtered web data and synthetic data. The model is also further aligned for robustness, safety, and chat format. We also provide some initial parameter-scaling results with a 7B and 14B models trained for 4.8T tokens, called phi-3-small and phi-3-medium, both significantly more capable than phi-3-mini (e.g., respectively 75% and 78% on MMLU, and 8.7 and 8.9 on MT-bench).

Paper link

https://arxiv.org/abs/2404.14219

Further reading

https://discuss.pytorch.kr/t/…

https://x.com/omarsar0/status/1782780923806699716

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper introduction

A new open language model that uses a layer-wise scaling strategy to allocate parameters efficiently, which improves both efficiency and accuracy. It comes in several sizes (270M, 450M, 1.1B, and 3B) and achieves a 2.36% improvement in accuracy over OLMo while requiring 2× fewer pre-training tokens.

Abstract

The reproducibility and transparency of large language models are crucial for advancing open research, ensuring the trustworthiness of results, and enabling investigations into data and model biases, as well as potential risks. To this end, we release OpenELM, a state-of-the-art open language model. OpenELM uses a layer-wise scaling strategy to efficiently allocate parameters within each layer of the transformer model, leading to enhanced accuracy. For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens. Diverging from prior practices that only provide model weights and inference code, and pre-train on private datasets, our release includes the complete framework for training and evaluation of the language model on publicly available datasets, including training logs, multiple checkpoints, and pre-training configurations. We also release code to convert models to MLX library for inference and fine-tuning on Apple devices. This comprehensive release aims to empower and strengthen the open research community, paving the way for future open research endeavors. Our source code along with pre-trained model weights and training recipes is available at https://github.com/apple/corenet. Additionally, OpenELM models can be found on HuggingFace at: https://huggingface.co/apple/OpenELM.
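One way to picture a layer-wise scaling strategy is to let each transformer layer have its own width instead of repeating one configuration. The sketch below assumes that the number of attention heads and the feed-forward width grow linearly with depth. The function name, the interpolation rule, and every numeric value are illustrative assumptions, not OpenELM's published configuration.

```python
from dataclasses import dataclass


@dataclass
class LayerConfig:
    num_heads: int  # attention heads in this layer
    ffn_dim: int    # hidden width of this layer's feed-forward block


def layerwise_configs(num_layers, d_model, head_dim,
                      alpha=(0.5, 1.0), beta=(0.5, 4.0)):
    """Build one config per layer by interpolating two scaling factors.

    alpha scales the attention width, beta scales the FFN width.
    Both ranges are hypothetical values chosen for illustration.
    """
    configs = []
    for i in range(num_layers):
        # t runs from 0.0 at the first layer to 1.0 at the last layer.
        t = i / (num_layers - 1) if num_layers > 1 else 0.0
        a = alpha[0] + (alpha[1] - alpha[0]) * t
        b = beta[0] + (beta[1] - beta[0]) * t
        num_heads = max(1, round(a * d_model / head_dim))
        ffn_dim = round(b * d_model)
        configs.append(LayerConfig(num_heads, ffn_dim))
    return configs
```

Under these assumptions, early layers are narrow and later layers are wide, so a fixed parameter budget is spent unevenly across depth rather than uniformly.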

Paper link

https://arxiv.org/abs/2404.14619

Further reading

https://discuss.pytorch.kr/t/apple-270m-3b-openelm/4204

https://github.com/apple/corenet

https://huggingface.co/apple/OpenELM

https://x.com/rasbt/status/1783480053847736713

Snowflake Arctic

Paper introduction

An open-source LLM (Apache 2.0 license) that uses a unique Dense-MoE Hybrid transformer architecture. It performs on par with Llama 3 70B on enterprise metrics such as coding (HumanEval+ and MBPP+), SQL (Spider), and instruction following (IFEval), and claims to use 17x less compute budget than Llama 3 70B. The training compute is reported as roughly under $2 million (less than 3K GPU weeks).

Paper link

https://snowflake.com/blog/…

Further reading

https://discuss.pytorch.kr/t/…

https://x.com/omarsar0/status/1783176059694821632

Make Your LLM Fully Utilize the Context

Paper introduction

Presents an approach to overcome the lost-in-the-middle challenge common in LLMs. It applies an explicit "information-intensive" training procedure to Mistral-7B so that the LLM fully utilizes its context. The procedure uses a synthetic dataset in which the answer requires 1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and 2) the integration and reasoning of information from two or more short segments. The resulting model, FILM-7B (FILl-in-the-Middle), can robustly retrieve information from different positions in its 32K context window.

Abstract

While many contemporary large language models (LLMs) can process lengthy input, they still struggle to fully utilize information within the long context, known as the lost-in-the-middle challenge. We hypothesize that it stems from insufficient explicit supervision during the long-context training, which fails to emphasize that any position in a long context can hold crucial information. Based on this intuition, our study presents information-intensive (IN2) training, a purely data-driven solution to overcome lost-in-the-middle. Specifically, IN2 training leverages a synthesized long-context question-answer dataset, where the answer requires (1) fine-grained information awareness on a short segment (~128 tokens) within a synthesized long context (4K-32K tokens), and (2) the integration and reasoning of information from two or more short segments. Through applying this information-intensive training on Mistral-7B, we present FILM-7B (FILl-in-the-Middle). To thoroughly assess the ability of FILM-7B for utilizing long contexts, we design three probing tasks that encompass various context styles (document, code, and structured-data context) and information retrieval patterns (forward, backward, and bi-directional retrieval). The probing results demonstrate that FILM-7B can robustly retrieve information from different positions in its 32K context window. Beyond these probing tasks, FILM-7B significantly improves the performance on real-world long-context tasks (e.g., 23.5->26.9 F1 score on NarrativeQA), while maintaining a comparable performance on short-context tasks (e.g., 59.3->59.2 accuracy on MMLU). Github Link: https://github.com/microsoft/FILM.
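The data recipe described above, a short key segment hidden at an arbitrary position in a long synthesized context, can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: tokens are approximated by words, the function name is hypothetical, and generating the question-answer pair for the key segment is left out.

```python
import random


def build_long_context_sample(key_segment, filler_segments, target_len, rng):
    """Hide key_segment at a random position inside a long filler context.

    Tokens are approximated by whitespace-separated words (an assumption).
    filler_segments must contain only non-empty segments.
    Returns the flattened context and the index where the key segment starts.
    """
    segments = []
    length = len(key_segment)
    # Add filler segments until the context reaches the target length.
    while length < target_len:
        seg = rng.choice(filler_segments)
        segments.append(seg)
        length += len(seg)
    # Any slot is allowed, so the answer can sit at the start, middle, or end.
    slot = rng.randint(0, len(segments))  # inclusive on both ends
    start = sum(len(seg) for seg in segments[:slot])
    segments.insert(slot, key_segment)
    tokens = [tok for seg in segments for tok in seg]
    return tokens, start
```

Sampling the slot uniformly is the point of the sketch: every position in the context is equally likely to hold the information needed for the answer.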

Paper link

https://arxiv.org/abs/2404.16811

Further reading

https://github.com/microsoft/FILM

https://x.com/omarsar0/status/1783905514578980949

FineWeb

Paper introduction

A large-scale web dataset containing 15 trillion tokens for training language models. It filters and deduplicates CommonCrawl data from 2013 to 2024 with the goal of improving data quality.
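To make "filters and deduplicates" concrete, here is a toy sketch of the two steps on a list of documents. It is not FineWeb's actual pipeline: the word-count threshold is a hypothetical quality filter, and exact hashing of normalized text is a deliberately simple stand-in for whatever deduplication method the dataset uses.

```python
import hashlib


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def filter_and_dedup(docs, min_words=5):
    """Drop very short documents, then drop exact duplicates after normalization."""
    seen = set()
    kept = []
    for doc in docs:
        norm = normalize(doc)
        # Quality filter (assumed): skip documents with too few words.
        if len(norm.split()) < min_words:
            continue
        # Deduplication: keep only the first document with a given hash.
        digest = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if digest in seen:
            continue
        seen.add(digest)
        kept.append(doc)
    return kept
```

Exact hashing only catches documents that are identical after normalization. Near-duplicate detection needs a fuzzier method and is outside the scope of this sketch.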

Paper link

https://huggingface.co/datasets/HuggingFaceFW/fineweb

Further reading

https://x.com/gui_penedo/status/1781953413938557276

AI-powered Gene Editors

Paper introduction

An AI system powered by an LLM trained on biological diversity at scale designs programmable gene editors and achieves precision editing of the human genome.

Paper link

https://www.biorxiv.org/content/10.1101/2024.04.22.590591v1

Further reading

https://x.com/thisismadani/status/1782510590839406904

AutoCrawler: A Progressive Understanding Web Agent for Web Crawler Generation

Paper introduction

Combines LLMs with crawlers to help crawlers handle diverse and changing web environments more efficiently. The web crawler agent leverages the hierarchical structure of HTML for progressive understanding, and uses top-down and step-back operations over the DOM tree to generate a complete and executable crawler.

Abstract

Web automation is a significant technique that accomplishes complicated web tasks by automating common web actions, enhancing operational efficiency, and reducing the need for manual intervention. Traditional methods, such as wrappers, suffer from limited adaptability and scalability when faced with a new website. On the other hand, generative agents empowered by large language models (LLMs) exhibit poor performance and reusability in open-world scenarios. In this work, we introduce a crawler generation task for vertical information web pages and the paradigm of combining LLMs with crawlers, which helps crawlers handle diverse and changing web environments more efficiently. We propose AutoCrawler, a two-stage framework that leverages the hierarchical structure of HTML for progressive understanding. Through top-down and step-back operations, AutoCrawler can learn from erroneous actions and continuously prune HTML for better action generation. We conduct comprehensive experiments with multiple LLMs and demonstrate the effectiveness of our framework. Resources of this paper can be found at https://github.com/EZ-hwh/AutoCrawler.
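As a loose illustration of moving down and back up an HTML hierarchy, the toy sketch below walks a small DOM tree. It is not AutoCrawler itself: the LLM calls that drive the real framework are omitted, a plain substring match stands in for the model's judgment, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET


def subtree_text(node):
    """Concatenate all text inside a node and its descendants."""
    return "".join(node.itertext())


def top_down(root, target):
    """Descend from the root to the deepest element whose text contains target.

    Returns the list of elements on that path, or an empty list if the
    target text does not occur anywhere in the tree.
    """
    if target not in subtree_text(root):
        return []
    path = [root]
    while True:
        child = next((c for c in path[-1] if target in subtree_text(c)), None)
        if child is None:
            return path
        path.append(child)


def step_back(path):
    """Drop the deepest element so the scope widens to its parent."""
    return path[:-1] if len(path) > 1 else path
```

In this toy version, each step down narrows the HTML that has to be considered, and a step back widens the scope by one level when the current node turns out to be too specific.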

Paper link

https://arxiv.org/abs/2404.12753

Further reading

https://github.com/EZ-hwh/AutoCrawler

https://x.com/omarsar0/status/1782462314983071757

Graph Machine Learning in the Era of Large Language Models (LLMs)

Paper introduction

Provides a comprehensive overview of the latest advancements for Graph ML in the era of LLMs. It covers recent developments in Graph ML, how LLMs can enhance graph features, and how they can help address issues such as out-of-distribution (OOD) generalization and graph heterogeneity.

Abstract

Graphs play an important role in representing complex relationships in various domains like social networks, knowledge graphs, and molecular discovery. With the advent of deep learning, Graph Neural Networks (GNNs) have emerged as a cornerstone in Graph Machine Learning (Graph ML), facilitating the representation and processing of graph structures. Recently, LLMs have demonstrated unprecedented capabilities in language tasks and are widely adopted in a variety of applications such as computer vision and recommender systems. This remarkable success has also attracted interest in applying LLMs to the graph domain. Increasing efforts have been made to explore the potential of LLMs in advancing Graph ML's generalization, transferability, and few-shot learning ability. Meanwhile, graphs, especially knowledge graphs, are rich in reliable factual knowledge, which can be utilized to enhance the reasoning capabilities of LLMs and potentially alleviate their limitations such as hallucinations and the lack of explainability. Given the rapid progress of this research direction, a systematic review summarizing the latest advancements for Graph ML in the era of LLMs is necessary to provide an in-depth understanding to researchers and practitioners. Therefore, in this survey, we first review the recent developments in Graph ML. We then explore how LLMs can be utilized to enhance the quality of graph features, alleviate the reliance on labeled data, and address challenges such as graph heterogeneity and out-of-distribution (OOD) generalization. Afterward, we delve into how graphs can enhance LLMs, highlighting their abilities to enhance LLM pre-training and inference. Furthermore, we investigate various applications and discuss the potential future directions in this promising field.

Paper link

https://arxiv.org/abs/2404.14928

Further reading

https://x.com/omarsar0/status/1783171591020392886

A Survey on Self-Evolution of Large Language Models

Paper introduction

Provides a comprehensive survey of self-evolution approaches in LLMs.

Abstract

Large language models (LLMs) have significantly advanced in various fields and intelligent agent applications. However, current LLMs that learn from human or external model supervision are costly and may face performance ceilings as task complexity and diversity increase. To address this issue, self-evolution approaches that enable LLM to autonomously acquire, refine, and learn from experiences generated by the model itself are rapidly growing. This new training paradigm inspired by the human experiential learning process offers the potential to scale LLMs towards superintelligence. In this work, we present a comprehensive survey of self-evolution approaches in LLMs. We first propose a conceptual framework for self-evolution and outline the evolving process as iterative cycles composed of four phases: experience acquisition, experience refinement, updating, and evaluation. Second, we categorize the evolution objectives of LLMs and LLM-based agents; then, we summarize the literature and provide taxonomy and insights for each module. Lastly, we pinpoint existing challenges and propose future directions to improve self-evolution frameworks, equipping researchers with critical insights to fast-track the development of self-evolving LLMs.

Paper link

https://arxiv.org/abs/2404.14387

Further reading

https://x.com/omarsar0/status/1782777977526231440

NExT: Teaching Large Language Models to Reason about Code Execution

Paper introduction

Trains an LLM to inspect the execution traces of programs and reason about run-time behavior via synthetic chain-of-thought rationales. It improves the fix rate of a PaLM 2 model on MBPP and HumanEval by 26.1% and 14.3% absolute, respectively, and the model can also generalize to scenarios where program traces are absent at test time.

Abstract

A fundamental skill among human developers is the ability to understand and reason about program execution. As an example, a programmer can mentally simulate code execution in natural language to debug and repair code (aka. rubber duck debugging). However, large language models (LLMs) of code are typically trained on the surface textual form of programs, thus may lack a semantic understanding of how programs execute at run-time. To address this issue, we propose NExT, a method to teach LLMs to inspect the execution traces of programs (variable states of executed lines) and reason about their run-time behavior through chain-of-thought (CoT) rationales. Specifically, NExT uses self-training to bootstrap a synthetic training set of execution-aware rationales that lead to correct task solutions (e.g., fixed programs) without laborious manual annotation. Experiments on program repair tasks based on MBPP and HumanEval demonstrate that NExT improves the fix rate of a PaLM 2 model, by 26.1% and 14.3% absolute, respectively, with significantly improved rationale quality as verified by automated metrics and human raters. Our model can also generalize to scenarios where program traces are absent at test-time.

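This post was put together with a GPT model and may contain mistakes, so please also refer to the original text included in the post. If you notice anything awkward or incorrect while reading, please let us know in the comments.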
