# A Summary of Apple's Paper on MM1, Its Multimodal LLM

> Clean Markdown view of GeekNews topic #13842. Use the original source for factual precision when an external source URL is present.

## Metadata

- GeekNews HTML: [https://news.hada.io/topic?id=13842](https://news.hada.io/topic?id=13842)
- GeekNews Markdown: [https://news.hada.io/topic/13842.md](https://news.hada.io/topic/13842.md)
- Type: news
- Author: [ninebow](https://news.hada.io/@ninebow)
- Published: 2024-03-16T20:12:15+09:00
- Updated: 2024-03-16T20:12:15+09:00
- Original source: [discuss.pytorch.kr](https://discuss.pytorch.kr/t/apple-llm-mm1-x/3772?utm_source=geeknews)
- Points: 6
- Comments: 0

## Topic Body

Apple has published research results on MM1, a multimodal LLM. (The model code and weights have not been released, and it seems unlikely that they will be.)  
  
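The findings on the image encoder, the VL connector, the datasets, and the training method seem worth a look for anyone who trains or fine-tunes models themselves, so I am sharing a summary I put together with the help of ChatGPT.  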
  
The original paper, 'MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training', is available [on arXiv](https://arxiv.org/abs/2403.09611).  
  
---  
  
Encoder lesson: image resolution has the biggest impact, followed by model size and training data composition.  
> Encoder lesson: Image resolution has the highest impact, followed by model size and training data composition.  
  
VL connector lesson: the number of visual tokens and the image resolution matter most, while the type of VL connector has almost no effect.  
> VL Connector Lesson: Number of visual tokens and image resolution matters most, while the type of VL connector has little effect.  
  
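The two factors in this lesson are coupled, which a little arithmetic makes concrete. The sketch below assumes a ViT-style encoder that splits a square image into non-overlapping patches, so the token count grows with the square of the resolution, and a connector that shrinks the patch grid by average pooling. The patch size of 14 and the 2x2 pooling window are illustrative assumptions, and both function names are hypothetical, not taken from the paper or from any library.

```python
def vit_patch_tokens(image_size: int, patch_size: int) -> int:
    """Number of patch tokens for a square image split into square patches."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size  # patches along one edge
    return side * side


def pooled_tokens(image_size: int, patch_size: int, pool: int) -> int:
    """Token count after non-overlapping pool x pool pooling of the patch grid."""
    if image_size % patch_size != 0:
        raise ValueError("image_size must be divisible by patch_size")
    side = image_size // patch_size
    if side % pool != 0:
        raise ValueError("patch grid side must be divisible by pool")
    return (side // pool) ** 2


# Illustrative settings only: patch size 14 and 2x2 pooling are assumptions.
for size in (224, 336, 448):
    print(size, vit_patch_tokens(size, 14), pooled_tokens(size, 14, 2))
```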
Data lesson 1: interleaved data helps few-shot and text-only performance, while captioning data improves zero-shot performance.  
> Data lesson 1: interleaved data is instrumental for few-shot and text-only performance, while captioning data lifts zero-shot performance.  
  
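For readers less familiar with the terms, zero-shot and few-shot differ only in how many solved examples precede the query in the prompt. The sketch below builds both kinds of prompt for a captioning task. A few-shot prompt is itself an interleaved image-text sequence, which is one plausible intuition for why interleaved pre-training data helps in that setting. The `<image>` placeholder, the `Example` type, and `build_prompt` are hypothetical names for illustration, not the paper's evaluation format.

```python
from dataclasses import dataclass
from typing import Sequence

IMAGE_TOKEN = "<image>"  # hypothetical placeholder for an image's visual tokens


@dataclass
class Example:
    image_id: str
    caption: str


def build_prompt(query_image_id: str, shots: Sequence[Example] = ()) -> str:
    """Build a captioning prompt; an empty `shots` gives a zero-shot prompt."""
    # Each solved example contributes one image followed by its caption.
    parts = [f"{IMAGE_TOKEN}[{ex.image_id}] Caption: {ex.caption}" for ex in shots]
    # The query image comes last, with the caption left for the model to fill in.
    parts.append(f"{IMAGE_TOKEN}[{query_image_id}] Caption:")
    return "\n".join(parts)


zero_shot = build_prompt("query")
few_shot = build_prompt(
    "query",
    shots=[Example("a", "A dog on a beach."), Example("b", "Two cups of coffee.")],
)
print(zero_shot)
print(few_shot)
```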
Data lesson 2: text-only data helps few-shot and text-only performance.  
> Data lesson 2: text-only data helps with few-shot and text-only performance.  
  
Data lesson 3: carefully mixing image and text data yields the best multimodal performance while retaining strong text performance.  
> Data lesson 3: Careful mixture of image and text data can yield optimal multimodal performance and retain strong text performance.  
  
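In practice, "mixing" usually means drawing each training example from one of several data sources with a fixed probability. A minimal sketch of that idea follows, using only the standard library. The source names and the weights are placeholders chosen for illustration, not the ratios reported in the paper, and `sample_sources` is a hypothetical helper.

```python
import random
from collections import Counter
from typing import Dict, List

# Hypothetical mixture weights, for illustration only (not the paper's ratios).
MIXTURE: Dict[str, float] = {
    "captioned": 0.4,    # image-caption pairs
    "interleaved": 0.4,  # documents with images and text interleaved
    "text_only": 0.2,    # plain text
}


def sample_sources(mixture: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Pick the data source for each of n training examples by weight."""
    rng = random.Random(seed)  # local generator so the draw is reproducible
    names = list(mixture)
    weights = [mixture[name] for name in names]
    return rng.choices(names, weights=weights, k=n)


draws = sample_sources(MIXTURE, 10_000)
counts = Counter(draws)
for name in MIXTURE:
    print(name, counts[name] / len(draws))
```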
Data lesson 4: synthetic data helps few-shot learning.  
> Data lesson 4: Synthetic data helps with few-shot learning.
