2P by xguru 2020-02-12

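- Transformer-based
- The largest model at the time, with more than 10x the parameters of BERT-Large (340M), RoBERTa (355M), OpenAI GPT-2 (1.5B), and similar models
- Made possible by DeepSpeed and ZeRO
The summary on the introduction page is itself said to have been generated by Turing-NLG: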
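As a quick sanity check on the "more than 10x" claim, the ratios can be worked out from the parameter counts listed here and the 17 billion figure quoted from the introduction page. This is only a rough sketch: the names `PARAMS`, `T_NLG`, and `ratios` are made up for illustration, and the counts are the rounded numbers from this post, not exact values.

```python
# Parameter counts as quoted in this post (rounded, not exact).
PARAMS = {
    "BERT-Large": 340e6,
    "RoBERTa": 355e6,
    "OpenAI GPT-2": 1.5e9,
}

# T-NLG size taken from the quoted introduction-page summary (17 billion).
T_NLG = 17e9


def ratios(target, others):
    """Return how many times larger `target` is than each entry in `others`."""
    return {name: target / count for name, count in others.items()}


if __name__ == "__main__":
    for name, ratio in ratios(T_NLG, PARAMS).items():
        print(f"T-NLG vs {name}: {ratio:.1f}x")
```

With these rounded counts, T-NLG comes out at roughly 50x BERT-Large, about 48x RoBERTa, and a little over 11x GPT-2, so the "10x or more" statement holds even against the largest of the three.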
"Turing Natural Language Generation (T-NLG) is a 17 billion parameter language model by Microsoft that outperforms the state of the art on many downstream NLP tasks. We present a demo of the model, including its freeform generation, question answering, and summarization capabilities, to academics for feedback and research purposes."