PKU Class 2024 Spring: Introduction to Large Language Models
Instructor: Kun Yuan (kunyuan@pku.edu.cn)
Sponsor: Decision Intelligence Team, Alibaba DAMO Academy
Teaching assistants:
- Yudong Bai (yutonghe@pku.edu.cn)
- Yunteng Geng (2301213081@pku.edu.cn)
- Yutong He (yutonghe@pku.edu.cn)
- Peijin Li (2301213056@stu.pku.edu.cn)
- Zihao Liu (2100011704@stu.pku.edu.cn)
- Keer Lu (2301213094@stu.pku.edu.cn)
- Yilong Song (2301213059@pku.edu.cn)
- Qianyou Sun (2301111049@stu.pku.edu.cn)
- Yuchi Wang (wangyuchi@stu.pku.edu.cn)
Office hour: 4pm-5pm Wednesday, Room 220, Jingyuan Courtyard 6 (静园六院220)
References
Stanford CS224n: Natural Language Processing with Deep Learning
Lectures
Lecture 1: Introduction to LLMs
- Introduction to large language models [Slides]
- Reading:
Lecture 2: Linear algebra and optimization
Lecture 3: Basics in machine learning
- Linear regression; Logistic regression; Multi-class classification; Neural networks [Slides] (see the sketch below)
- Reading:
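As a warm-up for these topics, a minimal, illustrative sketch of logistic regression trained by batch gradient descent; the toy dataset, learning rate, and step count are my own choices, not taken from the course materials.

```python
# Toy logistic regression with batch gradient descent.
# Dataset, learning rate, and step count are illustrative only.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                 # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(float)     # linearly separable labels

w, b, lr = np.zeros(2), 0.0, 0.1
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))    # sigmoid probabilities
    grad_w = X.T @ (p - y) / len(y)           # gradient of the BCE loss
    grad_b = np.mean(p - y)
    w -= lr * grad_w
    b -= lr * grad_b

print("train accuracy:", np.mean((p > 0.5) == y))
```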
Lecture 4: Word embedding and language models
- Word embeddings [Slides]
- Language models; Recurrent neural networks (slides adapted from Stanford CS224n RNN)
- Backpropagation in RNNs [Slides]
- Sequence-to-sequence models (slides adapted from Stanford CS224n Seq2Seq)
- Forward and backward propagation [Handwritten materials]
- Transformers (slides adapted from Stanford CS224n Transformers)
- Parameters and computations in Transformers [Slides] (see the sketch below)
- Reading:
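For the parameter-counting topic, a back-of-the-envelope sketch for a single Transformer block; it assumes the standard architecture (four d×d attention projections, a two-layer feed-forward network of width 4d, two LayerNorms), and the function name is illustrative.

```python
# Rough per-block parameter count for a standard Transformer block:
# Q/K/V/output projections, a width-4d feed-forward net, two LayerNorms.
def transformer_block_params(d_model, d_ff=None):
    d_ff = d_ff if d_ff is not None else 4 * d_model
    attn = 4 * (d_model * d_model + d_model)                 # projections + biases
    ffn = d_model * d_ff + d_ff + d_ff * d_model + d_model   # two linear layers
    ln = 2 * (2 * d_model)                                   # scale + shift, twice
    return attn + ffn + ln

print(transformer_block_params(768))   # ~7.1M, roughly one GPT-2 block
```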
Guest Lecture I:
- Large language models in mathematical reasoning (Dr. Jihai Zhang, Alibaba DAMO Academy)
Lecture 6: Pretrain and Fine-tune Paradigm
- Teacher forcing; Pretraining; Fine-tuning; BERT; GPTs [Slides] (see the sketch below)
- Reading:
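A minimal PyTorch sketch of teacher forcing for next-token prediction; the toy "model" (an embedding plus a linear head) is a stand-in for a real language model, and all sizes are illustrative.

```python
# Teacher forcing: feed the ground-truth sequence as input and train
# the model to predict each next token (inputs and targets are the
# same sequence, shifted by one position).
import torch
import torch.nn as nn
import torch.nn.functional as F

vocab = 1000
model = nn.Sequential(nn.Embedding(vocab, 64), nn.Linear(64, vocab))  # toy "LM"

tokens = torch.randint(0, vocab, (4, 129))        # (batch, seq_len + 1)
inputs, targets = tokens[:, :-1], tokens[:, 1:]   # shift targets by one

logits = model(inputs)                            # (batch, seq_len, vocab)
loss = F.cross_entropy(logits.reshape(-1, vocab), targets.reshape(-1))
loss.backward()
print(loss.item())                                # ~log(vocab) at init
```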
Lecture 7: Optimizers
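As an illustration for this lecture, a minimal sketch of one Adam step (Kingma & Ba, 2015), the optimizer most commonly used to train LLMs; the function and its defaults mirror the paper, but the code itself is a sketch, not course material.

```python
# One Adam step. m and v are exponential moving averages of the
# gradient and its square; t is the 1-based step counter.
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    m = b1 * m + (1 - b1) * grad              # first moment
    v = b2 * v + (1 - b2) * grad ** 2         # second moment
    m_hat = m / (1 - b1 ** t)                 # bias correction
    v_hat = v / (1 - b2 ** t)
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps), m, v
```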
Midterm Exam
Lecture 8: Distributed Training
- Scaling law [Slides]
- Data parallelism and communication-saving techniques; Pipeline parallelism; Tensor parallelism [Slides] (see the sketch below)
- Reading:
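A sketch of the gradient all-reduce at the heart of data parallelism; it assumes the process group was already initialized with `torch.distributed.init_process_group`, and the training-step function and its arguments are hypothetical.

```python
# Core of data parallelism: each rank runs forward/backward on its own
# data shard, then gradients are summed across ranks and averaged so
# every replica applies the identical update.
import torch.distributed as dist

def data_parallel_step(model, compute_loss, batch, optimizer):
    loss = compute_loss(model, batch)         # forward on the local shard
    loss.backward()                           # local gradients
    world = dist.get_world_size()
    for p in model.parameters():
        dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
        p.grad /= world                       # average across replicas
    optimizer.step()
    optimizer.zero_grad()
```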
Lecture 9: Data Preparation
- Data sources; Deduplication; Quality filtering; Sensitive-information reduction; Data composition; Data curriculum [Slides] (a toy deduplication sketch follows the readings)
- Reading:
- Penedo, Guilherme, et al., The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only
- Soldaini, Luca, et al., Dolma: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
- Kandpal, Nikhil, Eric Wallace, and Colin Raffel, Deduplicating Training Data Mitigates Privacy Risks in Language Models
- Xie, Sang Michael, et al., Data Selection for Language Models via Importance Resampling
- Chen, Mayee, et al., Skill-it! A Data-Driven Skills Framework for Understanding and Training Language Models
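As an illustration of the deduplication step listed above (and studied in the Kandpal et al. reading), a toy exact-match deduplicator that hashes whitespace-normalized text; real pipelines layer near-duplicate detection (e.g., MinHash over n-grams) on top of this.

```python
# Toy exact deduplication: hash lower-cased, whitespace-normalized text
# and keep only the first occurrence of each document.
import hashlib

def exact_dedup(docs):
    seen, kept = set(), []
    for doc in docs:
        key = hashlib.sha256(" ".join(doc.lower().split()).encode()).hexdigest()
        if key not in seen:
            seen.add(key)
            kept.append(doc)
    return kept

print(exact_dedup(["Hello  world", "hello world", "another doc"]))  # keeps 2
```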
Lecture 10: Principles of Prompt Engineering
Lecture 11: LLM Based Agents
Guest Lecture II:
- Building Brainiac Buddy with LLM Agents (Zihao Liu, Beijing International Center for Mathematical Research)
Guest Lecture III:
- Building LLM Agents with Alibaba MindOpt Studio (Dr. Jianfeng Yang, Alibaba DAMO Academy)
Lecture 12: Retrieval Augmented Generation
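A toy end-to-end sketch of the retrieve-then-generate loop behind RAG; the word-overlap retriever stands in for a real embedding index, and the `generate` callable is a hypothetical stand-in for an LLM call.

```python
# Retrieve-then-generate: score documents against the query, put the
# top-k into the prompt, and ask the model to answer from that context.
def overlap_score(query, doc):
    q, d = set(query.lower().split()), set(doc.lower().split())
    return len(q & d) / max(len(q), 1)        # crude word-overlap similarity

def rag_answer(query, corpus, generate, k=2):
    top = sorted(corpus, key=lambda doc: overlap_score(query, doc), reverse=True)[:k]
    prompt = ("Answer using only this context:\n"
              + "\n".join(top)
              + f"\n\nQuestion: {query}")
    return generate(prompt)

corpus = ["PKU is located in Beijing.", "LoRA adds low-rank adapters to LLMs."]
print(rag_answer("Where is PKU located?", corpus, generate=lambda p: p))
```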
Lecture 13: Parameter-Efficient Fine-Tuning
- Low-rank adaptation (LoRA); LoRA+; DoRA; LISA [Slides] (see the sketch below)
- Reading:
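A minimal LoRA-style wrapper around a frozen `nn.Linear`, following the low-rank update from the LoRA paper (B initialized to zero so training starts from the pretrained function); the class name and hyperparameters are illustrative.

```python
# LoRA: freeze the pretrained layer and learn a low-rank update
# (alpha / r) * B A, so only r * (d_in + d_out) parameters are trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, r: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)                     # freeze pretrained weights
        self.A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init
        self.scale = alpha / r

    def forward(self, x):
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
print(layer(torch.randn(2, 768)).shape)                 # torch.Size([2, 768])
```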
Lecture 14: Alignment
Final Review