Lectures 2026: Optimization for Large Language Models
Instructor: Kun Yuan (kunyuan@pku.edu.cn)
This is a 10-hour intensive course on Optimization for Large Language Models. I would like to express my sincere gratitude to the Operations Research Society of China for the invitation and excellent organization.
Classroom: To be announced
Time: To be announced
References
Martin Jaggi and Nicolas Flammarion, Optimization for Machine Learning, EPFL Class CS-439
Chris De Sa, Advanced Machine Learning Systems, Cornell CS6787
Zaiwen Wen, Optimization Methods, PKU 2024 Fall
Kun Yuan, Introduction to LLM, PKU 2025 Spring
Materials
Lecture 1: Gradient Descent
Lecture 2: Stochastic Gradient Descent
Lecture 3: LLM Foundations - Part I
Lecture 4: LLM Foundations - Part II
Lecture 5: Costs in LLM Pre-Training
Lecture 6: Perturbed SGD and Mixed-Precision Training
- Perturbed SGD; Mixed-Precision Training [Slides]
Lecture 7: Coordinate Descent and Layer-wise Training
Lecture 8: Subspace Optimization and Low-Rank Training
- Subspace Optimization; Low-Rank Gradient Projection
Lecture 9: Zeroth-order Optimization and Activation-Free LLM Training
Lecture 10: Distributed Optimization for LLM Training