Assistant Professor, Center for Machine Learning Research, Peking University
My research lies in the theoretical and algorithmic foundations of optimization, signal processing, machine learning, and data science. I currently focus on the development of fast, scalable, reliable, and distributed algorithms with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things.
- [10/2024] A new paper Subspace Optimization for Large Language Models with Convergence Guarantees is now available on arXiv. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We further propose a novel variant of GaLore that provably converges in stochastic optimization.
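For context, GaLore-style methods apply gradients through a low-rank projection rather than directly. A minimal sketch of one such update, under my own naming (illustrative only, not the paper's algorithm):

```python
import numpy as np

def galore_style_step(W, grad, lr=0.1, rank=2):
    """One GaLore-style update: project the gradient onto its top-`rank`
    left-singular subspace, then take a gradient step within that subspace.
    Hypothetical minimal sketch, not the paper's exact method."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                       # rank-r projection basis
    return W - lr * (P @ (P.T @ grad))    # descend only within the subspace
```

The projection discards gradient components outside the chosen subspace, which is the source of the memory savings and, as the paper shows, a potential source of non-convergence.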
- [10/2024] A new paper Enhancing Zeroth-Order Fine-Tuning for Language Models with Low-Rank Structures is now available on arXiv. In this work, we propose a low-rank zeroth-order gradient estimator and introduce a novel low-rank ZO algorithm to effectively fine-tune LLMs. It significantly outperforms MeZO.
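As background, MeZO-style fine-tuning relies on a two-point zeroth-order gradient estimator that needs only forward passes. A minimal sketch with my own function names (the paper's low-rank variant would additionally restrict the perturbation to a low-dimensional subspace):

```python
import numpy as np

def zo_grad(f, x, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.
    Uses two function evaluations along a random direction; no backprop.
    Illustrative sketch, not the paper's exact estimator."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.standard_normal(x.shape)                # random perturbation
    return (f(x + mu * z) - f(x - mu * z)) / (2 * mu) * z
```

Because only the scalar finite difference and the direction `z` are stored, memory stays at inference level, which is what makes ZO methods attractive for fine-tuning large models.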
- [10/2024] A new paper A Mathematics-Inspired Learning-to-Optimize Framework for Decentralized Optimization is now available on arXiv. In this work, we present the first learning-to-optimize framework that surpasses state-of-the-art hand-crafted decentralized algorithms.
- [09/2024] One paper is accepted to NeurIPS 2024. Congratulations to my students Shuchen Zhu and Boao Kong, and all collaborators!
  - SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization
- [09/2024] I will be teaching a course on Optimization for Deep Learning in 2024 Fall.
- [06/2024] I will give a 3-hour tutorial on Efficient Optimization for Deep Learning at Fudan University on June 8th. Please check the slides.
- [05/2024] I will teach a short summer course titled Efficient Optimization for Large Language Models at Beijing Jiaotong University from July 7th to July 9th. It will be a condensed mix of my two regular classes, Optimization for Deep Learning and Large Language Models in Decision Intelligence. The syllabus is coming soon.
- [05/2024] One paper is accepted to ICML 2024. Congratulations to my students Yutong He and Jie Hu, and all collaborators!
- [04/2024] My undergraduate students Ziheng Cheng and Liyuan Liang have been admitted to the UC Berkeley PhD Program, and Lujing Zhang has been admitted to the Carnegie Mellon University (CMU) PhD Program. Congratulations to all of them! We are currently hiring undergraduate research interns, and we are committed to providing abundant resources and comprehensive guidance to support their involvement in cutting-edge research projects.
- [04/2024] I will give a talk on Asynchronous Diffusion Learning with Agent Subsampling and Local Updates at IEEE ICASSP 2024.
- [03/2024] I will serve as an Area Chair for NeurIPS 2024.
- [03/2024] I will give a tutorial lecture on Distributed Machine Learning at MLSS 2024. Lecture slides can be found at Distributed Machine Learning: Part I and Distributed Machine Learning: Part II.
- [02/2024] I will be teaching a course on Large Language Models in Decision Intelligence in 2024 Spring.
- [02/2024] A new paper Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity is on arXiv now. We have clarified the joint influence of network topology and data heterogeneity on decentralized bilevel optimization.
- [01/2024] One paper is accepted to ICLR 2024. Congratulations to my student Ziheng Cheng and all collaborators!
- [12/2023] A new paper Towards Better Understanding the Influence of Directed Networks on Decentralized Stochastic Optimization is on arXiv now. Surprisingly, we find that the spectral gap is not enough to capture the influence of directed networks and that the equilibrium skewness matters a lot! We also establish lower bounds for decentralized algorithms with column-stochastic mixing matrices.
- [11/2023] We will organize a session on Decentralized Optimization and Learning at IEEE CDC 2023.
- [11/2023] One paper is accepted by Signal Processing.
- [09/2023] A new paper Sharper Convergence Guarantees for Federated Learning with Partial Model Personalization is on arXiv now. We establish new state-of-the-art convergence rates for federated learning with partial model personalization!
- [09/2023] One paper is accepted to NeurIPS 2023. Congratulations to my student Yutong He on publishing his first paper!
- [09/2023] I will be teaching a course on Optimization for Deep Learning in 2023 Fall.
- [09/2023] One paper is accepted by Journal of Machine Learning Research (JMLR).
- [07/2023] Two papers are accepted to IEEE CDC 2023. Congratulations to my student Hao Yuan and my collaborator Edward Nguyen on publishing their first papers!
- [06/2023] A new paper Momentum Benefits Non-IID Federated Learning Simply and Provably is on arXiv now. An interesting message is that FedAvg can converge without any data heterogeneity assumption when incorporating momentum!
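To give the flavor of this line of work: in FedAvg, each round the clients run local SGD from the current server model, and a momentum variant folds the averaged client updates into a server-side momentum buffer. A minimal sketch under my own naming (an illustrative server-momentum variant, not the paper's exact algorithm):

```python
import numpy as np

def fedavg_momentum(grads_fn, x0, clients, rounds=50, local_steps=5,
                    lr=0.1, beta=0.9):
    """FedAvg with server-side momentum (illustrative sketch).
    grads_fn(i, x) returns client i's stochastic gradient at x."""
    x = x0.copy()
    m = np.zeros_like(x)
    for _ in range(rounds):
        updates = []
        for i in clients:
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * grads_fn(i, xi)         # local SGD on client i
            updates.append(x - xi)                 # client pseudo-gradient
        m = beta * m + np.mean(updates, axis=0)    # server momentum buffer
        x = x - m                                  # server step
    return x
```

On heterogeneous clients the averaged pseudo-gradients are biased toward each client's own optimum; the momentum buffer smooths these biases over rounds, which is the intuition behind the heterogeneity-free convergence result.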
- [05/2023] A new paper Unbiased Compression Saves Communication in Distributed Optimization: When and How Much? is on arXiv now.
- [05/2023] A new paper Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression is on arXiv now. Please also check Slides (on Github) or Slides (on Baidu Wangpan) for this paper. Some preliminary results of this paper have been published in NeurIPS 2022, check this paper.
- [04/2023] Two papers are accepted to ICML 2023.
- [02/2023] One paper BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection is accepted to CVPR 2023.
- [01/2023] A new paper An Enhanced Gradient-Tracking Bound for Distributed Online Stochastic Convex Optimization is on arXiv now. We establish enhanced rates for Gradient Tracking methods in the online stochastic convex setting.
- [11/2022] I gave a talk at BICMR on Accelerating Decentralized SGD with Sparse and Effective Topologies, which includes our recent results on Exponential Graphs, EquiTopo Graphs, and BlueFog. Please check Slides (on Github) or Slides (on Baidu Wangpan).
- [11/2022] We hosted the 2022 PKU Workshop on Operations Research and Machine Learning online on Nov. 21 and Nov. 22. I gave a talk on DecentLaM: Decentralized Momentum SGD for Large-Batch Deep Training. Please check Slides (on Github) or Slides (on Baidu Wangpan).