Assistant Professor, Center for Machine Learning Research, Peking University
My research lies in the theoretical and algorithmic foundations of optimization, signal processing, machine learning, and data science. I currently focus on the development of fast, scalable, reliable, and distributed algorithms with applications in large-scale optimization, deep neural network training, federated learning, and the Internet of Things.
- [10/2024] A new paper Subspace Optimization for Large Language Models with Convergence Guarantees is now available on arXiv. In this paper, we unexpectedly discover that GaLore does not always converge to the optimal solution and substantiate this finding with an explicit counterexample. We further propose a novel variant of GaLore that provably converges in stochastic optimization.
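For context, GaLore-style methods apply gradients through a low-rank projection rather than directly. A minimal sketch of one such update, under my own naming (illustrative only, not the paper's algorithm):

```python
import numpy as np

def galore_style_step(W, grad, lr=0.1, rank=2):
    """One GaLore-style update: project the gradient onto its top-`rank`
    left-singular subspace, then take a gradient step within that subspace.
    Hypothetical minimal sketch, not the paper's exact method."""
    U, _, _ = np.linalg.svd(grad, full_matrices=False)
    P = U[:, :rank]                       # rank-r projection basis
    return W - lr * (P @ (P.T @ grad))    # descend only within the subspace
```

The projection discards gradient components outside the chosen subspace, which is the source of the memory savings and, as the paper shows, a potential source of non-convergence.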
- [10/2024] A new paper Enhancing Zeroth-Order Fine-Tuning for Language Models with Low-Rank Structures is now available on arXiv. In this work, we propose a low-rank zeroth-order gradient estimator and introduce a novel low-rank ZO algorithm to effectively fine-tune LLMs. It significantly outperforms MeZO.
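As background, MeZO-style fine-tuning relies on a two-point zeroth-order gradient estimator that needs only forward passes. A minimal sketch with my own function names (the paper's low-rank variant would additionally restrict the perturbation to a low-dimensional subspace):

```python
import numpy as np

def zo_grad(f, x, mu=1e-3, rng=None):
    """Two-point zeroth-order gradient estimate of f at x.
    Uses two function evaluations along a random direction; no backprop.
    Illustrative sketch, not the paper's exact estimator."""
    rng = np.random.default_rng(0) if rng is None else rng
    z = rng.standard_normal(x.shape)                # random perturbation
    return (f(x + mu * z) - f(x - mu * z)) / (2 * mu) * z
```

Because only the scalar finite difference and the direction `z` are stored, memory stays at inference level, which is what makes ZO methods attractive for fine-tuning large models.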
- [10/2024] A new paper A Mathematics-Inspired Learning-to-Optimize Framework for Decentralized Optimization is now available on arXiv. In this work, we present the first learning-to-optimize framework that surpasses state-of-the-art hand-crafted decentralized algorithms.
- [09/2024] One paper is accepted to NeurIPS 2024. Congratulations to my students Shuchen Zhu and Boao Kong, and all collaborators!
  - SPARKLE: A Unified Single-Loop Primal-Dual Framework for Decentralized Bilevel Optimization
- [09/2024] I will be teaching a course on Optimization for Deep Learning in 2024 Fall.
- [06/2024] I will give a 3-hour tutorial on Efficient Optimization for Deep Learning at Fudan University on June 8th. Please check the slides.
- [05/2024] I will teach a short summer course titled Efficient Optimization for Large Language Models at Beijing Jiaotong University from July 7th to July 9th. It will be a condensed mix of my two regular classes, Optimization for Deep Learning and Large Language Models in Decision Intelligence. The syllabus is coming soon.
- [05/2024] One paper is accepted to ICML 2024. Congratulations to my students Yutong He and Jie Hu, and all collaborators!
- [04/2024] My undergraduate students Ziheng Cheng and Liyuan Liang have been admitted to the UC Berkeley PhD Program, and Lujing Zhang has been admitted to the Carnegie Mellon University (CMU) PhD Program. Congratulations to all of them! We are currently hiring undergraduate research interns, and we are committed to providing abundant resources and comprehensive guidance to support their involvement in cutting-edge research projects.
- [04/2024] I will give a talk on Asynchronous Diffusion Learning with Agent Subsampling and Local Updates at IEEE ICASSP 2024.
- [03/2024] I will serve as an Area Chair for NeurIPS 2024.
- [03/2024] I will give a tutorial lecture on Distributed Machine Learning at MLSS 2024. Lecture slides can be found at Distributed Machine Learning: Part I and Distributed Machine Learning: Part II.
- [02/2024] I will be teaching a course on Large Language Models in Decision Intelligence in 2024 Spring.
- [02/2024] A new paper Decentralized Bilevel Optimization over Graphs: Loopless Algorithmic Update and Transient Iteration Complexity is on arXiv now. We have clarified the joint influence of network topology and data heterogeneity on decentralized bilevel optimization.
- [01/2024] One paper is accepted to ICLR 2024. Congratulations to my student Ziheng Cheng and all collaborators!
- [12/2023] A new paper Towards Better Understanding the Influence of Directed Networks on Decentralized Stochastic Optimization is on arXiv now. Surprisingly, we find that the spectral gap is not enough to capture the influence of directed networks and that the equilibrium skewness matters a lot! We also establish lower bounds for decentralized algorithms with column-stochastic mixing matrices.
- [11/2023] We will organize a session on Decentralized Optimization and Learning at IEEE CDC 2023.
- [11/2023] One paper is accepted by Signal Processing.
- [09/2023] A new paper Sharper Convergence Guarantees for Federated Learning with Partial Model Personalization is on arXiv now. We establish new state-of-the-art convergence rates for federated learning with partial model personalization!
- [09/2023] One paper is accepted to NeurIPS 2023. Congratulations to my student Yutong He on publishing his first paper!
- [09/2023] I will be teaching a course on Optimization for Deep Learning in 2023 Fall.
- [09/2023] One paper is accepted by Journal of Machine Learning Research (JMLR).
- [07/2023] Two papers are accepted to IEEE CDC 2023. Congratulations to my student Hao Yuan and my collaborator Edward Nguyen on publishing their first papers!
- [06/2023] A new paper Momentum Benefits Non-IID Federated Learning Simply and Provably is on arXiv now. An interesting message is that FedAvg can converge without any data heterogeneity assumption when incorporating momentum!
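To give the flavor of this line of work: in FedAvg, each round the clients run local SGD from the current server model, and a momentum variant folds the averaged client updates into a server-side momentum buffer. A minimal sketch under my own naming (an illustrative server-momentum variant, not the paper's exact algorithm):

```python
import numpy as np

def fedavg_momentum(grads_fn, x0, clients, rounds=50, local_steps=5,
                    lr=0.1, beta=0.9):
    """FedAvg with server-side momentum (illustrative sketch).
    grads_fn(i, x) returns client i's stochastic gradient at x."""
    x = x0.copy()
    m = np.zeros_like(x)
    for _ in range(rounds):
        updates = []
        for i in clients:
            xi = x.copy()
            for _ in range(local_steps):
                xi -= lr * grads_fn(i, xi)         # local SGD on client i
            updates.append(x - xi)                 # client pseudo-gradient
        m = beta * m + np.mean(updates, axis=0)    # server momentum buffer
        x = x - m                                  # server step
    return x
```

On heterogeneous clients the averaged pseudo-gradients are biased toward each client's own optimum; the momentum buffer smooths these biases over rounds, which is the intuition behind the heterogeneity-free convergence result.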
- [05/2023] A new paper Unbiased Compression Saves Communication in Distributed Optimization: When and How Much? is on arXiv now.
- [05/2023] A new paper Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression is on arXiv now. Please also check Slides (on Github) or Slides (on Baidu Wangpan) for this paper. Some preliminary results of this paper have been published in NeurIPS 2022, check this paper.
- [04/2023] Two papers are accepted to ICML 2023.
- [02/2023] One paper BEVHeight: A Robust Framework for Vision-based Roadside 3D Object Detection is accepted to CVPR 2023.
- [01/2023] A new paper An Enhanced Gradient-Tracking Bound for Distributed Online Stochastic Convex Optimization is on arXiv now. We establish enhanced rates for Gradient Tracking methods in the online stochastic convex setting.
- [11/2022] I gave a talk at BICMR on Accelerating Decentralized SGD with Sparse and Effective Topologies, which includes our recent results on Exponential Graphs, EquiTopo Graphs, and BlueFog. Please check Slides (on Github) or Slides (on Baidu Wangpan).
- [11/2022] We hosted the 2022 PKU Workshop on Operations Research and Machine Learning online on Nov. 21 and Nov. 22. I gave a talk on DecentLaM: Decentralized Momentum SGD for Large-Batch Deep Training. Please check Slides (on Github) or Slides (on Baidu Wangpan).