Publications

You can also find my articles on my Google Scholar profile.

Conference Papers


Rethinking Learning Rate Tuning in the Era of Large Language Models

Published in 2023 IEEE 5th International Conference on Cognitive Machine Intelligence (CogMI), 2023

This paper examines the challenges of learning rate tuning for Large Language Models (LLMs) and introduces LRBench++, a benchmarking framework for learning rate tuning.

Recommended citation: Jin, H., Wei, W., Wang, X., Zhang, W., & Wu, Y. (2023). "Rethinking Learning Rate Tuning in the Era of Large Language Models." 2023 IEEE 5th International Conference on Cognitive Machine Intelligence (CogMI), 112-121.
Download Paper

Preprints


DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models

Published in arXiv Preprint, 2024

This paper proposes DA-MoE, a novel dynamic routing mechanism for Mixture-of-Experts (MoE) models that allocates experts efficiently based on token importance.

Recommended citation: Aghdam, M. A., Jin, H., & Wu, Y. (2024). "DA-MoE: Towards Dynamic Expert Allocation for Mixture-of-Experts Models." arXiv Preprint. arXiv:2409.06669.
Download Paper