AITopics | Shen, Min

Shen, Min

Every Sample Matters: Leveraging Mixture-of-Experts and High-Quality Data for Efficient and Accurate Code LLM

Codefuse, Ling Team: Cai, Wenting, Cao, Yuchen, Chen, Chaoyu, Chen, Chen, Chen, Siba, Cui, Qing, Di, Peng, Fang, Junpeng, Gong, Zi, Guo, Ting, He, Zhengyu, Huang, Yang, Li, Cong, Li, Jianguo, Li, Zheng, Lian, Shijie, Liu, BingChang, Luo, Songshan, Mao, Shuo, Shen, Min, Wu, Jian, Yang, Jiaolong, Yang, Wenjie, Ye, Tong, Yu, Hang, Zhang, Wei, Zhang, Zhenduo, Zhao, Hailin, Zheng, Xunjin, Zhou, Jun

arXiv.org Artificial Intelligence, Mar-22-2025

Recent advancements in code large language models (LLMs) have demonstrated remarkable capabilities in code generation and understanding. It is still challenging to build a code LLM with comprehensive performance yet ultimate efficiency. Many attempts have been released in the open source community to break the trade-off between performance and efficiency, such as the Qwen Coder series and the DeepSeek Coder series. This paper introduces yet another attempt in this area, namely Ling-Coder-Lite. We leverage the efficient Mixture-of-Experts (MoE) architecture along with a set of high-quality data curation methods (especially those based on program analytics) to build an efficient yet powerful code LLM. Ling-Coder-Lite exhibits on-par performance on 12 representative coding benchmarks compared to state-of-the-art models of similar size, such as Qwen2.5-Coder-7B and DeepSeek-Coder-V2-Lite, while offering competitive latency and throughput. In practice, we achieve a 50% reduction in deployment resources compared to the similar-sized dense model without performance loss. To facilitate further research and development in this area, we open-source our models as well as a substantial portion of high-quality data for the annealing and post-training stages. The models and data can be accessed at https://huggingface.co/inclusionAI/Ling-Coder-lite.
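
The abstract names the Mixture-of-Experts architecture without describing its routing. As a generic illustration only, the following minimal sketch shows top-k expert routing as it is commonly described for MoE layers. It is not taken from Ling-Coder-Lite: the function names, the value k=2, and the toy scalar "experts" are all hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def route_top_k(router_logits, k=2):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are the router probabilities renormalized over the selected
    experts, so they sum to 1. (Assumed scheme, not the paper's.)
    """
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]

def moe_forward(x, router_logits, experts, k=2):
    """Combine the outputs of only the selected experts.

    Experts that are not selected are never called, which is where the
    compute saving over a dense layer of the same total size comes from.
    """
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))

# Toy usage: four hypothetical scalar "experts", two of which are active.
toy_experts = [lambda x: x, lambda x: 2 * x, lambda x: 3 * x, lambda x: 4 * x]
toy_output = moe_forward(1.0, [0.0, 1.0, 0.0, 1.0], toy_experts, k=2)
```

In a real model the router logits are produced by a learned layer per token and the experts are feed-forward networks; both are replaced by fixed values here to keep the sketch self-contained.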

large language model, machine learning, qwen2, (17 more...)

2503.17793

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model

Di, Peng, Li, Jianguo, Yu, Hang, Jiang, Wei, Cai, Wenting, Cao, Yang, Chen, Chaoyu, Chen, Dajun, Chen, Hongwei, Chen, Liang, Fan, Gang, Gong, Jie, Gong, Zi, Hu, Wen, Guo, Tingting, Lei, Zhichao, Li, Ting, Li, Zheng, Liang, Ming, Liao, Cong, Liu, Bingchang, Liu, Jiachen, Liu, Zhiwei, Lu, Shaojun, Shen, Min, Wang, Guangpei, Wang, Huan, Wang, Zhi, Xu, Zhaogui, Yang, Jiawei, Ye, Qing, Zhang, Gehao, Zhang, Yu, Zhao, Zelin, Zheng, Xunjin, Zhou, Hailian, Zhu, Lifu, Zhu, Xianying

arXiv.org Artificial Intelligence, Jan-10-2024

Code Large Language Models (Code LLMs) have gained significant attention in the industry due to their wide applications in the full lifecycle of software engineering. However, the effectiveness of existing models in understanding non-English inputs for multi-lingual code-related tasks is still far from well studied. This paper introduces CodeFuse-13B, an open-sourced pre-trained code LLM. It is specifically designed for code-related tasks with both English and Chinese prompts and supports over 40 programming languages. CodeFuse achieves its effectiveness by utilizing a high-quality pre-training dataset that is carefully filtered by program analyzers and optimized during the training process. Extensive experiments are conducted using real-world usage scenarios, the industry-standard benchmark HumanEval-x, and the specially designed CodeFuseEval for Chinese prompts. To assess the effectiveness of CodeFuse, we actively collected valuable human feedback from Ant Group's software development process, where CodeFuse has been successfully deployed. The results demonstrate that CodeFuse-13B achieves a HumanEval pass@1 score of 37.10%, positioning it as one of the top multi-lingual code LLMs with similar parameter sizes. In practical scenarios, such as code generation, code translation, code comments, and test case generation, CodeFuse performs better than other models when confronted with Chinese prompts.
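
The abstract reports a HumanEval pass@1 score but does not say how it was computed. For orientation, here is a minimal sketch of the unbiased pass@k estimator commonly used with HumanEval: pass@k = 1 - C(n-c, k) / C(n, k) for n samples per problem, of which c pass the tests. Whether CodeFuse-13B was scored this way is an assumption, and the function name is hypothetical.

```python
from math import comb

def pass_at_k(n, c, k):
    """Unbiased pass@k estimate for a single problem.

    n: number of generated samples for the problem
    c: number of those samples that pass all unit tests
    k: number of samples the metric is allowed to draw
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) counts the k-subsets containing no passing sample;
    # it is 0 when fewer than k failing samples exist.
    return 1.0 - comb(n - c, k) / comb(n, k)

def mean_pass_at_k(results, k):
    """Average pass@k over problems; results is a list of (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With k=1 the estimator reduces to c/n per problem, so a benchmark-level pass@1 is the mean fraction of passing samples across problems.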

large language model, machine learning, natural language, (20 more...)

doi: 10.1145/3639477.3639719

2310.06266

Country: North America > United States > Pennsylvania (0.14)

Genre: Research Report (0.70)

Industry: Information Technology (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

MFTCoder: Boosting Code LLMs with Multitask Fine-Tuning

Liu, Bingchang, Chen, Chaoyu, Liao, Cong, Gong, Zi, Wang, Huan, Lei, Zhichao, Liang, Ming, Chen, Dajun, Shen, Min, Zhou, Hailian, Yu, Hang, Li, Jianguo

arXiv.org Artificial Intelligence, Nov-3-2023

Code LLMs have emerged as a specialized research field, with remarkable studies dedicated to enhancing models' coding capabilities through fine-tuning on pre-trained models. Previous fine-tuning approaches were typically tailored to specific downstream tasks or scenarios, which meant separate fine-tuning for each task, requiring extensive training resources and posing challenges in terms of deployment and maintenance. Furthermore, these approaches failed to leverage the inherent interconnectedness among different code-related tasks. To overcome these limitations, we present a multi-task fine-tuning framework, MFTCoder, that enables simultaneous and parallel fine-tuning on multiple tasks. By incorporating various loss functions, we effectively address common challenges in multi-task learning, such as data imbalance, varying difficulty levels, and inconsistent convergence speeds. Extensive experiments have conclusively demonstrated that our multi-task fine-tuning approach outperforms both individual fine-tuning on single tasks and fine-tuning on a mixed ensemble of tasks. Moreover, MFTCoder offers efficient training capabilities, including efficient data tokenization modes and PEFT fine-tuning, resulting in significantly improved speed compared to traditional fine-tuning methods. MFTCoder seamlessly integrates with several mainstream open-source LLMs, such as CodeLLama and Qwen. Leveraging the CodeLLama foundation, our MFTCoder fine-tuned model, CodeFuse-CodeLLama-34B, achieves an impressive pass@1 score of 74.4% on the HumanEval benchmark, surpassing GPT-4 performance (67%, zero-shot). MFTCoder is open-sourced at https://github.com/codefuse-ai/MFTCOder.
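
The abstract mentions loss functions that counter data imbalance across tasks but does not define them. As one simple illustration of the idea, the sketch below averages the loss within each task first and then across tasks, so a task with many samples cannot dominate the objective. This is an assumed, generic scheme, not MFTCoder's actual loss; the function names and example values are hypothetical.

```python
from collections import defaultdict

def naive_mean_loss(sample_losses):
    """Plain average over all samples; large tasks dominate."""
    return sum(sample_losses) / len(sample_losses)

def task_balanced_loss(sample_losses, task_ids):
    """Average per-task mean losses so every task has equal weight."""
    per_task = defaultdict(list)
    for loss, task in zip(sample_losses, task_ids):
        per_task[task].append(loss)
    task_means = [sum(v) / len(v) for v in per_task.values()]
    return sum(task_means) / len(task_means)

# Toy batch: three samples from one task, one sample from another.
losses = [1.0, 1.0, 1.0, 3.0]
tasks = ["completion", "completion", "completion", "translation"]
naive = naive_mean_loss(losses)               # weighted toward "completion"
balanced = task_balanced_loss(losses, tasks)  # both tasks count equally
```

In the toy batch the under-represented task contributes a quarter of the naive average but half of the balanced one, which is the effect a data-imbalance-aware loss aims for.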

large language model, machine learning, mftcoder, (18 more...)

2311.02303

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
