Huang, Zhijie
KALE-LM: Unleash The Power Of AI For Science Via Knowledge And Logic Enhanced Large Model
Dai, Weichen, Chen, Yezeng, Dai, Zijie, Huang, Zhijie, Liu, Yubo, Pan, Yixuan, Song, Baiyang, Zhong, Chengli, Li, Xinhe, Wang, Zeyu, Feng, Zhuoying, Zhou, Yi
In recent years, the rapid development of artificial intelligence (AI) technology has enabled it to achieve, and in some cases surpass, top human performance on a variety of high-intelligence tasks. These include speech [1], facial [2], and image recognition [3]; games such as Go [4], StarCraft [5], and Dota2 [6]; text [7], image [8], and video generation; machine translation [9]; knowledge-based question answering [10]; debate; and solving advanced mathematical problems [11]. Science is one of the most important application domains for AI. As the crown jewel of human civilization and the cornerstone of many industries, science is a core driver of human progress, and its development can significantly accelerate, and even revolutionize, many fields. Historically, scientific research has followed three major paradigms: the first paradigm, experiment, which emerged from Newtonian empiricism; the second paradigm, theory, born of Einstein's rationalism; and the third paradigm, simulation/computation, which arose from the third industrial revolution, the computation and information revolution.
An Empirical Study of Data Ability Boundary in LLMs' Math Reasoning
Chen, Zui, Chen, Yezeng, Han, Jiaqi, Huang, Zhijie, Qi, Ji, Zhou, Yi
Large language models (LLMs) are displaying emergent abilities on math reasoning tasks, and there is growing attention on enhancing the ability of open-source LLMs through supervised fine-tuning (SFT). In this paper, we aim to explore a general strategy for supervised data that helps optimize and expand math reasoning ability. First, we determine the ability boundary of reasoning-path augmentation by identifying the minimal optimal set of these paths. Second, we validate that different abilities of the model can be cumulatively enhanced by mixing the minimal optimal sets of the corresponding data types, and that our resulting models, MMOS, achieve SOTA performance across a series of base models at much lower construction cost. Besides, we point out that GSM-HARD is not really hard and that today's LLMs no longer lack numerical robustness. We also provide an Auto Problem Generator for robustness testing and educational applications. Our code and data are publicly available at https://github.com/cyzhh/MMOS.
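A minimal Python sketch of the deduplication idea behind a "minimal optimal set" of reasoning paths: paths that differ only in surface form or sampled constants collapse to one representative per question. The normalization rule and data layout here are illustrative assumptions, not the MMOS pipeline itself (see the linked repository for the authors' code).

```python
# Hypothetical sketch of reasoning-path deduplication toward a "minimal
# optimal set": keep one representative per normalized solution path.
import re

def normalize(path: str) -> str:
    """Strip whitespace and numeric literals so that paths differing only
    in surface form or constants collapse to the same signature."""
    path = re.sub(r"\d+(\.\d+)?", "<num>", path)
    return re.sub(r"\s+", "", path).lower()

def minimal_set(samples: list[dict]) -> list[dict]:
    """samples: [{'question': ..., 'path': ...}] -> one path per signature."""
    seen: dict[tuple[str, str], dict] = {}
    for s in samples:
        key = (s["question"], normalize(s["path"]))
        seen.setdefault(key, s)  # keep the first representative
    return list(seen.values())

samples = [
    {"question": "q1", "path": "x = 3 + 4 = 7"},
    {"question": "q1", "path": "x = 3+4 = 7"},   # duplicate after normalization
    {"question": "q1", "path": "x = 7 - 0 = 7"}, # distinct path, kept
]
print(len(minimal_set(samples)))  # -> 2
```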
MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation
Wu, Xinda, Huang, Zhijie, Zhang, Kejun, Yu, Jiaxing, Tan, Xu, Zhang, Tieyao, Wang, Zihao, Sun, Lingyun
Pre-trained language models have achieved impressive results in various music understanding and generation tasks. However, existing pre-training methods for symbolic melody generation struggle to capture the multi-scale, multi-dimensional structural information in note sequences, owing to the domain-knowledge gap between text and music. Moreover, the scarcity of large-scale symbolic melody datasets limits further pre-training gains. In this paper, we propose MelodyGLM, a multi-task pre-training framework for generating melodies with long-term structure. We design melodic n-gram and long-span sampling strategies that create local and global blank-infilling tasks for modeling the local and global structures in melodies. Specifically, we incorporate pitch n-grams, rhythm n-grams, and their combined n-grams into the melodic n-gram blank-infilling tasks to model the multi-dimensional structures in melodies. To support large-scale pre-training and domain-specific n-gram lexicon construction, we have built MelodyNet, a large-scale symbolic melody dataset containing more than 0.4 million melody pieces. Both subjective and objective evaluations demonstrate that MelodyGLM surpasses standard and previous pre-training methods. In particular, subjective evaluations show that, on the melody continuation task, MelodyGLM gains average improvements of 0.82, 0.87, 0.78, and 0.94 in consistency, rhythmicity, structure, and overall quality, respectively. Notably, MelodyGLM nearly matches the quality of human-composed melodies on the melody inpainting task.
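As a rough illustration of the n-gram blank-infilling idea described above, the Python sketch below masks a contiguous note n-gram drawn from a mined lexicon and returns it as the regeneration target. The note-token names and the lexicon are hypothetical placeholders, not MelodyGLM's actual vocabulary or implementation.

```python
# Toy sketch of melodic n-gram blank infilling: mask a contiguous n-gram
# of note tokens and ask the model to regenerate it. Token names and the
# lexicon are hypothetical placeholders.
import random

def ngram_blank_infill(tokens: list[str], lexicon: set[tuple[str, ...]],
                       n: int = 3, seed: int = 0) -> tuple[list[str], list[str]]:
    """Pick an n-gram from the domain lexicon if present, else a random span;
    return (corrupted sequence, masked target)."""
    rng = random.Random(seed)
    spans = [i for i in range(len(tokens) - n + 1)
             if tuple(tokens[i:i + n]) in lexicon]
    start = rng.choice(spans) if spans else rng.randrange(len(tokens) - n + 1)
    target = tokens[start:start + n]
    corrupted = tokens[:start] + ["[MASK]"] + tokens[start + n:]
    return corrupted, target

melody = ["C4", "E4", "G4", "E4", "C4", "D4"]
lexicon = {("E4", "G4", "E4")}     # a "pitch n-gram" mined from data
src, tgt = ngram_blank_infill(melody, lexicon)
print(src, "->", tgt)  # ['C4', '[MASK]', 'C4', 'D4'] -> ['E4', 'G4', 'E4']
```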
Efficient Multi-Scale Attention Module with Cross-Spatial Learning
Ouyang, Daliang, He, Su, Zhang, Guozhong, Luo, Mingzhu, Guo, Huaiyong, Zhan, Jian, Huang, Zhijie
The remarkable effectiveness of channel and spatial attention mechanisms in producing more discernible feature representations has been demonstrated across various computer vision tasks. However, modeling cross-channel relationships with channel dimensionality reduction may have side effects on the extraction of deep visual representations. In this paper, we propose a novel efficient multi-scale attention (EMA) module. To retain per-channel information while reducing computational overhead, we reshape some of the channels into the batch dimension and group the channel dimension into multiple sub-features, so that spatial semantic features are well distributed within each feature group. Specifically, besides encoding global information to recalibrate the channel-wise weights in each parallel branch, the output features of the two parallel branches are further aggregated through a cross-dimension interaction that captures pixel-level pairwise relationships. We conduct extensive ablation studies and experiments on image classification and object detection tasks with popular benchmarks (e.g., CIFAR-100, ImageNet-1k, MS COCO, and VisDrone2019) to evaluate its performance.
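A condensed PyTorch sketch of the grouping trick described above: channel groups are folded into the batch dimension so attention is computed per sub-feature without channel dimensionality reduction, and two parallel branches interact through pooled descriptors. The branch design here is deliberately simplified and should not be read as the exact EMA module; consult the paper for the full architecture.

```python
# Simplified sketch of EMA's channel-grouping idea, not the exact module.
import torch
import torch.nn as nn

class EMASketch(nn.Module):
    def __init__(self, channels: int, groups: int = 8):
        super().__init__()
        assert channels % groups == 0
        self.g = groups
        c = channels // groups
        self.conv1 = nn.Conv2d(c, c, kernel_size=1)              # 1x1 branch
        self.conv3 = nn.Conv2d(c, c, kernel_size=3, padding=1)   # 3x3 branch
        self.pool = nn.AdaptiveAvgPool2d(1)                      # global context

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        xg = x.reshape(b * self.g, c // self.g, h, w)  # fold groups into batch
        # Branch 1: 1x1 conv recalibrated by its own global descriptor.
        a1 = self.conv1(xg) * torch.sigmoid(self.pool(xg))
        # Branch 2: 3x3 conv captures a larger spatial scale.
        a2 = self.conv3(xg)
        # Cross-branch interaction: each branch's pooled descriptor weights
        # the other branch's spatial map (pixel-level pairwise relationship).
        w12 = torch.sigmoid((self.pool(a1) * a2).sum(1, keepdim=True))
        w21 = torch.sigmoid((self.pool(a2) * a1).sum(1, keepdim=True))
        out = xg * (w12 + w21)
        return out.reshape(b, c, h, w)

x = torch.randn(2, 64, 32, 32)
print(EMASketch(64)(x).shape)  # torch.Size([2, 64, 32, 32])
```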
WuYun: Exploring hierarchical skeleton-guided melody generation using knowledge-enhanced deep learning
Zhang, Kejun, Wu, Xinda, Zhang, Tieyao, Huang, Zhijie, Tan, Xu, Liang, Qihao, Wu, Songruoyao, Sun, Lingyun
Although deep learning has revolutionized music generation, existing methods for structured melody generation follow an end-to-end, left-to-right, note-by-note generative paradigm and treat every note equally. Here, we present WuYun, a knowledge-enhanced deep learning architecture for improving the structure of generated melodies: it first generates the most structurally important notes to construct a melodic skeleton and then dynamically infills the skeleton with decorative notes to form a full-fledged melody. Specifically, we use music domain knowledge to extract melodic skeletons and employ sequence learning to reconstruct them, and these skeletons serve as additional knowledge providing auxiliary guidance for the melody generation process. We demonstrate that WuYun can generate melodies with better long-term structure and musicality, outperforming other state-of-the-art methods by an average of 0.51 across all subjective evaluation metrics. Our study provides a multidisciplinary lens for designing melodic hierarchical structures and bridges the gap between data-driven and knowledge-based approaches for numerous music generation tasks.
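A toy two-stage Python sketch of the skeleton-then-infill idea: structurally salient notes are selected first, then the gaps between them are decorated. Both the salience heuristic (longest note per bar) and the passing-tone infill are hypothetical stand-ins for WuYun's music-theoretic skeleton extraction and learned infilling model.

```python
# Toy two-stage sketch: extract a melodic skeleton, then infill it.
# Heuristics below are illustrative stand-ins, not WuYun's method.
from dataclasses import dataclass

@dataclass
class Note:
    pitch: int    # MIDI pitch number
    start: float  # onset in beats
    dur: float    # duration in beats

def extract_skeleton(notes: list[Note], beats_per_bar: int = 4) -> list[Note]:
    """Keep the longest note in each bar as the structural anchor."""
    bars: dict[int, Note] = {}
    for n in notes:
        bar = int(n.start // beats_per_bar)
        if bar not in bars or n.dur > bars[bar].dur:
            bars[bar] = n
    return [bars[b] for b in sorted(bars)]

def infill(skeleton: list[Note]) -> list[Note]:
    """Insert a passing tone halfway between consecutive skeleton notes
    (a trivial stand-in for the learned infilling model)."""
    out: list[Note] = []
    for a, b in zip(skeleton, skeleton[1:]):
        out.append(a)
        out.append(Note((a.pitch + b.pitch) // 2, (a.start + b.start) / 2, 0.5))
    out.append(skeleton[-1])
    return out

melody = [Note(60, 0, 2), Note(62, 2, 1), Note(64, 4, 2), Note(65, 6, 1)]
print([n.pitch for n in infill(extract_skeleton(melody))])  # -> [60, 62, 64]
```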