pmatrix
Bohdi: Heterogeneous LLM Fusion with Automatic Data Exploration
Gao, Junqi, Guo, Zhichang, Zhang, Dazhi, Li, Dong, Liu, Runze, Li, Pengfei, Tian, Kai, Qi, Biqing
Heterogeneous Large Language Model (LLM) fusion integrates the strengths of multiple source LLMs with different architectures into a target LLM with low computational overhead. While promising, existing methods suffer from two major limitations: 1) reliance on real data from limited domain for knowledge fusion, preventing the target LLM from fully acquiring knowledge across diverse domains, and 2) fixed data allocation proportions across domains, failing to dynamically adjust according to the target LLM's varying capabilities across domains, leading to a capability imbalance. To overcome these limitations, we propose Bohdi, a synthetic-data-only heterogeneous LLM fusion framework. Through the organization of knowledge domains into a hierarchical tree structure, Bohdi enables automatic domain exploration and multi-domain data generation through multi-model collaboration, thereby comprehensively extracting knowledge from source LLMs. By formalizing domain expansion and data sampling proportion allocation on the knowledge tree as a Hierarchical Multi-Armed Bandit problem, Bohdi leverages the designed DynaBranches mechanism to adaptively adjust sampling proportions based on the target LLM's performance feedback across domains. Integrated with our proposed Introspection-Rebirth (IR) mechanism, DynaBranches dynamically tracks capability shifts during target LLM's updates via Sliding Window Binomial Likelihood Ratio Testing (SWBLRT), further enhancing its online adaptation capability. Comparative experimental results on a comprehensive suite of benchmarks demonstrate that Bohdi significantly outperforms existing baselines on multiple target LLMs, exhibits higher data efficiency, and virtually eliminates the imbalance in the target LLM's capabilities. Our code is available at https://github.com/gjq100/Bohdi.git.
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Asia > Middle East > Jordan (0.04)
- (2 more...)
INC-Math: Integrating Natural Language and Code for Enhanced Mathematical Reasoning in Large Language Models
Xiong, Xuyuan, Han, Simeng, Zhou, Ziyue, Cohan, Arman
Large Language Models (LLMs) are commonly used to generate solutions for mathematical reasoning problems in the following formats: natural language, code, or a combination of both. In this paper, we explore fundamental questions related to solving mathematical reasoning problems using natural language and code with state-of-the-art LLMs, including GPT-4o-mini and LLama-3.1-8b-Turbo. Our findings show that LLMs are better at reasoning in natural language compared to code. Additionally, although natural language and code serve as complementary forms of reasoning, they can affect each other in a negative way in certain scenarios. These insights motivate our development of a new prompting method, INC-Math, which leverages an LLM to dynamically select the most appropriate reasoning form, resulting in improved performance over comparable baselines with GPT-4o-mini.
Key-Point-Driven Data Synthesis with its Enhancement on Mathematical Reasoning
Huang, Yiming, Liu, Xiao, Gong, Yeyun, Gou, Zhibin, Shen, Yelong, Duan, Nan, Chen, Weizhu
Large language models (LLMs) have shown great potential in complex reasoning tasks, yet their performance is often hampered by the scarcity of high-quality and reasoning-focused training datasets. Addressing this challenge, we propose Key-Point-Driven Data Synthesis (KPDDS), a novel data synthesis framework that synthesizes question-answer pairs by leveraging key points and exemplar practices from authentic data sources. KPDDS ensures the generation of novel questions with rigorous quality control and substantial scalability. As a result, we present KPMath, an extensive synthetic dataset tailored for mathematical reasoning, comprising over 800K question-answer pairs. Utilizing KPMath and augmenting it with additional reasoning-intensive corpora, we create the comprehensive KPMath-Plus dataset. The Qwen1.5-72B model, fine-tuned on KPMath-Plus, achieves 87.0%
- Education > Curriculum > Subject-Specific Education (0.48)
- Education > Educational Setting (0.46)
Deep Learning From Scratch I: Computational Graphs - deep ideas
This is part 1 of a series of tutorials, in which we develop the mathematical and algorithmic underpinnings of deep neural networks from scratch and implement our own neural network library in Python, mimicing the TensorFlow API. I do not assume that you have any preknowledge about machine learning or neural networks. However, you should have some preknowledge of calculus, linear algebra, fundamental algorithms and probability theory on an undergraduate level. If you get stuck at some point, please leave a comment. By the end of this text, you will have a deep understanding of the math behind neural networks and how deep learning libraries work under the hood.
Neural Networks Tutorial - A Pathway to Deep Learning - Adventures in Machine Learning
Chances are, if you are searching for a tutorial on artificial neural networks (ANN) you already have some idea of what they are, and what they are capable of doing. But did you know that neural networks are the foundation of the new and exciting field of deep learning? Deep learning is the field of machine learning that is making many state-of-the-art advancements, from beating players at Go and Poker, to speeding up drug discovery and assisting self-driving cars. If these types of cutting edge applications excite you like they excite me, then you will be interesting in learning as much as you can about deep learning. However, that requires you to know quite a bit about how neural networks work. This tutorial article is designed to help you get up to speed in neural networks as quickly as possible. In this tutorial I'll be presenting some concepts, code and maths that will enable you to build and understand a simple neural network. Some tutorials focus only on the code and skip the maths – but this impedes understanding. I'll take things as slowly as possible, but it might help to brush up on your matrices and differentiation if you need to. The code will be in Python, so it will be beneficial if you have a basic understanding of how Python works. You'll pretty much get away with knowing about Python functions, loops and the basics of the numpy library. By the end of this neural networks tutorial you'll be able to build an ANN in Python that will correctly classify handwritten digits in images with a fair degree of accuracy. Once you're done with this tutorial, you can dive a little deeper with the following posts: All of the relevant code in this tutorial can be found here. Here's an outline of the tutorial, with links, so you can easily navigate to the parts you want: Artificial neural networks (ANNs) are software implementations of the neuronal structure of our brains. We don't need to talk about the complex biology of our brain structures, but suffice to say, the brain contains neurons which are kind of like organic switches. These can change their output state depending on the strength of their electrical or chemical input. The neural network in a person's brain is a hugely interconnected network of neurons, where the output of any given neuron may be the input to thousands of other neurons.