Du, Jiawei
LLM Knows Geometry Better than Algebra: Numerical Understanding of LLM-Based Agents in A Trading Arena
Ma, Tianmi, Du, Jiawei, Huang, Wenxin, Wang, Wenjie, Xie, Liang, Zhong, Xian, Zhou, Joey Tianyi
Recent advancements in large language models (LLMs) have significantly improved performance in natural language processing tasks. However, their ability to generalize to dynamic, unseen tasks, particularly in numerical reasoning, remains a challenge. Existing benchmarks mainly evaluate LLMs on problems with predefined optimal solutions, which may not align with real-world scenarios where clear answers are absent. To bridge this gap, we design the Agent Trading Arena, a virtual numerical game simulating complex economic systems through zero-sum games, where agents invest in stock portfolios. Our experiments reveal that LLMs, including GPT-4o, struggle with algebraic reasoning when dealing with plain-text stock data, often focusing on local details rather than global trends. In contrast, LLMs perform significantly better with geometric reasoning when presented with visual data, such as scatter plots or K-line charts, suggesting that visual representations enhance numerical reasoning. This capability is further improved by incorporating a reflection module, which aids in the analysis and interpretation of complex data. We validate our findings on the NASDAQ Stock dataset, where LLMs demonstrate stronger reasoning with visual data than with text. Our code and data are publicly available at https://github.com/wekjsdvnm/Agent-Trading-Arena.git.
Diversity-Driven Synthesis: Enhancing Dataset Distillation through Directed Weight Adjustment
Du, Jiawei, Zhang, Xin, Hu, Juncheng, Huang, Wenxin, Zhou, Joey Tianyi
The sharp increase in data-related expenses has motivated research into condensing datasets while retaining the most informative features. Dataset distillation has thus recently come to the fore. This paradigm generates synthetic datasets that are representative enough to replace the original dataset in training a neural network. To avoid redundancy in these synthetic datasets, it is crucial that each element contains unique features and remains diverse from others during the synthesis stage. In this paper, we provide a thorough theoretical and empirical analysis of diversity within synthesized datasets. We argue that enhancing diversity can improve the parallelizable yet isolated synthesizing approach. Specifically, we introduce a novel method that employs dynamic and directed weight adjustment techniques to modulate the synthesis process, thereby maximizing the representativeness and diversity of each synthetic instance. Our method ensures that each batch of synthetic data mirrors the characteristics of a large, varying subset of the original dataset. Extensive experiments across multiple datasets, including CIFAR, Tiny-ImageNet, and ImageNet-1K, demonstrate the superior performance of our method, highlighting its effectiveness in producing diverse and representative synthetic datasets with minimal computational expense.
ReLEP: A Novel Framework for Real-world Long-horizon Embodied Planning
Liu, Siyuan, Du, Jiawei, Xiang, Sicheng, Wang, Zibo, Luo, Dingsheng
Real-world long-horizon embodied planning underpins embodied AI. To accomplish long-horizon tasks, agents need to decompose abstract instructions into detailed steps. Prior works mostly rely on GPT-4V for task decomposition into predefined actions, which limits task diversity due to GPT-4V's finite understanding of larger skillsets. Therefore, we present ReLEP, a groundbreaking framework for Real world Long-horizon Embodied Planning, which can accomplish a wide range of daily tasks. At its core lies a fine-tuned large vision language model that formulates plans as sequences of skill functions according to input instruction and scene image. These functions are selected from a carefully designed skill library. ReLEP is also equipped with a Memory module for plan and status recall, and a Robot Configuration module for versatility across robot types. In addition, we propose a semi-automatic data generation pipeline to tackle dataset scarcity. Real-world off-line experiments across eight daily embodied tasks demonstrate that ReLEP is able to accomplish long-horizon embodied tasks and outperforms other state-of-the-art baseline methods.
Spanning Training Progress: Temporal Dual-Depth Scoring (TDDS) for Enhanced Dataset Pruning
Zhang, Xin, Du, Jiawei, Li, Yunsong, Xie, Weiying, Zhou, Joey Tianyi
Dataset pruning aims to construct a coreset capable of achieving performance comparable to the original, full dataset. Most existing dataset pruning methods rely on snapshot-based criteria to identify representative samples, often resulting in poor generalization across various pruning and cross-architecture scenarios. Recent studies have addressed this issue by expanding the scope of training dynamics considered, including factors such as forgetting events and probability changes, typically using an averaging approach. However, these works struggle to integrate a broader range of training dynamics without overlooking well-generalized samples, which may not be sufficiently highlighted by averaging. In this study, we propose a novel dataset pruning method, termed Temporal Dual-Depth Scoring (TDDS), to tackle this problem. TDDS utilizes a dual-depth strategy to achieve a balance between incorporating extensive training dynamics and identifying representative samples for dataset pruning. In the first depth, we estimate the series of each sample's individual contributions spanning the training progress, ensuring comprehensive integration of training dynamics. In the second depth, we focus on the variability of the sample-wise contributions identified in the first depth to highlight well-generalized samples. Extensive experiments conducted on CIFAR and ImageNet datasets verify the superiority of TDDS over previous SOTA methods. Specifically on CIFAR-100, our method achieves 54.51% accuracy with only 10% training data, surpassing random selection by 7.83% and other comparison methods by at least 12.69%.
Sequential Subset Matching for Dataset Distillation
Du, Jiawei, Shi, Qin, Zhou, Joey Tianyi
Dataset distillation is a newly emerging task that synthesizes a small-size dataset used in training deep neural networks (DNNs) for reducing data storage and model training costs. The synthetic datasets are expected to capture the essence of the knowledge contained in real-world datasets such that the former yields performance similar to the latter. Recent advancements in distillation methods have produced notable improvements in generating synthetic datasets. However, current state-of-the-art methods treat the entire synthetic dataset as a unified entity and optimize each synthetic instance equally. This static optimization approach may lead to performance degradation in dataset distillation. Specifically, we argue that static optimization can give rise to a coupling issue within the synthetic data, particularly when a larger amount of synthetic data is being optimized. This coupling issue, in turn, leads to the failure of the distilled dataset to extract the high-level features learned by the DNN in the later epochs. In this study, we propose a new dataset distillation strategy called Sequential Subset Matching (SeqMatch), which tackles this problem by adaptively optimizing the synthetic data to encourage sequential acquisition of knowledge during dataset distillation. Our analysis indicates that SeqMatch effectively addresses the coupling issue by sequentially generating the synthetic instances, thereby enhancing its performance significantly. Our proposed SeqMatch outperforms state-of-the-art methods on various datasets, including SVHN, CIFAR-10, CIFAR-100, and Tiny ImageNet. Our code is available at https://github.com/shqii1j/seqmatch.
Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation
Du, Jiawei, Jiang, Yidi, Tan, Vincent Y. F., Zhou, Joey Tianyi, Li, Haizhou
Model-based deep learning has achieved astounding successes due in part to the availability of large-scale real-world data. However, processing such massive amounts of data comes at a considerable cost in terms of computations, storage, training and the search for good neural architectures. Dataset distillation has thus recently come to the fore. This paradigm involves distilling information from large real-world datasets into tiny and compact synthetic datasets such that processing the latter ideally yields performance similar to the former. State-of-the-art methods primarily rely on learning the synthetic dataset by matching the gradients obtained during training between the real and synthetic data. However, these gradient-matching methods suffer from the so-called accumulated trajectory error caused by the discrepancy between the distillation and subsequent evaluation. To mitigate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory. We show that, with the regularization towards the flat trajectory, the weights trained on synthetic data are robust against perturbations from the accumulated errors. Our method, called Flat Trajectory Distillation (FTD), is shown to boost the performance of gradient-matching methods by up to 4.7% on a higher-resolution subset of the ImageNet dataset. We also validate the effectiveness and generalizability of our method with datasets of different resolutions and demonstrate its applicability to neural architecture search. Code is available at https://github.com/AngusDujw/FTD-distillation.
Sharpness-Aware Training for Free
Du, Jiawei, Zhou, Daquan, Feng, Jiashi, Tan, Vincent Y. F., Zhou, Joey Tianyi
Modern deep neural networks (DNNs) have achieved state-of-the-art performances but are typically over-parameterized. The over-parameterization may result in undesirably large generalization error in the absence of other customized training strategies. Recently, a line of research under the name of Sharpness-Aware Minimization (SAM) has shown that minimizing a sharpness measure, which reflects the geometry of the loss landscape, can significantly reduce the generalization error. However, SAM-like methods incur twice the computational cost of the given base optimizer (e.g. SGD) in order to approximate the sharpness measure. In this paper, we propose Sharpness-Aware Training for Free, or SAF, which mitigates the sharp landscape at almost zero additional computational cost over the base optimizer. Intuitively, SAF achieves this by avoiding sudden drops in the loss in the sharp local minima throughout the trajectory of the updates of the weights. Specifically, we suggest a novel trajectory loss, based on the KL-divergence between the outputs of DNNs with the current weights and past weights, as a replacement for SAM's sharpness measure. This loss captures the rate of change of the training loss along the model's update trajectory. By minimizing it, SAF ensures the convergence to a flat minimum with improved generalization capabilities. Extensive empirical results show that SAF minimizes the sharpness in the same way that SAM does, yielding better results on the ImageNet dataset with essentially the same computational cost as the base optimizer.
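As a rough illustration of a KL-based trajectory penalty of the kind described above, the following pure-Python sketch compares the softmax outputs produced by past and current weights for one input. The function names, the direction of the KL term (past relative to current), and the `temperature` parameter are assumptions made for this example and are not taken from the SAF paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions with q > 0 wherever p > 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)

def trajectory_penalty(past_logits, current_logits, temperature=1.0):
    """Hypothetical penalty: how far the current outputs drifted from past ones."""
    p_past = softmax(past_logits, temperature)
    p_now = softmax(current_logits, temperature)
    return kl_divergence(p_past, p_now)

# Identical outputs give no penalty; a large change in outputs gives a positive one.
no_change = trajectory_penalty([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
big_change = trajectory_penalty([1.0, 2.0, 3.0], [3.0, 2.0, 1.0])
```

Because the past outputs can be cached from an earlier epoch, a penalty of this form needs no extra gradient computation, which is consistent with the near-zero overhead claimed in the abstract.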
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Du, Jiawei, Yan, Hanshu, Feng, Jiashi, Zhou, Joey Tianyi, Zhen, Liangli, Goh, Rick Siow Mong, Tan, Vincent Y. F.
Overparametrized Deep Neural Networks (DNNs) often achieve astounding performances, but may potentially result in severe generalization error. Recently, the relation between the sharpness of the loss landscape and the generalization error has been established by Foret et al. (2020), in which the Sharpness Aware Minimizer (SAM) was proposed to mitigate the degradation of the generalization. Unfortunately, SAM's computational cost is roughly double that of base optimizers, such as Stochastic Gradient Descent (SGD). This paper thus proposes Efficient Sharpness Aware Minimizer (ESAM), which boosts SAM's efficiency at no cost to its generalization performance. ESAM includes two novel and efficient training strategies: Stochastic Weight Perturbation and Sharpness-Sensitive Data Selection. In the former, the sharpness measure is approximated by perturbing a stochastically chosen set of weights in each iteration; in the latter, the SAM loss is optimized using only a judiciously selected subset of data that is sensitive to the sharpness. We provide theoretical explanations as to why these strategies perform well. We also show, via extensive experiments on the CIFAR and ImageNet datasets, that ESAM enhances the efficiency over SAM from requiring 100% extra computations to 40% vis-a-vis base optimizers, while test accuracies are preserved or even improved.
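To make the "roughly double" cost concrete, here is a minimal sketch of one plain SAM-style update on a toy problem. It shows the baseline SAM step only, not ESAM's two strategies; the function names, the toy quadratic loss, and the values of `lr` and `rho` are assumptions chosen for illustration.

```python
import math

def sam_step(weights, grad_fn, lr=0.1, rho=0.05):
    """One SAM-style update. grad_fn is evaluated twice, hence the doubled cost."""
    grad = grad_fn(weights)                               # gradient at w
    norm = math.sqrt(sum(g * g for g in grad)) or 1.0     # guard against a zero gradient
    # Move to the approximate worst-case point within an L2 ball of radius rho.
    perturbed = [w + rho * g / norm for w, g in zip(weights, grad)]
    sharp_grad = grad_fn(perturbed)                       # gradient at w + eps
    # Apply the second gradient to the original (unperturbed) weights.
    return [w - lr * g for w, g in zip(weights, sharp_grad)]

# Toy loss 0.5 * ||w||^2, whose gradient is w itself; count gradient evaluations.
gradient_calls = []

def quadratic_grad(w):
    gradient_calls.append(1)
    return list(w)

updated = sam_step([3.0, 4.0], quadratic_grad, lr=0.1, rho=0.5)
```

A base optimizer such as SGD would call `quadratic_grad` once per step; the second call on the perturbed weights is the extra computation that ESAM aims to reduce.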
On Robustness of Neural Ordinary Differential Equations
Yan, Hanshu, Du, Jiawei, Tan, Vincent Y. F., Feng, Jiashi
Neural ordinary differential equations (ODEs) have been attracting increasing attention in various research domains recently. There have been some works studying optimization issues and approximation capabilities of neural ODEs, but their robustness is still unclear. In this work, we fill this important gap by exploring robustness properties of neural ODEs both empirically and theoretically. We first present an empirical study on the robustness of the neural ODE-based networks (ODENets) by exposing them to inputs with various types of perturbations and subsequently investigating the changes of the corresponding outputs. In contrast to conventional convolutional neural networks (CNNs), we find that the ODENets are more robust against both random Gaussian perturbations and adversarial attack examples. We then provide an insightful understanding of this phenomenon by exploiting a certain desirable property of the flow of a continuous-time ODE, namely that integral curves are nonintersecting. Our work suggests that, due to their intrinsic robustness, it is promising to use neural ODEs as a basic block for building robust deep network models. To further enhance the robustness of vanilla neural ODEs, we propose the time-invariant steady neural ODE (TisODE), which regularizes the flow on perturbed data via the time-invariant property and the imposition of a steady-state constraint. We show that the TisODE method outperforms vanilla neural ODEs and also can work in conjunction with other state-of-the-art architectural methods to build more robust deep networks. Due to their desirable properties, such as invertibility and parameter efficiency, neural ODEs have attracted increasing attention recently (Dupont et al., 2019; Liu et al., 2019).
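The non-intersecting property can be illustrated in one dimension: trajectories of the same scalar ODE that start in a given order stay in that order, so a perturbed input remains bounded by its neighbours. The sketch below uses a forward Euler integrator and a hand-picked dynamics function; both are assumptions for illustration (a neural ODE would use a learned network and typically a more accurate solver), and the ordering is only guaranteed numerically because the step size is small.

```python
import math

def euler_trajectory(z0, f, t_end=5.0, steps=500):
    """Integrate dz/dt = f(z, t) from t = 0 with forward Euler; return every state."""
    h = t_end / steps
    z, t, path = z0, 0.0, [z0]
    for _ in range(steps):
        z = z + h * f(z, t)
        t += h
        path.append(z)
    return path

def dynamics(z, t):
    """Hypothetical smooth dynamics standing in for a learned network."""
    return math.sin(t) - z

low = euler_trajectory(-1.0, dynamics)
mid = euler_trajectory(0.2, dynamics)    # e.g. a perturbed input
high = euler_trajectory(1.0, dynamics)

# The middle curve never crosses its neighbours, so it stays sandwiched throughout.
sandwiched = all(a < b < c for a, b, c in zip(low, mid, high))
```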
SC2Net: Sparse LSTMs for Sparse Coding
Zhou, Joey Tianyi (Institute of High Performance Computing, A*STAR) | Di, Kai (Institute of High Performance Computing, A*STAR) | Du, Jiawei (Institute of High Performance Computing, A*STAR) | Peng, Xi (College of Computer Science, Sichuan University) | Yang, Hao (Amazon, Seattle) | Pan, Sinno Jialin (Nanyang Technological University) | Tsang, Ivor W. (University of Technology Sydney) | Liu, Yong (Institute of High Performance Computing, A*STAR) | Qin, Zheng (Institute of High Performance Computing, A*STAR) | Goh, Rick Siow Mong (Institute of High Performance Computing, A*STAR)
The iterative shrinkage-thresholding algorithm (ISTA) is one of the most popular optimization solvers to achieve sparse codes. However, ISTA suffers from the following problems: 1) ISTA employs a non-adaptive updating strategy to learn the parameters on each dimension with a fixed learning rate. Such a strategy may lead to inferior performance due to the scarcity of diversity; 2) ISTA does not incorporate the historical information into the updating rules, and the historical information has been proven helpful to speed up the convergence. To address these challenging issues, we propose a novel formulation of ISTA (named adaptive ISTA) by introducing a novel adaptive momentum vector. To efficiently solve the proposed adaptive ISTA, we recast it as a recurrent neural network unit and show its connection with the well-known long short term memory (LSTM) model. With the newly proposed unit, we present a neural network (termed SC2Net) to achieve sparse codes in an end-to-end manner. To the best of our knowledge, this is one of the first works to bridge the $\ell_1$-solver and LSTM, and may provide novel insights in understanding model-based optimization and LSTM. Extensive experiments show the effectiveness of our method on both unsupervised and supervised tasks.
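For reference, the classic (non-adaptive) ISTA iteration that the abstract takes as its starting point can be sketched in a few lines of pure Python. This is the textbook solver for 0.5 * ||Ax - b||^2 + lam * ||x||_1, not the adaptive ISTA or SC2Net proposed in the paper; the function names and the tiny test problem are assumptions for illustration.

```python
def soft_threshold(values, threshold):
    """Shrink each entry toward zero by `threshold` (proximal map of the L1 norm)."""
    return [max(abs(v) - threshold, 0.0) * (1.0 if v > 0 else -1.0) for v in values]

def ista(A, b, lam, step, n_iter=100):
    """Classic ISTA for 0.5 * ||A x - b||^2 + lam * ||x||_1.

    A is a list of rows. `step` should not exceed 1 / (largest eigenvalue of A^T A).
    The same fixed step is used for every coordinate and no history is kept.
    """
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(n_iter):
        residual = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        grad = [sum(A[i][j] * residual[i] for i in range(m)) for j in range(n)]
        x = soft_threshold([x[j] - step * grad[j] for j in range(n)], step * lam)
    return x

# With an identity dictionary the solution is just the soft-thresholded target.
identity = [[1.0, 0.0], [0.0, 1.0]]
codes = ista(identity, [3.0, 0.5], lam=1.0, step=1.0)
```

The single scalar `step` shared across all dimensions and the absence of any momentum term correspond to the two limitations listed in the abstract.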