Wei, Hui
Facilitating Long Context Understanding via Supervised Chain-of-Thought Reasoning
Lin, Jingyang, Wong, Andy, Xia, Tian, He, Shenghua, Wei, Hui, Han, Mei, Luo, Jiebo
Recent advances in Large Language Models (LLMs) have enabled them to process increasingly longer sequences, ranging from 2K to 2M tokens and even beyond. However, simply extending the input sequence length does not necessarily lead to effective long-context understanding. In this study, we integrate Chain-of-Thought (CoT) reasoning into LLMs in a supervised manner to facilitate effective long-context understanding. To achieve this, we introduce LongFinanceQA, a synthetic dataset in the financial domain designed to improve long-context reasoning. Unlike existing long-context synthetic data, LongFinanceQA includes intermediate CoT reasoning before the final conclusion, which encourages LLMs to perform explicit reasoning and improves accuracy and interpretability in long-context understanding. To generate synthetic CoT reasoning, we propose Property-driven Agentic Inference (PAI), an agentic framework that simulates human-like reasoning steps, including property extraction, retrieval, and summarization. We evaluate PAI's reasoning capabilities by assessing GPT-4o-mini with PAI on the Loong benchmark, where it outperforms standard GPT-4o-mini by 20.0%. Furthermore, we fine-tune LLaMA-3.1-8B-Instruct on LongFinanceQA, achieving a 24.6% gain on Loong's financial subset.
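The abstract above does not include an implementation; the following is a minimal, illustrative sketch of the extract-retrieve-summarize loop that Property-driven Agentic Inference (PAI) describes. The llm() stub stands in for a real model call (e.g., GPT-4o-mini), and every function and parameter name here is an assumption made for illustration, not code from the paper.

```python
# Minimal sketch of a property-driven, agentic extract-retrieve-summarize loop.
# The llm() stub stands in for an actual model call; all names are illustrative.

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns the prompt tail for demo purposes."""
    return prompt[-200:]

def extract_properties(question: str) -> list[str]:
    # Step 1: decide which properties (e.g., "revenue 2021") the question needs.
    return llm(f"List the properties needed to answer: {question}").split("\n")

def retrieve(document: str, prop: str, window: int = 500) -> str:
    # Step 2: pull the passage most relevant to each property.
    # Here: naive keyword search; an agentic retrieval step would replace this.
    idx = document.lower().find(prop.split()[0].lower())
    return document[max(0, idx - window): idx + window] if idx >= 0 else ""

def answer_with_cot(document: str, question: str) -> str:
    # Step 3: summarize the retrieved evidence into explicit chain-of-thought
    # reasoning, then produce the final answer conditioned on that reasoning.
    props = extract_properties(question)
    evidence = {p: retrieve(document, p) for p in props if p.strip()}
    reasoning = llm(f"Reason step by step over: {evidence}\nQuestion: {question}")
    return llm(f"Reasoning: {reasoning}\nGive the final answer to: {question}")
```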
PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Wei, Hui, Zhang, Zihao, He, Shenghua, Xia, Tian, Pan, Shijia, Liu, Fei
LLMs have immense potential for generating plans, transforming an initial world state into a desired goal state. A large body of research has explored the use of LLMs for various planning tasks, from web navigation to travel planning and database querying. However, many of these systems are tailored to specific problems, making it challenging to compare them or determine the best approach for new tasks. There is also a lack of clear and consistent evaluation criteria. Our survey aims to offer a comprehensive overview of current LLM planners to fill this gap. It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency. For each, we provide a thorough analysis of representative works and highlight their strengths and weaknesses. Our paper also identifies crucial future directions, making it a valuable resource for both practitioners and newcomers interested in leveraging LLM planning to support agentic workflows.
Integrating One-Shot View Planning with a Single Next-Best View via Long-Tail Multiview Sampling
Pan, Sicong, Hu, Hao, Wei, Hui, Dengler, Nils, Zaenker, Tobias, Dawood, Murad, Bennewitz, Maren
Existing view planning systems either adopt an iterative paradigm using next-best views (NBV) or a one-shot pipeline relying on the set-covering view-planning (SCVP) network. However, neither of these methods can guarantee both high-quality and high-efficiency reconstruction of unknown 3D objects. To tackle this challenge, we introduce a crucial hypothesis: with the availability of more information about the unknown object, the prediction quality of the SCVP network improves. There are two ways to provide extra information: (1) leveraging perception data obtained from NBVs, and (2) training on an expanded dataset of multiview inputs. In this work, we introduce a novel combined pipeline that incorporates a single NBV before activating the proposed multiview-activated (MA-)SCVP network. The MA-SCVP is trained on a multiview dataset generated by our long-tail sampling method, which addresses the issue of unbalanced multiview inputs and enhances network performance. Extensive simulation experiments show that our system achieves a significant increase in surface coverage and a 45% reduction in movement cost compared to state-of-the-art systems. Real-world experiments confirm the generalization and deployment capability of our system.
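As a rough illustration of the combined pipeline (one iterative NBV step followed by one-shot planning with the multiview-activated network), here is a hedged control-flow sketch; the two network calls are stubs and all identifiers are hypothetical, not the authors' code.

```python
# Sketch of the combined NBV + one-shot view-planning pipeline.
# Both network calls are stubs; in the paper they would be the NBV policy
# and the multiview-activated SCVP (MA-SCVP) network.

from dataclasses import dataclass

@dataclass
class View:
    view_id: int

def next_best_view(occupancy_grid) -> View:
    # Stub: pick the view that maximizes expected information gain.
    return View(view_id=0)

def ma_scvp(occupancy_grid, visited: list[View]) -> list[View]:
    # Stub: one-shot prediction of the remaining set-covering view set,
    # conditioned on the views already captured (the "multiview activation").
    return [View(1), View(4), View(7)]

def reconstruct(initial_grid):
    visited = [View(view_id=-1)]        # the initial view
    grid = initial_grid
    nbv = next_best_view(grid)          # one iterative NBV step for extra information
    visited.append(nbv)                 # (grid would be updated from the new scan)
    plan = ma_scvp(grid, visited)       # then one-shot planning of all remaining views
    return visited + plan
```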
Retrieval-Based Reconstruction For Time-series Contrastive Learning
Xu, Maxwell A., Moreno, Alexander, Wei, Hui, Marlin, Benjamin M., Rehg, James M.
The success of self-supervised contrastive learning hinges on identifying positive data pairs that, when pushed together in embedding space, encode useful information for subsequent downstream tasks. However, in time-series, this is challenging because creating positive pairs via augmentations may break the original semantic meaning. We hypothesize that if we can retrieve information from one subsequence to successfully reconstruct another subsequence, then they should form a positive pair. Harnessing this intuition, we introduce our novel approach: REtrieval-BAsed Reconstruction (REBAR) contrastive learning. First, we utilize a convolutional cross-attention architecture to calculate the REBAR error between two different time-series. Then, through validation experiments, we show that the REBAR error is a predictor of mutual class membership, justifying its usage as a positive/negative labeler. Finally, once integrated into a contrastive learning framework, our REBAR method can learn an embedding that achieves state-of-the-art performance on downstream tasks across various modalities. Self-supervised learning uses the underlying structure within a dataset to learn rich and generalizable representations without labels, enabling fine-tuning on various downstream tasks. This reduces the need for large labeled datasets, which makes it an attractive approach for the time-series domain. With the advancement of sensor technologies, it is increasingly feasible to capture a large volume of data, but the cost of data labeling remains high. For example, in mobile health, acquiring labels requires burdensome real-time annotation (Rehg et al., 2017). Additionally, in medical applications such as ECG analysis, annotation is costly as it requires specialized medical expertise. Contrastive learning is a powerful self-supervised learning technique, which involves constructing and contrasting positive and negative pairs to yield an embedding space that captures semantic relationships.
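To make the positive/negative labeling idea concrete, here is a small, self-contained sketch in which the learned convolutional cross-attention reconstructor is replaced by a naive retrieve-and-copy baseline; it is an assumption-laden illustration of a retrieval-based reconstruction error, not the REBAR implementation.

```python
# Sketch of using a retrieval-based reconstruction error to label positives
# for contrastive learning, in the spirit of REBAR. The cross-attention
# reconstructor is replaced by a simple retrieval-and-copy baseline.

import numpy as np

def rebar_error(anchor: np.ndarray, candidate: np.ndarray, patch: int = 8) -> float:
    """Reconstruct `anchor` patch-by-patch by retrieving the closest patch in
    `candidate` and copying it; return the mean squared reconstruction error."""
    errors = []
    cand_patches = np.lib.stride_tricks.sliding_window_view(candidate, patch)
    for start in range(0, len(anchor) - patch + 1, patch):
        target = anchor[start:start + patch]
        dists = np.linalg.norm(cand_patches - target, axis=1)
        best = cand_patches[np.argmin(dists)]
        errors.append(np.mean((target - best) ** 2))
    return float(np.mean(errors))

def label_pairs(anchor: np.ndarray, candidates: list[np.ndarray]) -> int:
    # The candidate that reconstructs the anchor best is treated as the positive;
    # the remaining candidates become negatives in the contrastive loss.
    errs = [rebar_error(anchor, c) for c in candidates]
    return int(np.argmin(errs))

# Toy example: the positive should be the candidate from the same kind of signal.
t = np.linspace(0, 4 * np.pi, 256)
anchor = np.sin(t)
candidates = [np.sin(t + 0.5), np.sign(np.sin(t))]   # shifted sine vs. square wave
print(label_pairs(anchor, candidates))               # typically prints 0 (the shifted sine)
```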
Autonomous and Ubiquitous In-node Learning Algorithms of Active Directed Graphs and Its Storage Behavior
Wei, Hui, Miao, Weihua, Li, Fushun
Memory is an important cognitive function for humans. How a brain with such low power consumption accomplishes such a complex memory function is undoubtedly fascinating. Engram theory views memory as the co-activation of specific neuronal clusters. From the perspective of graph theory, nodes represent neurons and directed edges represent synapses; the memory engram is then the connected subgraph formed among the activated nodes. In this paper, we use subgraphs as physical carriers of information and propose a parallel, distributed information storage algorithm based on node scale in active directed graphs. An active directed graph is defined as a graph in which each node behaves autonomously and independently and relies only on information obtained within its local field of view to make decisions. Unlike static directed graphs used for recording facts, active directed graphs are decentralized like biological neuron networks: there is no super manager with a global view that can control the behavior of each node. Distinct from traditional algorithms with a global field of view, this algorithm is characterized by nodes coordinating global resource usage through their limited local fields of view. While this strategy may not achieve global optimality as well as algorithms with a global field of view, it offers better robustness, concurrency, decentralization, and biological plausibility. Finally, the algorithm was tested for network capacity, fault tolerance, and robustness. We found that it exhibits a larger network capacity in a sparser network structure because the subgraph generated by a single sample is not a single connected whole but consists of multiple weakly connected components; in this case, the network capacity can be understood as the number of permutations of these weakly connected components in the network.
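A minimal sketch of the storage idea follows, under the assumption that each node keeps only its own outgoing edges and spreads activation locally; the class names and the spreading rule are illustrative and are not the algorithm from the paper.

```python
# Minimal sketch of storing items as activated subgraphs ("engrams") in a
# directed graph and recalling them from partial cues. Each node only knows
# its own outgoing edges (its local field of view); there is no global controller.

import random

class Node:
    def __init__(self, nid: int):
        self.nid = nid
        self.out = set()          # locally stored outgoing edges (synapses)

    def store(self, coactive: set):
        # A node strengthens edges only toward other currently active nodes.
        self.out |= (coactive - {self.nid})

    def spread(self, active: set) -> set:
        # A node passes activation only along its own stored edges.
        return self.out if self.nid in active else set()

def store_pattern(nodes: dict, pattern: set):
    for nid in pattern:
        nodes[nid].store(pattern)

def recall(nodes: dict, cue: set, steps: int = 3) -> set:
    active = set(cue)
    for _ in range(steps):
        for node in nodes.values():
            active |= node.spread(active)
    return active

nodes = {i: Node(i) for i in range(50)}
engram = set(random.sample(range(50), 8))        # one stored sample = one subgraph
store_pattern(nodes, engram)
print(recall(nodes, set(list(engram)[:2])) == engram)   # True: the engram is recovered
```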
A Neural Dynamic Model based on Activation Diffusion and a Micro-Explanation for Cognitive Operations
Wei, Hui
The neural mechanism of memory has a very close relation to the problem of representation in artificial intelligence. In this paper, a computational model is proposed to simulate the network of neurons in the brain and how they process information. The model refers to morphological and electrophysiological characteristics of neural information processing, and is based on the assumption that neurons encode information through their firing sequences. The network structure, the functions for neural encoding at different stages, the representation of stimuli in memory, and an algorithm to form a memory are presented, and the stability and recall rate of learning, as well as the capacity of memory, are analyzed. Because successive neural dynamic processes achieve a coherent, neuron-level form by which information is represented and processed, the model may facilitate the examination of various branches of Artificial Intelligence, such as inference, problem solving, pattern recognition, natural language processing, and learning. The processes of cognitive manipulation that occur in intelligent behavior have a consistent representation when they are all modeled from the perspective of computational neuroscience. Thus, the dynamics of neurons make it possible to explain the inner mechanisms of different intelligent behaviors with a unified model of cognitive architecture at the micro level.
The Evolution of Concept-Acquisition based on Developmental Psychology
Wei, Hui
A conceptual system with rich connotation is key to improving the performance of knowledge-based artificial intelligence systems. For a conceptual system with abundant concepts and rich semantic relationships, one that is developable, evolvable, and adaptable to multi-task environments, actual construction is not only one of the major challenges of knowledge engineering but also a fundamental goal of research on knowledge and conceptualization. Finding a new method to represent concepts and construct a conceptual system would therefore greatly improve the performance of many intelligent systems. Fortunately, the core of human cognition is a system with relatively complete concepts and a mechanism that ensures the establishment and development of that system. The human conceptual system cannot be built all at once; rather, it must develop gradually. Developmental psychology carefully observes the process of concept acquisition in humans at the behavioral level, and together with cognitive psychology has proposed some rough explanations of those observations. However, due to the lack of research on aspects such as representation, systematic models, algorithmic details, and realization, many of the results of developmental psychology have not been applied directly to the building of artificial conceptual systems. For example, Karmiloff-Smith's Representational Redescription (RR) hypothesis reflects a concept-acquisition process that re-describes a lower-level representation of a concept into a higher-level one. This paper is inspired by this developmental psychology viewpoint. We use an object-oriented (OO) approach to re-explain and materialize the RR hypothesis from a formal semantic perspective, because the OO paradigm is a natural way to describe the outside world and also has a strict, well-defined syntax.
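As a loose illustration of reading Representational Redescription in object-oriented terms, the toy classes below contrast an implicit, procedure-only concept with a redescribed concept whose structure is explicit and inspectable; this is an illustrative assumption of how such a materialization might look, not the paper's formalization.

```python
# Illustrative sketch (not from the paper) of redescription in OO terms:
# an implicit, procedure-only concept is redescribed into an explicit class
# whose properties and relations are available to other concepts.

class ImplicitCounting:
    """Implicit-level knowledge: a working procedure with no inspectable structure."""
    def count(self, items):
        n = 0
        for _ in items:
            n += 1
        return n

class ExplicitNumber:
    """Redescribed knowledge: the regularity behind the procedure is now an
    explicit, reusable concept with stated relations to other concepts."""
    def __init__(self, value: int):
        self.value = value

    def successor(self) -> "ExplicitNumber":
        return ExplicitNumber(self.value + 1)

    def __repr__(self):
        return f"Number({self.value})"

print(ImplicitCounting().count("abc"))   # behavior only
print(ExplicitNumber(3).successor())     # structure available for further reasoning
```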