AeroVerse: UAV-Agent Benchmark Suite for Simulating, Pre-training, Finetuning, and Evaluating Aerospace Embodied World Models

arXiv.org Artificial Intelligence

Aerospace embodied intelligence aims to empower unmanned aerial vehicles (UAVs) and other aerospace platforms to achieve autonomous perception, cognition, and action, as well as egocentric active interaction with humans and the environment. The aerospace embodied world model serves as an effective means to realize the autonomous intelligence of UAVs and represents a necessary pathway toward aerospace embodied intelligence. However, existing embodied world models primarily focus on ground-level intelligent agents in indoor scenarios, while research on UAV intelligent agents remains unexplored. To address this gap, we construct the first large-scale real-world image-text pre-training dataset, AerialAgent-Ego10k, featuring urban drones from a first-person perspective. We also create a virtual image-text-pose alignment dataset, CyberAgent Ego500k, to facilitate the pre-training of the aerospace embodied world model. For the first time, we clearly define 5 downstream tasks, i.e., aerospace embodied scene awareness, spatial reasoning, navigational exploration, task planning, and motion decision, and construct corresponding instruction datasets, i.e., SkyAgent-Scene3k, SkyAgent-Reason3k, SkyAgent-Nav3k, SkyAgent-Plan3k, and SkyAgent-Act3k, for fine-tuning the aerospace embodied world model. Simultaneously, we develop SkyAgentEval, a set of downstream task evaluation metrics based on GPT-4, to comprehensively, flexibly, and objectively assess the results, revealing the potential and limitations of 2D/3D visual language models in UAV-agent tasks. Furthermore, we integrate over 10 2D/3D visual-language models, 2 pre-training datasets, 5 fine-tuning datasets, more than 10 evaluation metrics, and a simulator into the benchmark suite, i.e., AeroVerse, which will be released to the community to promote exploration and development of aerospace embodied intelligence.


SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search

arXiv.org Artificial Intelligence

Designing low-latency and high-efficiency hybrid networks for a variety of low-cost commodity edge devices is both costly and tedious, leading to the adoption of hardware-aware neural architecture search (NAS) for finding optimal architectures. However, unifying NAS for a wide range of edge devices presents challenges due to the variety of hardware designs, supported operations, and compilation optimizations. Existing methods often fix the search space of architecture choices (e.g., activation, convolution, or self-attention) and estimate latency using hardware-agnostic proxies (e.g., FLOPs), which fail to achieve the proclaimed latency across various edge devices. To address this issue, we propose SCAN-Edge, a unified NAS framework that jointly searches for self-attention, convolution, and activation to accommodate the wide variety of edge devices, including CPU-, GPU-, and hardware accelerator-based systems. To handle the large search space, SCAN-Edge relies on a hardware-aware evolutionary algorithm that improves the quality of the search space to accelerate the sampling process. Experiments on large-scale datasets demonstrate that our hybrid networks match the actual MobileNetV2 latency for 224x224 input resolution on various commodity edge devices.


Scalable Multivariate Fronthaul Quantization for Cell-Free Massive MIMO

arXiv.org Artificial Intelligence

The conventional approach to fronthaul design for cell-free massive MIMO systems follows the compress-and-precode (CP) paradigm. Accordingly, encoded bits and precoding coefficients are shared by the distributed unit (DU) on the fronthaul links, and precoding takes place at the radio units (RUs). Previous theoretical work has shown that CP can potentially be improved by a significant margin by precode-and-compress (PC) methods, in which all baseband processing is carried out at the DU, which compresses the precoded signals for transmission on the fronthaul links. The theoretical performance gains of PC methods are particularly pronounced when the DU implements multivariate quantization (MQ), applying joint quantization across the signals for all the RUs. However, existing solutions for MQ are characterized by a computational complexity that grows exponentially with the sum-fronthaul capacity from the DU to all RUs. This work sets out to design scalable MQ strategies for PC-based cell-free massive MIMO systems. For the low-fronthaul capacity regime, we present alpha-parallel MQ (alpha-PMQ), whose complexity is exponential only in the fronthaul capacity towards an individual RU, while performing close to full MQ. alpha-PMQ tailors MQ to the topology of the network by allowing for parallel local quantization steps for RUs that do not interfere too much with each other. For the high-fronthaul capacity regime, we then introduce neural MQ, which replaces the exhaustive search in MQ with gradient-based updates for a neural-network-based decoder, attaining a complexity that grows linearly with the sum-fronthaul capacity. Numerical results demonstrate that the proposed scalable MQ strategies outperform CP for both the low and high-fronthaul capacity regimes at the cost of increased computational complexity at the DU (but not at the RUs).


Count-based Novelty Exploration in Classical Planning

arXiv.org Artificial Intelligence

Count-based exploration methods are widely employed to improve the exploratory behavior of learning agents over sequential decision problems. Meanwhile, Novelty search has achieved success in Classical Planning through recording of the first, but not successive, occurrences of tuples. In order to structure the exploration, however, the number of tuples considered needs to grow exponentially as the search progresses. We propose a new novelty technique, classical count-based novelty, which aims to explore the state space with a constant number of tuples, by leveraging the frequency of each tuple's appearance in a search tree. We then justify the mechanisms through which lower tuple counts lead the search towards novel tuples.

... subdivide planning problems into smaller sub-problems through the use of partitioning heuristics to control the direction of search and increase the number of novel nodes. Katz et al. [13] provide a definition of novelty of a state with respect to its heuristic estimate, providing multiple novelty measures which quantify the novelty degree of a state in terms of the number of novel and non-novel state facts. More recently, Singh et al. [27] introduce approximate novelty, which uses an approximate measurement of state novelty which is more time and memory efficient, proving capable of estimating novelty values of cardinality greater than 2 in practical scenarios. Relating Novelty ...
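
The idea of scoring a state by how often its tuples have appeared in the search tree can be illustrated with a minimal sketch. This is not the authors' algorithm: the class name `TupleCounter`, the `width` parameter, the use of the minimum count as the score, and the example facts are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations


class TupleCounter:
    """Hypothetical helper: counts how often each fact tuple has been generated."""

    def __init__(self, width=1):
        self.width = width          # tuple size; stays constant during search
        self.counts = Counter()

    def tuples(self, state):
        # Sort the facts so that each tuple has one canonical form.
        return combinations(sorted(state), self.width)

    def observe(self, state):
        # Record every tuple of a newly generated state.
        self.counts.update(self.tuples(state))

    def score(self, state):
        # Assumed scoring rule: a state is as novel as its rarest tuple.
        # Lower scores mean the state contains a less frequently seen tuple.
        return min((self.counts[t] for t in self.tuples(state)), default=0)


counter = TupleCounter(width=1)
counter.observe({"at-a", "holding-key"})
counter.observe({"at-a", "door-open"})

seen_often = counter.score({"at-a"})
has_rare_fact = counter.score({"at-a", "door-open"})
unseen = counter.score({"at-b"})
```

A search procedure could then prefer open-list nodes with the lowest score, so that states containing rarely seen tuples are expanded first.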


Path-Consistency: Prefix Enhancement for Efficient Inference in LLM

arXiv.org Artificial Intelligence

To enhance the reasoning capabilities of large language models (LLMs), self-consistency has gained significant popularity by combining multiple sampling with majority voting. However, the state-of-the-art self-consistency approaches consume substantial computational resources and lead to significant additional time costs due to the multiple sampling. This prevents its full potential from being realized in scenarios where computational resources are critical. To improve the inference efficiency, this paper introduces path-consistency, a method that leverages the confidence of answers generated in earlier branches to identify the prefix of the most promising path. By dynamically guiding the generation of subsequent branches based on this prefix, path-consistency mitigates both the errors and redundancies from random or less useful sampling in self-consistency. As a result, it can significantly accelerate the inference process by reducing the number of tokens generated. Our extensive empirical evaluation shows that path-consistency achieves significant acceleration in inference latency ranging from 7.8% to 40.5%, while maintaining or even improving task accuracy across different datasets, including mathematical reasoning, common sense reasoning, symbolic reasoning, and code generation.
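
For readers unfamiliar with the self-consistency baseline mentioned here, the following is a minimal sketch of majority voting over sampled answers. It covers only the voting step, not path-consistency itself; the function name `majority_vote`, the sample answers, and the use of the vote share as a confidence proxy are assumptions for illustration.

```python
from collections import Counter


def majority_vote(answers):
    """Return the most frequent answer and its share of the votes."""
    if not answers:
        raise ValueError("need at least one sampled answer")
    counts = Counter(answers)
    winner, votes = counts.most_common(1)[0]
    # The vote share is used here as a simple (assumed) confidence proxy.
    return winner, votes / len(answers)


# Hypothetical final answers extracted from five sampled reasoning paths.
sampled = ["42", "41", "42", "42", "17"]
answer, share = majority_vote(sampled)
```

A method in the spirit of the abstract could compute such a share after the first few branches and, once it is high enough, reuse the prefix of a winning path when generating later branches.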


Neuro-Symbolic AI for Military Applications

arXiv.org Artificial Intelligence

Artificial Intelligence (AI) plays a significant role in enhancing the capabilities of defense systems, revolutionizing strategic decision-making, and shaping the future landscape of military operations. Neuro-Symbolic AI is an emerging approach that leverages and augments the strengths of neural networks and symbolic reasoning. These systems have the potential to be more impactful and flexible than traditional AI systems, making them well-suited for military applications. This paper comprehensively explores the diverse dimensions and capabilities of Neuro-Symbolic AI, aiming to shed light on its potential applications in military contexts. We investigate its capacity to improve decision-making, automate complex intelligence analysis, and strengthen autonomous systems. We further explore its potential to solve complex tasks in various domains, in addition to its applications in military contexts. Through this exploration, we address ethical, strategic, and technical considerations crucial to the development and deployment of Neuro-Symbolic AI in military and civilian applications. Contributing to the growing body of research, this study represents a comprehensive exploration of the extensive possibilities offered by Neuro-Symbolic AI.


Enhancing Knowledge Tracing with Concept Map and Response Disentanglement

arXiv.org Artificial Intelligence

In the rapidly advancing realm of educational technology, it becomes critical to accurately trace and understand student knowledge states. Conventional Knowledge Tracing (KT) models have mainly focused on binary responses (i.e., correct and incorrect answers) to questions. Unfortunately, they largely overlook the essential information in students' actual answer choices, particularly for Multiple Choice Questions (MCQs), which could help reveal each learner's misconceptions or knowledge gaps. To tackle these challenges, we propose the Concept map-driven Response disentanglement method for enhancing Knowledge Tracing (CRKT) model. CRKT benefits KT by directly leveraging answer choices--beyond merely identifying correct or incorrect answers--to distinguish responses with different incorrect choices. We further introduce the novel use of unchosen responses by employing disentangled representations to get insights from options not selected by students. Additionally, CRKT tracks the student's knowledge state at the concept level and encodes the concept map, representing the relationships between them, to better predict unseen concepts. This approach is expected to provide actionable feedback, improving the learning experience. Our comprehensive experiments across multiple datasets demonstrate CRKT's effectiveness, achieving superior performance in prediction accuracy and interpretability over state-of-the-art models.


Reconciling Different Theories of Learning with an Agent-based Model of Procedural Learning

arXiv.org Artificial Intelligence

Computational models of human learning can play a significant role in enhancing our knowledge about nuances in theoretical and qualitative learning theories and frameworks. Many existing frameworks in educational settings have been verified through empirical studies, but at times these theories make conflicting claims or recommendations for instruction. In this study, we propose a new computational model of human learning, Procedural ABICAP, that reconciles the ICAP, Knowledge-Learning-Instruction (KLI), and cognitive load theory (CLT) frameworks for learning procedural knowledge. ICAP assumes that constructive learning generally yields better learning outcomes, while theories such as KLI and CLT claim that this is not always true. We suppose that one reason for this may be that ICAP is primarily used for conceptual learning and is underspecified as a framework for thinking about procedural learning. We show how our computational model, both by design and through simulations, can be used to reconcile different results in the literature. More generally, we position our computational model as an executable theory of learning that can be used to simulate various educational settings.


Advancements in Molecular Property Prediction: A Survey of Single and Multimodal Approaches

arXiv.org Artificial Intelligence

Molecular Property Prediction (MPP) plays a pivotal role across diverse domains, spanning drug discovery, material science, and environmental chemistry. Fueled by the exponential growth of chemical data and the evolution of artificial intelligence, recent years have witnessed remarkable strides in MPP. However, the multifaceted nature of molecular data, such as molecular structures, SMILES notation, and molecular images, continues to pose a fundamental challenge in its effective representation. To address this, representation learning techniques are instrumental as they acquire informative and interpretable representations of molecular data. This article explores recent AI-based approaches in MPP, focusing on both single and multiple modality representation techniques. It provides an overview of various molecule representations and encoding schemes, categorizes MPP methods by their use of modalities, and outlines datasets and tools available for feature generation. The article also analyzes the performance of recent methods and suggests future research directions to advance the field of MPP.


Automating Thought of Search: A Journey Towards Soundness and Completeness

arXiv.org Artificial Intelligence

Planning remains one of the last standing bastions for large language models (LLMs), which now turn their attention to search. Most of the literature uses the language models as world models to define the search space, forgoing soundness for the sake of flexibility. A recent work, Thought of Search (ToS), proposed defining the search space with code, having the language models produce that code. ToS requires a human in the loop, collaboratively producing a sound successor function and goal test. The result, however, is worth the effort: all the tested datasets were solved with 100% accuracy. At the same time, LLMs have demonstrated significant progress in code generation and refinement for complex reasoning tasks. In this work, we automate ToS (AutoToS), completely taking the human out of the loop of solving planning problems. AutoToS guides the language model step by step towards the generation of sound and complete search components, through feedback from both generic and domain-specific unit tests. We achieve 100% accuracy, with minimal feedback iterations, using LLMs of various sizes on all evaluated domains.
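
To make "defining the search space with code" concrete, here is a minimal sketch of a successor function and goal test plugged into a generic breadth-first search. The toy domain (reach a target integer using +1 and *2), the function names, and the `limit` and `target` parameters are hypothetical and are not taken from ToS or AutoToS.

```python
from collections import deque


def successors(state, limit=100):
    # Toy domain (assumed): from an integer, either add 1 or double it.
    return [s for s in (state + 1, state * 2) if s <= limit]


def is_goal(state, target=10):
    return state == target


def bfs(start, successors, is_goal):
    """Generic breadth-first search; returns a shortest path or None."""
    frontier = deque([start])
    parent = {start: None}          # also serves as the visited set
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:
                parent[nxt] = state
                frontier.append(nxt)
    return None


plan = bfs(1, successors, is_goal)
```

Because the search algorithm itself is fixed, correctness depends only on the two domain functions, which is why unit tests on the successor function and goal test are a natural feedback signal.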