Goto

Collaborating Authors

 Overview


Composite Learning Units: Generalized Learning Beyond Parameter Updates to Transform LLMs into Adaptive Reasoners

arXiv.org Artificial Intelligence

Human learning thrives on the ability to learn from mistakes, adapt through feedback, and refine understanding-processes often missing in static machine learning models. In this work, we introduce Composite Learning Units (CLUs) designed to transform reasoners, such as Large Language Models (LLMs), into learners capable of generalized, continuous learning without conventional parameter updates while enhancing their reasoning abilities through continual interaction and feedback. CLUs are built on an architecture that allows a reasoning model to maintain and evolve a dynamic knowledge repository: a General Knowledge Space for broad, reusable insights and a Prompt-Specific Knowledge Space for task-specific learning. Through goal-driven interactions, CLUs iteratively refine these knowledge spaces, enabling the system to adapt dynamically to complex tasks, extract nuanced insights, and build upon past experiences autonomously. We demonstrate CLUs' effectiveness through a cryptographic reasoning task, where they continuously evolve their understanding through feedback to uncover hidden transformation rules. While conventional models struggle to grasp underlying logic, CLUs excel by engaging in an iterative, goal-oriented process. Specialized components-handling knowledge retrieval, prompt generation, and feedback analysis-work together within a reinforcing feedback loop. This approach allows CLUs to retain the memory of past failures and successes, adapt autonomously, and apply sophisticated reasoning effectively, continually learning from mistakes while also building on breakthroughs.


A Survey: Collaborative Hardware and Software Design in the Era of Large Language Models

arXiv.org Artificial Intelligence

The rapid development of large language models (LLMs) has significantly transformed the field of artificial intelligence, demonstrating remarkable capabilities in natural language processing and moving towards multi-modal functionality. These models are increasingly integrated into diverse applications, impacting both research and industry. However, their development and deployment present substantial challenges, including the need for extensive computational resources, high energy consumption, and complex software optimizations. Unlike traditional deep learning systems, LLMs require unique optimization strategies for training and inference, focusing on system-level efficiency. This paper surveys hardware and software co-design approaches specifically tailored to address the unique characteristics and constraints of large language models. This survey analyzes the challenges and impacts of LLMs on hardware and algorithm research, exploring algorithm optimization, hardware design, and system-level innovations. It aims to provide a comprehensive understanding of the trade-offs and considerations in LLM-centric computing systems, guiding future advancements in AI. Finally, we summarize the existing efforts in this space and outline future directions toward realizing production-grade co-design methodologies for the next generation of large language models and AI systems.


Overcoming Autoware-Ubuntu Incompatibility in Autonomous Driving Systems-Equipped Vehicles: Lessons Learned

arXiv.org Artificial Intelligence

Autonomous vehicles have been rapidly developed as demand that provides safety and efficiency in transportation systems. As autonomous vehicles are designed based on open-source operating and computing systems, there are numerous resources aimed at building an operating platform composed of Ubuntu, Autoware, and Robot Operating System (ROS). However, no explicit guidelines exist to help scholars perform trouble-shooting due to incompatibility between the Autoware platform and Ubuntu operating systems installed in autonomous driving systems-equipped vehicles (i.e., Chrysler Pacifica). The paper presents an overview of integrating the Autoware platform into the autonomous vehicle's interface based on lessons learned from trouble-shooting processes for resolving incompatible issues. The trouble-shooting processes are presented based on resolving the incompatibility and integration issues of Ubuntu 20.04, Autoware.AI, and ROS Noetic software installed in an autonomous driving systems-equipped vehicle. Specifically, the paper focused on common incompatibility issues and code-solving protocols involving Python compatibility, Compute Unified Device Architecture (CUDA) installation, Autoware installation, and simulation in Autoware.AI. The objective of the paper is to provide an explicit and detail-oriented presentation to showcase how to address incompatibility issues among an autonomous vehicle's operating interference. The lessons and experience presented in the paper will be useful for researchers who encountered similar issues and could follow up by performing trouble-shooting activities and implementing ADS-related projects in the Ubuntu, Autoware, and ROS operating systems.


Harnessing the Power of Noise: A Survey of Techniques and Applications

arXiv.org Artificial Intelligence

In Computer science and across various engineering fields, noise is often considered a nuisance and annoyance. It distorts details and makes data less accurate. In the past, the goal has often been to eliminate noise with the goal to make systems more reliable and accurate. But views on noise are changing. New findings suggest that noise can actually enhance and advance technologies in many areas, making us see it not just as a disruption but as a way to improve system performance. Thus, once unwanted and hard to control, noise now appears to be a key player in improving the performance of complex information processing systems [22]. This phenomena is often known as Stochastic Resonance, which helps clear up signals, improve image quality, and strengthen models in machine learning [7, 22, 101]. This duality of noise -- both a problem and a benefit -- highlights the tricky role of noise while optimizing advanced neural networks and machine learning models.


Compositional Risk Minimization

arXiv.org Artificial Intelligence

In this work, we tackle a challenging and extreme form of subpopulation shift, which is termed compositional shift. Under compositional shifts, some combinations of attributes are totally absent from the training distribution but present in the test distribution. We model the data with flexible additive energy distributions, where each energy term represents an attribute, and derive a simple alternative to empirical risk minimization termed compositional risk minimization (CRM). We first train an additive energy classifier to predict the multiple attributes and then adjust this classifier to tackle compositional shifts. We provide an extensive theoretical analysis of CRM, where we show that our proposal extrapolates to special affine hulls of seen attribute combinations. Empirical evaluations on benchmark datasets confirms the improved robustness of CRM compared to other methods from the literature designed to tackle various forms of subpopulation shifts.


Parameter Choice and Neuro-Symbolic Approaches for Deep Domain-Invariant Learning

arXiv.org Artificial Intelligence

As artificial intelligence (AI) systems advance, we move towards broad AI: systems capable of performing well on diverse tasks, understanding context, and adapting rapidly to new scenarios. A central challenge for broad AI systems is to generalize over tasks in related domains and being robust to distribution shifts. Neuro-symbolic (NeSy) AI bridges the gap between symbolic and sub-symbolic paradigms to address these challenges, enabling adaptable, generalizable, and more interpretable systems. The development of broad AI requires advancements in domain adaptation (DA), enabling models trained on source domains to effectively generalize to unseen target domains. Traditional approaches often rely on parameter optimization and fine-tuning, which can be impractical due to high costs and risks of catastrophic forgetting. NeSy AI systems use multiple models and methods to generalize to unseen domains and maintain performance across varying conditions. We analyze common DA and NeSy approaches with a focus on deep domain-invariant learning, extending to real-world challenges such as adapting to continuously changing domains and handling large domain gaps. We showcase state-of-the-art model-selection methods for scenarios with limited samples and introduce domain-specific adaptations without gradient-based updates for cases where model tuning is infeasible. This work establishes a framework for scalable and generalizable broad AI systems applicable across various problem settings, demonstrating how symbolic reasoning and large language models can build universal computational graphs that generalize across domains and problems, contributing to more adaptable AI approaches for real-world applications.


TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

arXiv.org Artificial Intelligence

Large vision and language assistants have enabled new capabilities for interpreting natural images. These approaches have recently been adapted to earth observation data, but they are only able to handle single image inputs, limiting their use for many real-world tasks. In this work, we develop a new vision and language assistant called TEOChat that can engage in conversations about temporal sequences of earth observation data. To train TEOChat, we curate an instructionfollowing dataset composed of many single image and temporal tasks including building change and damage assessment, semantic change detection, and temporal scene classification. We show that TEOChat can perform a wide variety of spatial and temporal reasoning tasks, substantially outperforming previous vision and language assistants, and even achieving comparable or better performance than specialist models trained to perform these specific tasks. Furthermore, TEOChat achieves impressive zero-shot performance on a change detection and change question answering dataset, outperforms GPT-4o and Gemini 1.5 Pro on multiple temporal tasks, and exhibits stronger single image capabilities than a comparable single EO image instruction-following model. Many earth observation (EO) tasks require the ability to reason over time. For example, change detection is a widely studied task where the goal is to identify salient changes in a region using multiple EO images capturing the region at different times (Chughtai et al., 2021; Bai et al., 2023; Cheng et al., 2023). Previous methods to automatically detect change in EO imagery have been specialist models, constraining their use to a single task or small set of tasks that they were explicitly trained to perform (Bai et al., 2023; Cheng et al., 2023). Advancements in the modeling of multimodal data have enabled generalist vision-language models (VLMs) that can perform a variety of natural image interpretation tasks specified flexibly through natural language (Achiam et al., 2023; Team et al., 2023; Liu et al., 2023). However, no prior VLMs can model temporal EO data (left of Figure 1), notably including change detection tasks. We investigate the performance of Video-LLaVA (Lin et al., 2023), a strong natural image pre-trained VLM that can receive images and videos as input, and GeoChat (Kuckreja et al., 2023), a strong VLM fine-tuned on single EO image tasks (right of Figure 1). We find that Video-LLaVA generates inaccurate information, likely because it has primarily been trained on natural images and videos, whereas GeoChat can only input single images and cannot process information across time. TEOChat is the first VLM to model temporal earth observation (EO) data. We compare a temporal VLM (Video-LLaVA (Lin et al., 2023)) and an EO VLM (GeoChat (Kuckreja et al., 2023)) with TEOChat.


Training-free LLM-generated Text Detection by Mining Token Probability Sequences

arXiv.org Artificial Intelligence

Large language models (LLMs) have demonstrated remarkable capabilities in generating high-quality texts across diverse domains. However, the potential misuse of LLMs has raised significant concerns, underscoring the urgent need for reliable detection of LLM-generated texts. Conventional training-based detectors often struggle with generalization, particularly in cross-domain and cross-model scenarios. In contrast, training-free methods, which focus on inherent discrepancies through carefully designed statistical features, offer improved generalization and interpretability. Despite this, existing training-free detection methods typically rely on global text sequence statistics, neglecting the modeling of local discriminative features, thereby limiting their detection efficacy. In this work, we introduce a novel training-free detector, termed \textbf{Lastde} that synergizes local and global statistics for enhanced detection. For the first time, we introduce time series analysis to LLM-generated text detection, capturing the temporal dynamics of token probability sequences. By integrating these local statistics with global ones, our detector reveals significant disparities between human and LLM-generated texts. We also propose an efficient alternative, \textbf{Lastde++} to enable real-time detection. Extensive experiments on six datasets involving cross-domain, cross-model, and cross-lingual detection scenarios, under both white-box and black-box settings, demonstrated that our method consistently achieves state-of-the-art performance. Furthermore, our approach exhibits greater robustness against paraphrasing attacks compared to existing baseline methods.


Provable Methods for Searching with an Imperfect Sensor

arXiv.org Artificial Intelligence

Assume that a target is known to be present at an unknown point among a finite set of locations in the plane. We search for it using a mobile robot that has imperfect sensing capabilities. It takes time for the robot to move between locations and search a location; we have a total time budget within which to conduct the search. We study the problem of computing a search path/strategy for the robot that maximizes the probability of detection of the target. Considering non-uniform travel times between points (e.g., based on the distance between them) is crucial for search and rescue applications; such problems have been investigated to a limited extent due to their inherent complexity. In this paper, we describe fast algorithms with performance guarantees for this search problem and some variants, complement them with complexity results, and perform experiments to observe their performance.


Long-Context LLMs Meet RAG: Overcoming Challenges for Long Inputs in RAG

arXiv.org Artificial Intelligence

Retrieval-augmented generation (RAG) empowers large language models (LLMs) to utilize external knowledge sources. The increasing capacity of LLMs to process longer input sequences opens up avenues for providing more retrieved information, to potentially enhance the quality of generated outputs. It is plausible to assume that a larger retrieval set would contain more relevant information (higher recall), that might result in improved performance. However, our empirical findings demonstrate that for many long-context LLMs, the quality of generated output initially improves first, but then subsequently declines as the number of retrieved passages increases. This paper investigates this phenomenon, identifying the detrimental impact of retrieved "hard negatives" as a key contributor. To mitigate this and enhance the robustness of long-context LLM-based RAG, we propose both training-free and training-based approaches. We first showcase the effectiveness of retrieval reordering as a simple yet powerful training-free optimization. Furthermore, we explore training-based methods, specifically RAG-specific implicit LLM fine-tuning and RAG-oriented fine-tuning with intermediate reasoning, demonstrating their capacity for substantial performance gains. Finally, we conduct a systematic analysis of design choices for these training-based methods, including data distribution, retriever selection, and training context length.