Goto

Collaborating Authors

 query strategy




Active Slice Discovery in Large Language Models

Zhang, Minhui, Ijner, Prahar, Wald, Yoav, Creager, Elliot

arXiv.org Artificial Intelligence

Large Language Models (LLMs) often exhibit systematic errors on specific subsets of data, known as error slices. For instance, a slice can correspond to a certain demographic, where a model does poorly in identifying toxic comments regarding that demographic. Identifying error slices is crucial to understanding and improving models, but it is also challenging. An appealing approach to reduce the amount of manual annotation required is to actively group errors that are likely to belong to the same slice, while using limited access to an annotator to verify whether the chosen samples share the same pattern of model mistake. In this paper, we formalize this approach as Active Slice Discovery and explore it empirically on a problem of discovering human-defined slices in toxicity classification. We examine the efficacy of active slice discovery under different choices of feature representations and active learning algorithms. On several slices, we find that uncertainty-based active learning algorithms are most effective, achieving competitive accuracy using 2-10% of the available slice membership information, while significantly outperforming baselines.


TActiLE: Tiny Active LEarning for wearable devices

Pavan, Massimo, Galimberti, Claudio, Roveri, Manuel

arXiv.org Artificial Intelligence

Tiny Machine Learning (TinyML) algorithms have seen extensive use in recent years, enabling wearable devices to be not only connected but also genuinely intelligent by running machine learning (ML) computations directly on-device. Among such devices, smart glasses have particularly benefited from TinyML advancements. TinyML facilitates the on-device execution of the inference phase of ML algorithms on embedded and wearable devices, and more recently, it has expanded into On-device Learning (ODL), which allows both inference and learning phases to occur directly on the device. The application of ODL techniques to wearable devices is particularly compelling, as it enables the development of more personalized models that adapt based on the data of the user. However, one of the major challenges of ODL algorithms is the scarcity of labeled data collected on-device. In smart wearable contexts, requiring users to manually label large amounts of data is often impractical and could lead to user disengagement with the technology. To address this issue, this paper explores the application of Active Learning (AL) techniques, i.e., techniques that aim at minimizing the labeling effort, by actively selecting from a large quantity of unlabeled data only a small subset to be labeled and added to the training set of the algorithm. In particular, we propose TActiLE, a novel AL algorithm that selects from the stream of on-device sensor data the ones that would help the ML algorithm improve the most once coupled with labels provided by the user. TActiLE is the first Active Learning technique specifically designed for the TinyML context. We evaluate its effectiveness and efficiency through experiments on multiple image classification datasets. The results demonstrate its suitability for tiny and wearable devices.


WaveFuse-AL: Cyclical and Performance-Adaptive Multi-Strategy Active Learning for Medical Images

Thakur, Nishchala, Kochhar, Swati, Bathula, Deepti R., Gupta, Sukrit

arXiv.org Artificial Intelligence

Active learning reduces annotation costs in medical imaging by strategically selecting the most informative samples for labeling. However, individual acquisition strategies often exhibit inconsistent behavior across different stages of the active learning cycle. We propose Cyclical and Performance-Adaptive Multi-Strategy Active Learning (WaveFuse-AL), a novel framework that adaptively fuses multiple established acquisition strategies-BALD, BADGE, Entropy, and CoreSet throughout the learning process. WaveFuse-AL integrates cyclical (sinusoidal) temporal priors with performance-driven adaptation to dynamically adjust strategy importance over time. We evaluate WaveFuse-AL on three medical imaging benchmarks: APTOS-2019 (multi-class classification), RSNA Pneumonia Detection (binary classification), and ISIC-2018 (skin lesion segmentation). Experimental results demonstrate that WaveFuse-AL consistently outperforms both single-strategy and alternating-strategy baselines, achieving statistically significant performance improvements (on ten out of twelve metric measurements) while maximizing the utility of limited annotation budgets.


DQS: A Low-Budget Query Strategy for Enhancing Unsupervised Data-driven Anomaly Detection Approaches

Correia, Lucas, Goos, Jan-Christoph, Bäck, Thomas, Kononova, Anna V.

arXiv.org Artificial Intelligence

Truly unsupervised approaches for time series anomaly detection are rare in the literature. Those that exist suffer from a poorly set threshold, which hampers detection performance, while others, despite claiming to be unsupervised, need to be calibrated using a labelled data subset, which is often not available in the real world. This work integrates active learning with an existing unsupervised anomaly detection method by selectively querying the labels of multivariate time series, which are then used to refine the threshold selection process. To achieve this, we introduce a novel query strategy called the dissimilarity-based query strategy (DQS). DQS aims to maximise the diversity of queried samples by evaluating the similarity between anomaly scores using dynamic time warping. We assess the detection performance of DQS in comparison to other query strategies and explore the impact of mislabelling, a topic that is underexplored in the literature. Our findings indicate that DQS performs best in small-budget scenarios, though the others appear to be more robust when faced with mislabelling. Therefore, in the real world, the choice of query strategy depends on the expertise of the oracle and the number of samples they are willing to label. Regardless, all query strategies outperform the unsupervised threshold even in the presence of mislabelling. Thus, whenever it is feasible to query an oracle, employing an active learning-based threshold is recommended.


Declarative Techniques for NL Queries over Heterogeneous Data

Khabiri, Elham, Kephart, Jeffrey O., Heath, Fenno F. III, Jayaraman, Srideepika, Tipu, Fateh A., Li, Yingjie, Shah, Dhruv, Fokoue, Achille, Bhamidipaty, Anu

arXiv.org Artificial Intelligence

In many industrial settings, users wish to ask questions in natural language, the answers to which require assembling information from diverse structured data sources. With the advent of Large Language Models (LLMs), applications can now translate natural language questions into a set of API calls or database calls, execute them, and combine the results into an appropriate natural language response. However, these applications remain impractical in realistic industrial settings because they do not cope with the data source heterogeneity that typifies such environments. In this work, we simulate the heterogeneity of real industry settings by introducing two extensions of the popular Spider benchmark dataset that require a combination of database and API calls. Then, we introduce a declarative approach to handling such data heterogeneity and demonstrate that it copes with data source heterogeneity significantly better than state-of-the-art LLM-based agentic or imperative code generation systems. Our augmented benchmarks are available to the research community.




Cold-Start Active Correlation Clustering

Aronsson, Linus, Wu, Han, Chehreghani, Morteza Haghir

arXiv.org Artificial Intelligence

We study active correlation clustering where pairwise similarities are not provided upfront and must be queried in a cost-efficient manner through active learning. Specifically, we focus on the cold-start scenario, where no true initial pairwise similarities are available for active learning. To address this challenge, we propose a coverage-aware method that encourages diversity early in the process. We demonstrate the effectiveness of our approach through several synthetic and real-world experiments.