AITopics | clap

CLAP4CLIP: Continual Learning with Probabilistic Finetuning for Vision-Language Models

Neural Information Processing Systems, Mar-22-2026, 19:09:13 GMT

Continual learning (CL) aims to help deep neural networks learn new knowledge while retaining what they have already learned. Owing to their powerful generalizability, pre-trained vision-language models such as Contrastive Language-Image Pre-training (CLIP) have lately gained traction as practical CL candidates. However, the domain mismatch between the pre-training and the downstream CL tasks calls for finetuning CLIP on the latter. The deterministic nature of existing finetuning methods makes them overlook the many possible interactions across the modalities and renders them unsafe for high-risk tasks requiring reliable uncertainty estimation.

artificial intelligence, machine learning, proceedings, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.59)

CLAPS: Posterior-Aware Conformal Intervals via Last-Layer Laplace

Kim, Dongseok, Choi, Hyoungsun, Rasool, Mohamed Jismy Aashik, Oh, Gisung

arXiv.org Machine Learning, Dec-10-2025

We present CLAPS, a posterior-aware conformal regression method that pairs a Last-Layer Laplace Approximation with split-conformal calibration. From the resulting Gaussian posterior, CLAPS defines a simple two-sided posterior CDF score that aligns the conformity metric with the full predictive shape, not just a point estimate. This alignment yields narrower prediction intervals at the same target coverage, especially on small to medium tabular datasets where data are scarce and uncertainty modeling matters. We also provide a lightweight diagnostic suite that separates aleatoric and epistemic components and visualizes posterior behavior, helping practitioners understand why intervals shrink when they do. Across multiple benchmarks using the same MLP backbone, CLAPS consistently attains nominal coverage with improved efficiency and minimal overhead, offering a clear, practical upgrade to residual-based conformal baselines.
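The pairing of a Gaussian predictive distribution with split-conformal calibration can be sketched as follows. This is a minimal illustration, not the authors' implementation: the score |2F(y) - 1| is one plausible reading of a "two-sided posterior CDF score", and the function names (`cdf_score`, `conformal_quantile`, `interval`) are hypothetical. The predictive mean and standard deviation are assumed to come from some upstream model such as a last-layer Laplace approximation.

```python
import math
from statistics import NormalDist


def cdf_score(y, mu, sigma):
    """Assumed two-sided CDF score in [0, 1]; 0 means y is at the predictive median."""
    return abs(2.0 * NormalDist(mu, sigma).cdf(y) - 1.0)


def conformal_quantile(scores, alpha):
    """Split-conformal quantile of calibration scores with the (n + 1) correction."""
    n = len(scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to the maximal score.
        return 1.0
    return sorted(scores)[k - 1]


def interval(mu, sigma, q):
    """Invert the score: all y with cdf_score(y, mu, sigma) <= q."""
    q = min(q, 1.0 - 1e-12)  # keep inv_cdf arguments strictly inside (0, 1)
    dist = NormalDist(mu, sigma)
    return dist.inv_cdf((1.0 - q) / 2.0), dist.inv_cdf((1.0 + q) / 2.0)


# Usage: calibrate on held-out (y, mu, sigma) triples, then build an interval.
calibration = [(0.2, 0.0, 1.0), (1.4, 1.0, 0.5), (-0.9, 0.0, 1.0), (2.5, 2.0, 2.0)]
scores = [cdf_score(y, mu, sigma) for y, mu, sigma in calibration]
q_hat = conformal_quantile(scores, alpha=0.5)
lo, hi = interval(mu=1.0, sigma=2.0, q=q_hat)
```

Because the score is computed on the CDF scale, the interval width adapts to each test point's predictive standard deviation, which is the intuition behind aligning the conformity metric with the full predictive shape.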

calibration, clap, posterior-aware conformal interval, (14 more...)

2512.01384

Country: North America > United States (0.14)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

LUMA-RAG: Lifelong Multimodal Agents with Provably Stable Streaming Alignment

Wandre, Rohan, Gajewar, Yash, Patel, Namrata, Dhalkari, Vivek

arXiv.org Artificial Intelligence, Nov-5-2025

Retrieval-Augmented Generation (RAG) has emerged as the dominant paradigm for grounding large language model outputs in verifiable evidence. However, as modern AI agents transition from static knowledge bases to continuous multimodal streams encompassing text, images, video, and audio, two critical challenges arise: maintaining index freshness without prohibitive re-indexing costs, and preserving cross-modal semantic consistency across heterogeneous embedding spaces. We present LUMA-RAG, a lifelong multimodal agent architecture featuring three key innovations: (i) a streaming, multi-tier memory system that dynamically spills embeddings from a hot HNSW tier to a compressed IVFPQ tier under strict memory budgets; (ii) a streaming CLAP->CLIP alignment bridge that maintains cross-modal consistency through incremental orthogonal Procrustes updates; and (iii) stability-aware retrieval telemetry providing Safe@k guarantees by jointly bounding alignment drift and quantization error. Experiments demonstrate robust text-to-image retrieval (Recall@10 = 0.94), graceful performance degradation under product quantization offloading, and provably stable audio-to-image rankings (Safe@1 = 1.0), establishing LUMA-RAG as a practical framework for production multimodal RAG systems.
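The idea of an incrementally updated Procrustes alignment can be illustrated in its simplest form. The abstract does not give the update rule, so the sketch below is an assumption: it covers only the 2D, rotation-only special case of orthogonal Procrustes, where the optimal angle has a closed form in two running sums. The class name `StreamingRotationAligner` is hypothetical; a real embedding bridge would work in high dimensions and would typically solve the general problem with an SVD of the accumulated cross-covariance matrix.

```python
import math


class StreamingRotationAligner:
    """Running 2D rotation-only Procrustes fit between paired vectors."""

    def __init__(self):
        self.dot_sum = 0.0    # running sum of a . b over all pairs
        self.cross_sum = 0.0  # running sum of the 2D cross product a x b

    def update(self, a, b):
        """Fold one (source, target) pair into the running statistics."""
        ax, ay = a
        bx, by = b
        self.dot_sum += ax * bx + ay * by
        self.cross_sum += ax * by - ay * bx

    def angle(self):
        """Angle of the rotation R minimizing sum ||R a - b||^2 over pairs seen so far."""
        return math.atan2(self.cross_sum, self.dot_sum)

    def apply(self, a):
        """Map a source vector into the target space with the current rotation."""
        theta = self.angle()
        c, s = math.cos(theta), math.sin(theta)
        return (c * a[0] - s * a[1], s * a[0] + c * a[1])


# Usage: pairs arrive one at a time; the fit is refreshed without storing them.
aligner = StreamingRotationAligner()
aligner.update((1.0, 0.0), (0.0, 1.0))   # target is the source rotated by 90 degrees
aligner.update((0.0, 2.0), (-2.0, 0.0))
mapped = aligner.apply((1.0, 1.0))
```

Only two scalars are kept per stream, so each update costs constant time and memory regardless of how many pairs have been seen.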

large language model, machine learning, natural language, (16 more...)

2511.02371

Country: Asia > India > Maharashtra > Mumbai (0.05)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.35)

EMO-TTA: Improving Test-Time Adaptation of Audio-Language Models for Speech Emotion Recognition

Shi, Jiacheng, Du, Hongfei, Hong, Y. Alicia, Gao, Ye

arXiv.org Artificial Intelligence, Oct-1-2025

Speech emotion recognition (SER) with audio-language models (ALMs) remains vulnerable to distribution shifts at test time, leading to performance degradation in out-of-domain scenarios. Test-time adaptation (TTA) provides a promising solution but often relies on gradient-based updates or prompt tuning, limiting flexibility and practicality. We propose Emo-TTA, a lightweight, training-free adaptation framework that incrementally updates class-conditional statistics via an Expectation-Maximization procedure for explicit test-time distribution estimation, using ALM predictions as priors. Emo-TTA operates on individual test samples without modifying model weights. Experiments on six out-of-domain SER benchmarks show consistent accuracy improvements over prior TTA baselines, demonstrating the effectiveness of statistical adaptation in aligning model predictions with evolving test distributions.

artificial intelligence, machine learning, natural language, (18 more...)

2509.25495

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.93)
Information Technology > Artificial Intelligence > Cognitive Science > Emotion (0.72)

A Generalization of CLAP from 3D Localization to Image Processing, A Connection With RANSAC & Hough Transforms

Hou, Ruochen, Fernandez, Gabriel I., Xu, Alex, Hong, Dennis W.

arXiv.org Artificial Intelligence, Sep-18-2025

In previous work, we introduced a 2D localization algorithm called CLAP, Clustering to Localize Across n Possibilities, which was used during our championship win in RoboCup 2024, an international autonomous humanoid soccer competition. CLAP is particularly robust against outliers: clustering suppresses noise and mitigates erroneous feature matches. This clustering-based strategy provides an alternative to traditional outlier rejection schemes such as RANSAC, in which candidates are validated by reprojection error across all data points. In this paper, CLAP is extended to a more general framework beyond 2D localization, specifically to 3D localization and image stitching. We also show how CLAP, RANSAC, and Hough transforms are related. The generalization of CLAP is applicable to many different fields and can be a useful tool for dealing with noise and uncertainty.

artificial intelligence, clap, machine learning, (18 more...)

2509.13605

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.83)

Industry: Leisure & Entertainment > Sports > Soccer (0.56)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.47)

Cross-Layer Attention Probing for Fine-Grained Hallucination Detection

Suresh, Malavika, Aljundi, Rahaf, Nkisi-Orji, Ikechukwu, Wiratunga, Nirmalie

arXiv.org Artificial Intelligence, Sep-15-2025

With the large-scale adoption of Large Language Models (LLMs) in various applications, there is a growing reliability concern due to their tendency to generate inaccurate text, i.e. hallucinations. In this work, we propose Cross-Layer Attention Probing (CLAP), a novel activation probing technique for hallucination detection, which processes the LLM activations across the entire residual stream as a joint sequence. Our empirical evaluations using five LLMs and three tasks show that CLAP improves hallucination detection compared to baselines on both greedy decoded responses as well as responses sampled at higher temperatures, thus enabling fine-grained detection, i.e. the ability to disambiguate hallucinations and non-hallucinations among different sampled responses to a given prompt. This allows us to propose a detect-then-mitigate strategy using CLAP to reduce hallucinations and improve LLM reliability compared to direct mitigation approaches. Finally, we show that CLAP maintains high reliability even when applied out-of-distribution.

large language model, machine learning, natural language, (21 more...)

2509.097

Country: Europe > Italy (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

CLAP: Clustering to Localize Across n Possibilities, A Simple, Robust Geometric Approach in the Presence of Symmetries

Fernandez, Gabriel I., Hou, Ruochen, Xu, Alex, Togashi, Colin, Hong, Dennis W.

arXiv.org Artificial Intelligence, Sep-11-2025

In this paper, we present our localization method called CLAP, Clustering to Localize Across n Possibilities, which helped us win the RoboCup 2024 adult-sized autonomous humanoid soccer competition. In addition, our robot had to deal with varying lighting conditions, dynamic feature occlusions, noise from high-impact stepping, and mistaken features from bystanders and neighboring fields. Therefore, we needed an accurate and, most importantly, robust localization algorithm that would be the foundation for our path-planning and game-strategy algorithms. CLAP meets these requirements by clustering estimated states of our robot from pairs of field features to localize its global position and orientation. Correct state estimates naturally cluster together, while incorrect estimates spread apart, making CLAP resilient to noise and incorrect inputs. CLAP is paired with a particle filter and an extended Kalman filter to improve consistency and smoothness. Tests comparing CLAP with other landmark-based localization methods showed similar accuracy. However, tests with increased false positive feature detection showed that CLAP outperformed other methods in terms of robustness, with very little divergence and few velocity jumps. Our localization performed well in competition, allowing our robot to shoot faraway goals and narrowly defend our goal. Every year, the RoboCup Federation hosts a humanoid soccer competition in hopes of one day playing a live match of robots versus humans. To ensure a fair match, rules are put in place such that robots must be able to play autonomously, be of similar physiological proportions to a human, and only be equipped with sensors that have biological equivalents.
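The observation that correct estimates cluster while incorrect ones scatter can be sketched with a simple density vote. This is a rough illustration under stated assumptions, not the authors' algorithm: candidate poses are reduced to (x, y) positions, orientation is ignored, and the function name `densest_cluster_mean` and the fixed `radius` parameter are hypothetical.

```python
import math


def densest_cluster_mean(estimates, radius):
    """Return the mean of the largest group of estimates lying within
    `radius` of some candidate, together with that group's size."""
    if not estimates:
        raise ValueError("need at least one candidate estimate")
    best = []
    for cx, cy in estimates:
        # Collect every estimate close to this candidate (including itself).
        members = [(x, y) for x, y in estimates
                   if math.hypot(x - cx, y - cy) <= radius]
        if len(members) > len(best):
            best = members
    n = len(best)
    mean = (sum(x for x, _ in best) / n, sum(y for _, y in best) / n)
    return mean, n


# Usage: three consistent estimates and two outliers from bad feature matches.
candidates = [(1.0, 1.0), (1.1, 0.9), (0.9, 1.1), (5.0, 5.0), (-3.0, 2.0)]
position, support = densest_cluster_mean(candidates, radius=0.5)
```

The outliers never gather enough neighbors to win the vote, so they do not pull the final estimate the way they would in a plain average over all candidates.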

artificial intelligence, landmark, machine learning, (19 more...)

2509.08495

Country: North America > United States > California > Los Angeles County > Los Angeles (0.28)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Soccer (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

CLAP: Coreference-Linked Augmentation for Passage Retrieval

Xu, Huanwei, Xu, Lin, Yuan, Liang

arXiv.org Artificial Intelligence, Aug-26-2025

Large Language Model (LLM)-based passage expansion has shown promise for enhancing first-stage retrieval, but often underperforms with dense retrievers due to semantic drift and misalignment with their pretrained semantic space. Beyond this, only a portion of a passage is typically relevant to a query, while the rest introduces noise, an issue compounded by chunking techniques that break coreference continuity. We propose Coreference-Linked Augmentation for Passage Retrieval (CLAP), a lightweight LLM-based expansion framework that segments passages into coherent chunks, resolves coreference chains, and generates localized pseudo-queries aligned with dense retriever representations. A simple fusion of global topical signals and fine-grained subtopic signals achieves robust performance across domains. CLAP yields consistent gains even as retriever strength increases, enabling dense retrievers to match or surpass second-stage rankers such as BM25 + MonoT5-3B, with up to 20.68% absolute nDCG@10 improvement. These improvements are especially notable in out-of-domain settings, where conventional LLM-based expansion methods relying on domain knowledge often falter. CLAP instead adopts a logic-centric pipeline that enables robust, domain-agnostic generalization.

artificial intelligence, large language model, natural language, (16 more...)

doi: 10.1145/3746252.3761113

2508.06941

Country:

Asia (0.94)
North America > United States (0.94)
North America > Canada > British Columbia (0.28)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

CLaP -- State Detection from Time Series

Ermshaus, Arik, Schäfer, Patrick, Leser, Ulf

arXiv.org Artificial Intelligence, Aug-26-2025

The ever-growing amount of sensor data from machines, smart devices, and the environment leads to an abundance of high-resolution, unannotated time series (TS). These recordings encode recognizable properties of latent states and transitions from physical phenomena that can be modelled as abstract processes. The unsupervised localization and identification of these states and their transitions is the task of time series state detection (TSSD). Current TSSD algorithms employ classical unsupervised learning techniques to infer state membership directly from feature space. This limits their predictive power compared to supervised learning methods, which can exploit additional label information. We introduce CLaP, a new, highly accurate and efficient algorithm for TSSD. It leverages the predictive power of time series classification for TSSD in an unsupervised setting by applying novel self-supervision techniques to detect whether data segments emerge from the same state. To this end, CLaP cross-validates a classifier with segment-labelled subsequences to quantify confusion between segments. It merges labels from segments with high confusion, representing the same latent state, if this leads to an increase in overall classification quality. We conducted an experimental evaluation using 405 TS from five benchmarks and found CLaP to be significantly more precise in detecting states than six state-of-the-art competitors. It achieves the best accuracy-runtime tradeoff and is scalable to large TS. We provide a Python implementation of CLaP, which can be deployed in TS analysis workflows.

artificial intelligence, inductive learning, machine learning, (17 more...)

2504.01783

Country: Europe (0.93)

Genre: Research Report (1.00)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.54)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

DegDiT: Controllable Audio Generation with Dynamic Event Graph Guided Diffusion Transformer

Liu, Yisu, Li, Chenxing, Zhang, Wanqian, Wang, Wenfu, Yu, Meng, Fu, Ruibo, Lin, Zheng, Wang, Weiping, Yu, Dong

arXiv.org Artificial Intelligence, Aug-20-2025

Controllable text-to-audio generation aims to synthesize audio from textual descriptions while satisfying user-specified constraints, including event types, temporal sequences, and onset and offset timestamps. This enables precise control over both the content and temporal structure of the generated audio. Despite recent progress, existing methods still face inherent trade-offs among accurate temporal localization, open-vocabulary scalability, and practical efficiency. To address these challenges, we propose DegDiT, a novel dynamic event graph-guided diffusion transformer framework for open-vocabulary controllable audio generation. DegDiT encodes the events in the description as structured dynamic graphs. The nodes in each graph are designed to represent three aspects: semantic features, temporal attributes, and inter-event connections. A graph transformer is employed to integrate these nodes and produce contextualized event embeddings that serve as guidance for the diffusion model. To ensure high-quality and diverse training data, we introduce a quality-balanced data selection pipeline that combines hierarchical event annotation with multi-criteria quality scoring, resulting in a curated dataset with semantic diversity. Furthermore, we present consensus preference optimization, facilitating audio generation through consensus among multiple reward signals. Extensive experiments on AudioCondition, DESED, and AudioTime datasets demonstrate that DegDiT achieves state-of-the-art performances across a variety of objective and subjective evaluation metrics.

diffusion model, large language model, machine learning, (19 more...)

2508.13786

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.94)
