AITopics | confidence 0

Collaborating Authors

confidence 0

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

VICoT-Agent: A Vision-Interleaved Chain-of-Thought Framework for Interpretable Multimodal Reasoning and Scalable Remote Sensing Analysis

Wang, Chujie, Luo, Zhiyuan, Liu, Ruiqi, Ran, Can, Fan, Shenghua, Chen, Xi, He, Chu

arXiv.org Artificial IntelligenceDec-4-2025

The current remote sensing image analysis task is increasingly evolving from traditional object recognition to complex intelligence reasoning, which places higher requirements on the model's reasoning ability and the flexibility of tool invocation. T o this end, we propose a new multimodal agent framework, Vision-Interleaved Chain-of-Thought Framework (VICoT), which implements explicit multi-round reasoning by dynamically incorporating visual tools into the chain of thought. Through a stack-based reasoning structure and a modular MCP-compatible tool suite, VICoT enables LLMs to efficiently perform multi-round, interleaved vision-language reasoning tasks with strong generalization and flexibility.W e also propose the Reasoning Stack distillation method to migrate complex Agent behaviors to small, lightweight models, which ensures the reasoning capability while significantly reducing complexity. Experiments on multiple remote sensing benchmarks demonstrate that VICoT significantly outperforms existing SOTA frameworks in reasoning transparency, execution efficiency, and generation quality.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2511.20085

Country: North America > United States (1.00)

Genre: Research Report (0.82)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Government > Military > Navy (1.00)
Transportation (0.96)
(2 more...)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
(3 more...)

Add feedback

EXP-CAM: Explanation Generation and Circuit Discovery Using Classifier Activation Matching

Suhail, Pirzada, Anand, Aditya, Sethi, Amit

arXiv.org Artificial IntelligenceDec-2-2025

Machine learning models, by virtue of training, learn a large repertoire of decision rules for any given input, and any one of these may suffice to justify a prediction. However, in high-dimensional input spaces, such rules are difficult to identify and interpret. In this paper, we introduce EXP-CAM: an explanation generation and circuit discovery approach using Classifier Activation Matching. EXP-CAM can generate minimal and faithful explanations for the decisions of pre-trained image classifiers that not only preserve the model's decision but are also concise and human-readable. We aim to identify minimal explanations that not only preserve the model's decision but are also concise and human-readable. To achieve this, we train a lightweight auto-encoder to produce binary masks that learns to highlight the decision-wise critical regions of an image while discarding irrelevant background. The training objective integrates activation alignment across multiple layers, consistency at the output label, priors that encourage sparsity, and compactness, along with a robustness constraint that enforces faithfulness. The minimal explanations so generated also lead us to mechanistically interpreting the model internals. In this regard we also introduce a circuit readout procedure wherein using the explanation's forward pass and gradients, we identify active channels and construct a channel-level graph, scoring inter-layer edges by ingress weight magnitude times source activation and feature-to-class links by classifier weight magnitude times feature activation. Together, these contributions provide a practical bridge between minimal input-level explanations and a mechanistic understanding of the internal computations driving model decisions.

explanation, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2509.25686

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Explanation & Argumentation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Add feedback

\emph{FoQuS}: A Forgetting-Quality Coreset Selection Framework for Automatic Modulation Recognition

Lu, Yao, Sun, Chunfeng, Xu, Dongwei, Lin, Yun, Xuan, Qi, Gui, Guan

arXiv.org Artificial IntelligenceSep-11-2025

Deep learning-based Automatic Modulation Recognition (AMR) model has made significant progress with the support of large-scale labeled data. However, when developing new models or performing hyperparameter tuning, the time and energy consumption associated with repeated training using massive amounts of data are often unbearable. To address the above challenges, we propose \emph{FoQuS}, which approximates the effect of full training by selecting a coreset from the original dataset, thereby significantly reducing training overhead. Specifically, \emph{FoQuS} records the prediction trajectory of each sample during full-dataset training and constructs three importance metrics based on training dynamics. Experiments show that \emph{FoQuS} can maintain high recognition accuracy and good cross-architecture generalization on multiple AMR datasets using only 1\%-30\% of the original data.

artificial intelligence, machine learning, selection, (17 more...)

arXiv.org Artificial Intelligence

2509.083

Country: Asia > China (0.47)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

MICE for CATs: Model-Internal Confidence Estimation for Calibrating Agents with Tools

Subramani, Nishant, Eisner, Jason, Svegliato, Justin, Van Durme, Benjamin, Su, Yu, Thomson, Sam

arXiv.org Artificial IntelligenceApr-30-2025

Tool-using agents that act in the world need to be both useful and safe. Well-calibrated model confidences can be used to weigh the risk versus reward of potential actions, but prior work shows that many models are poorly calibrated. Inspired by interpretability literature exploring the internals of models, we propose a novel class of model-internal confidence estimators (MICE) to better assess confidence when calling tools. MICE first decodes from each intermediate layer of the language model using logitLens and then computes similarity scores between each layer's generation and the final output. These features are fed into a learned probabilistic classifier to assess confidence in the decoded output. On the simulated trial and error (STE) tool-calling dataset using Llama3 models, we find that MICE beats or matches the baselines on smoothed expected calibration error. Using MICE confidences to determine whether to call a tool significantly improves over strong baselines on a new metric, expected tool-calling utility. Further experiments show that MICE is sample-efficient, can generalize zero-shot to unseen APIs, and results in higher tool-calling utility in scenarios with varying risk levels. Our code is open source, available at https://github.com/microsoft/mice_for_cats.

computational linguistic, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2504.20168

Country:

Asia (1.00)
Europe (0.68)
North America > Mexico (0.28)

Genre:

Research Report > Experimental Study (0.68)
Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.90)

Add feedback

From Evidence to Belief: A Bayesian Epistemology Approach to Language Models

Kim, Minsu, Kim, Sangryul, Thorne, James

arXiv.org Artificial IntelligenceApr-29-2025

This paper investigates the knowledge of language models from the perspective of Bayesian epistemology. We explore how language models adjust their confidence and responses when presented with evidence with varying levels of informativeness and reliability. To study these properties, we create a dataset with various types of evidence and analyze language models' responses and confidence using verbalized confidence, token probability, and sampling. We observed that language models do not consistently follow Bayesian epistemology: language models follow the Bayesian confirmation assumption well with true evidence but fail to adhere to other Bayesian assumptions when encountering different evidence types. Also, we demonstrated that language models can exhibit high confidence when given strong evidence, but this does not always guarantee high accuracy. Our analysis also reveals that language models are biased toward golden evidence and show varying performance depending on the degree of irrelevance, helping explain why they deviate from Bayesian assumptions.

artificial intelligence, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2504.19622

Country:

Europe (1.00)
Asia (0.67)
North America > United States (0.67)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Health & Medicine (1.00)
Education (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.97)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.69)

Add feedback

VideoICL: Confidence-based Iterative In-context Learning for Out-of-Distribution Video Understanding

Kim, Kangsan, Park, Geon, Lee, Youngwan, Yeo, Woongyeong, Hwang, Sung Ju

arXiv.org Artificial IntelligenceDec-3-2024

Recent advancements in video large multimodal models (LMMs) have significantly improved their video understanding and reasoning capabilities. However, their performance drops on out-of-distribution (OOD) tasks that are underrepresented in training data. Traditional methods like fine-tuning on OOD datasets are impractical due to high computational costs. While In-context learning (ICL) with demonstration examples has shown promising generalization performance in language tasks and image-language tasks without fine-tuning, applying ICL to video-language tasks faces challenges due to the limited context length in Video LMMs, as videos require longer token lengths. To address these issues, we propose VideoICL, a novel video in-context learning framework for OOD tasks that introduces a similarity-based relevant example selection strategy and a confidence-based iterative inference approach. This allows to select the most relevant examples and rank them based on similarity, to be used for inference. If the generated response has low confidence, our framework selects new examples and performs inference again, iteratively refining the results until a high-confidence response is obtained. This approach improves OOD video understanding performance by extending effective context length without incurring high costs. The experimental results on multiple benchmarks demonstrate significant performance gains, especially in domain-specific scenarios, laying the groundwork for broader video comprehension applications. Code will be released at https://github.com/KangsanKim07/VideoICL

category, iteration, video, (16 more...)

arXiv.org Artificial Intelligence

2412.02186

Country:

Asia > Thailand > Bangkok > Bangkok (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Asia > Singapore (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Vision > Video Understanding (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Fairness without Sensitive Attributes via Knowledge Sharing

Ni, Hongliang, Han, Lei, Chen, Tong, Sadiq, Shazia, Demartini, Gianluca

arXiv.org Artificial IntelligenceSep-27-2024

While model fairness improvement has been explored previously, existing methods invariably rely on adjusting explicit sensitive attribute values in order to improve model fairness in downstream tasks. However, we observe a trend in which sensitive demographic information becomes inaccessible as public concerns around data privacy grow. In this paper, we propose a confidence-based hierarchical classifier structure called "Reckoner" for reliable fair model learning under the assumption of missing sensitive attributes. We first present results showing that if the dataset contains biased labels or other hidden biases, classifiers significantly increase the bias gap across different demographic groups in the subset with higher prediction confidence. Inspired by these findings, we devised a dual-model system in which a version of the model initialised with a high-confidence data subset learns from a version of the model initialised with a low-confidence data subset, enabling it to avoid biased predictions. Our experimental results show that Reckoner consistently outperforms state-of-the-art baselines in COMPAS dataset and New Adult dataset, considering both accuracy and fairness metrics.

classifier, dataset, fairness, (15 more...)

arXiv.org Artificial Intelligence

2409.1847

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
South America > Brazil > Rio de Janeiro > Rio de Janeiro (0.06)
Oceania > Australia > Queensland (0.05)
(4 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Law (0.93)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Add feedback

Large Language Models Must Be Taught to Know What They Don't Know

Kapoor, Sanyam, Gruver, Nate, Roberts, Manley, Collins, Katherine, Pal, Arka, Bhatt, Umang, Weller, Adrian, Dooley, Samuel, Goldblum, Micah, Wilson, Andrew Gordon

arXiv.org Machine LearningJun-12-2024

When using large language models (LLMs) in high-stakes applications, we need to know when we can trust their predictions. Some works argue that prompting high-performance LLMs is sufficient to produce calibrated uncertainties, while others introduce sampling methods that can be prohibitively expensive. In this work, we first argue that prompting on its own is insufficient to achieve good calibration and then show that fine-tuning on a small dataset of correct and incorrect answers can create an uncertainty estimate with good generalization and small computational overhead. We show that a thousand graded examples are sufficient to outperform baseline methods and that training through the features of a model is necessary for good performance and tractable for large open-source models when using LoRA. We also investigate the mechanisms that enable reliable LLM uncertainty estimation, finding that many models can be used as general-purpose uncertainty estimators, applicable not just to their own uncertainties but also the uncertainty of other models. Lastly, we show that uncertainty estimates inform human use of LLMs in human-AI collaborative settings through a user study.

calibration, language model, probability, (16 more...)

arXiv.org Machine Learning

2406.08391

Country:

North America > United States > New York (0.04)
North America > Canada > Ontario > Toronto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.92)

Industry: Education > Curriculum > Subject-Specific Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.97)

Add feedback

LabelBench: A Comprehensive Framework for Benchmarking Adaptive Label-Efficient Learning

Zhang, Jifan, Chen, Yifang, Canal, Gregory, Mussmann, Stephen, Das, Arnav M., Bhatt, Gantavya, Zhu, Yinglun, Bilmes, Jeffrey, Du, Simon Shaolei, Jamieson, Kevin, Nowak, Robert D

arXiv.org Artificial IntelligenceJan-12-2024

Labeled data are critical to modern machine learning applications, but obtaining labels can be expensive. To mitigate this cost, machine learning methods, such as transfer learning, semi-supervised learning and active learning, aim to be label-efficient: achieving high predictive performance from relatively few labeled examples. While obtaining the best label-efficiency in practice often requires combinations of these techniques, existing benchmark and evaluation frameworks do not capture a concerted combination of all such techniques. This paper addresses this deficiency by introducing LabelBench, a new computationally-efficient framework for joint evaluation of multiple label-efficient learning techniques. As an application of LabelBench, we introduce a novel benchmark of state-of-the-art active learning methods in combination with semi-supervised learning for fine-tuning pretrained vision transformers. Our benchmark demonstrates better label-efficiencies than previously reported in active learning. LabelBench's modular codebase is open-sourced for the broader community to contribute label-efficient learning methods and benchmarks. The repository can be found at: https://github.com/EfficientTraining/LabelBench.

accuracy, learning, pretrained model, (15 more...)

arXiv.org Artificial Intelligence

2306.0991

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Wisconsin > Dane County > Madison (0.14)
North America > United States > California > Riverside County > Riverside (0.14)
(5 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Unsupervised or Indirectly Supervised Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

An Empirical Study of Uncertainty Estimation Techniques for Detecting Drift in Data Streams

Winter, Anton, Jourdan, Nicolas, Wirth, Tristan, Knauthe, Volker, Kuijper, Arjan

arXiv.org Artificial IntelligenceNov-22-2023

In safety-critical domains such as autonomous driving and medical diagnosis, the reliability of machine learning models is crucial. One significant challenge to reliability is concept drift, which can cause model deterioration over time. Traditionally, drift detectors rely on true labels, which are often scarce and costly. This study conducts a comprehensive empirical evaluation of using uncertainty values as substitutes for error rates in detecting drifts, aiming to alleviate the reliance on labeled post-deployment data. We examine five uncertainty estimation methods in conjunction with the ADWIN detector across seven real-world datasets. Our results reveal that while the SWAG method exhibits superior calibration, the overall accuracy in detecting drifts is not notably impacted by the choice of uncertainty estimation method, with even the most basic method demonstrating competitive performance. These findings offer valuable insights into the practical applicability of uncertainty-based drift detection in real-world, safety-critical applications.

experiment, gap output, output ideal calibration gap output, (14 more...)

arXiv.org Artificial Intelligence

2311.13374

Country:

Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.05)
South America > Brazil > Maranhão (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report > New Finding (0.34)

Industry: Health & Medicine (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.49)

Add feedback