AITopics | interm

Collaborating Authors

interm

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AURA: A Diagnostic Framework for Tracking User Satisfaction of Interactive Planning Agents

Kim, Takyoung, Singh, Janvijay, Mehri, Shuhaib, Acikgoz, Emre Can, Mukherjee, Sagnik, Bozdag, Nimet Beyza, Shashidhar, Sumuk, Tur, Gokhan, Hakkani-Tür, Dilek

arXiv.org Artificial IntelligenceDec-8-2025

The growing capabilities of large language models (LLMs) in instruction-following and context-understanding lead to the era of agents with numerous applications. Among these, task planning agents have become especially prominent in realistic scenarios involving complex internal pipelines, such as context understanding, tool management, and response generation. However, existing benchmarks predominantly evaluate agent performance based on task completion as a proxy for overall effectiveness. We hypothesize that merely improving task completion is misaligned with maximizing user satisfaction, as users interact with the entire agentic process and not only the end result. To address this gap, we propose AURA, an Agent-User inteRaction Assessment framework that conceptualizes the behavioral stages of interactive task planning agents. AURA offers a comprehensive assessment of agent through a set of atomic LLM evaluation criteria, allowing researchers and practitioners to diagnose specific strengths and weaknesses within the agent's decision-making pipeline. Our analyses show that agents excel in different behavioral stages, with user satisfaction shaped by both outcomes and intermediate behaviors. We also highlight future directions, including systems that leverage multiple agents and the limitations of user simulators in task planning.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2505.01592

Country:

North America > United States (0.93)
Europe (0.93)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.96)
(2 more...)

Add feedback

One-shot domain adaptation in video-based assessment of surgical skills

Yanik, Erim, Schwaitzberg, Steven, Yang, Gene, Intes, Xavier, De, Suvranu

arXiv.org Artificial IntelligenceDec-27-2023

Deep Learning (DL) has achieved automatic and objective assessment of surgical skills. However, the applicability of DL models is often hampered by their substantial data requirements and confinement to specific training domains. This prevents them from transitioning to new tasks with scarce data. Therefore, domain adaptation emerges as a critical element for the practical implementation of DL in real-world scenarios. Herein, we introduce A-VBANet, a novel meta-learning model capable of delivering domain-agnostic surgical skill classification via one-shot learning. A-VBANet has been rigorously developed and tested on five diverse laparoscopic and robotic surgical simulators. Furthermore, we extend its validation to operating room (OR) videos of laparoscopic cholecystectomy. Our model successfully adapts with accuracies up to 99.5% in one-shot and 99.9% in few-shot settings for simulated tasks and 89.7% for laparoscopic cholecystectomy. This research marks the first instance of a domain-agnostic methodology for surgical skill assessment, paving the way for more precise and accessible training evaluation across diverse high-stakes environments such as real-life surgery where data is scarce.

assessment, novice 0, unsatisfactory 1, (15 more...)

arXiv.org Artificial Intelligence

2301.00812

Country: North America > United States > New York > Erie County > Buffalo (0.04)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Surgery (1.00)
Health & Medicine > Diagnostic Medicine > Imaging (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Add feedback

Sharpness-Aware Minimization Leads to Low-Rank Features

Andriushchenko, Maksym, Bahri, Dara, Mobahi, Hossein, Flammarion, Nicolas

arXiv.org Artificial IntelligenceOct-28-2023

Sharpness-aware minimization (SAM) is a recently proposed method that minimizes the sharpness of the training loss of a neural network. While its generalization improvement is well-known and is the primary motivation, we uncover an additional intriguing effect of SAM: reduction of the feature rank which happens at different layers of a neural network. We show that this low-rank effect occurs very broadly: for different architectures such as fully-connected networks, convolutional networks, vision transformers and for different objectives such as regression, classification, language-image contrastive training. To better understand this phenomenon, we provide a mechanistic understanding of how low-rank features arise in a simple two-layer network. We observe that a significant number of activations gets entirely pruned by SAM which directly contributes to the rank reduction. We confirm this effect theoretically and check that it can also occur in deep networks, although the overall rank reduction mechanism can be more complex, especially for deep networks with pre-activation skip connections and self-attention layers. We make our code available at https://github.com/tml-epfl/sam-low-rank-features.

feature rank, generalization, interm, (14 more...)

arXiv.org Artificial Intelligence

2305.16292

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

An Empirical Study of Pre-Trained Model Reuse in the Hugging Face Deep Learning Model Registry

Jiang, Wenxin, Synovic, Nicholas, Hyatt, Matt, Schorlemmer, Taylor R., Sethi, Rohan, Lu, Yung-Hsiang, Thiruvathukal, George K., Davis, James C.

arXiv.org Artificial IntelligenceMar-4-2023

Deep Neural Networks (DNNs) are being adopted as components in software systems. Creating and specializing DNNs from scratch has grown increasingly difficult as state-of-the-art architectures grow more complex. Following the path of traditional software engineering, machine learning engineers have begun to reuse large-scale pre-trained models (PTMs) and fine-tune these models for downstream tasks. Prior works have studied reuse practices for traditional software packages to guide software engineers towards better package maintenance and dependency management. We lack a similar foundation of knowledge to guide behaviors in pre-trained model ecosystems. In this work, we present the first empirical investigation of PTM reuse. We interviewed 12 practitioners from the most popular PTM ecosystem, Hugging Face, to learn the practices and challenges of PTM reuse. From this data, we model the decision-making process for PTM reuse. Based on the identified practices, we describe useful attributes for model reuse, including provenance, reproducibility, and portability. Three challenges for PTM reuse are missing attributes, discrepancies between claimed and actual performance, and model risks. We substantiate these identified challenges with systematic measurements in the Hugging Face ecosystem. Our work informs future directions on optimizing deep learning ecosystems by automated measuring useful attributes and potential attacks, and envision future research on infrastructure and standardization for model registries.

artificial intelligence, machine learning, registry, (19 more...)

arXiv.org Artificial Intelligence

2303.02552

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > California > Los Angeles County > Los Angeles (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre:

Overview (1.00)
Research Report > New Finding (0.47)
Research Report > Experimental Study (0.46)
Personal > Interview (0.46)

Industry:

Information Technology > Security & Privacy (1.00)
Education (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback