AITopics | aqa

Collaborating Authors

aqa

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Model-Preserving Adaptive Rounding

Tseng, Albert, Sun, Zhaofeng, De Sa, Christopher

arXiv.org Artificial IntelligenceSep-29-2025

The goal of quantization is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, most quantization algorithms minimize the immediate activation error of each layer as a proxy for the end-to-end error. However, this ignores the effect of future layers, making it a poor proxy. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly considers the error at the network's output. YAQA introduces a series of theoretical results that culminate in the first end-to-end error bounds for quantization algorithms. First, we characterize the convergence time of adaptive rounding algorithms via the structure of their Hessian approximations. We then show that the end-to-end error can be bounded by the approximation's cosine similarity to the true Hessian. This admits a natural Kronecker-factored approximation with corresponding near-optimal Hessian sketches. YAQA is provably better than GPTQ/LDLQ and empirically reduces the error by $\approx 30\%$ over these methods. YAQA even achieves a lower error than quantization aware training. This translates to state of the art performance on downstream tasks, all while adding no inference overhead.

ctx, large language model, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2505.22988

Country: Europe (0.28)

Genre: Research Report (0.85)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.47)

Add feedback

EHWGesture -- A dataset for multimodal understanding of clinical gestures

Amprimo, Gianluca, Ancilotto, Alberto, Savino, Alessandro, Quazzolo, Fabio, Ferraris, Claudia, Olmo, Gabriella, Farella, Elisabetta, Di Carlo, Stefano

arXiv.org Artificial IntelligenceSep-10-2025

Hand gesture understanding is essential for several applications in human-computer interaction, including automatic clinical assessment of hand dexterity. While deep learning has advanced static gesture recognition, dynamic gesture understanding remains challenging due to complex spatiotemporal variations. Moreover, existing datasets often lack multimodal and multi-view diversity, precise ground-truth tracking, and an action quality component embedded within gestures. This paper introduces EHWGes-ture, a multimodal video dataset for gesture understanding featuring five clinically relevant gestures. It includes over 1,100 recordings ( 6 hours), captured from 25 healthy subjects using two high-resolution RGB-Depth cameras and an event camera. A motion capture system provides precise ground-truth hand landmark tracking, and all devices are spatially calibrated and synchronized to ensure cross-modal alignment. Moreover, to embed an action quality task within gesture understanding, collected recordings are organized in classes of execution speed that mirror clinical evaluations of hand dexterity. Baseline experiments highlight the dataset's potential for gesture classification, gesture trigger detection, and action quality assessment. Thus, EHWGes-ture can serve as a comprehensive benchmark for advancing multimodal clinical gesture understanding.

artificial intelligence, deep learning, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2509.07525

Country: Europe > Italy (0.28)

Genre:

Research Report > New Finding (0.46)
Research Report > Experimental Study (0.46)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Information Technology (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

A Decade of Action Quality Assessment: Largest Systematic Survey of Trends, Challenges, and Future Directions

Yin, Hao, Parmar, Paritosh, Xu, Daoliang, Zhang, Yang, Zheng, Tianyou, Fu, Weiwei

arXiv.org Artificial IntelligenceFeb-4-2025

Action Quality Assessment (AQA) -- the ability to quantify the quality of human motion, actions, or skill levels and provide feedback -- has far-reaching implications in areas such as low-cost physiotherapy, sports training, and workforce development. As such, it has become a critical field in computer vision & video understanding over the past decade. Significant progress has been made in AQA methodologies, datasets, & applications, yet a pressing need remains for a comprehensive synthesis of this rapidly evolving field. In this paper, we present a thorough survey of the AQA landscape, systematically reviewing over 200 research papers using the preferred reporting items for systematic reviews & meta-analyses (PRISMA) framework. We begin by covering foundational concepts & definitions, then move to general frameworks & performance metrics, & finally discuss the latest advances in methodologies & datasets. This survey provides a detailed analysis of research trends, performance comparisons, challenges, & future directions. Through this work, we aim to offer a valuable resource for both newcomers & experienced researchers, promoting further exploration & progress in AQA. Data are available at https://haoyin116.github.io/Survey_of_AQA/

assessment, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.02817

Country:

Asia > Japan > Shikoku > Ehime Prefecture > Matsuyama (0.04)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Asia > South Korea > Gangwon-do > Pyeongchang (0.04)
(3 more...)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment > Sports > Olympic Games (1.00)
Education (1.00)
Health & Medicine > Therapeutic Area (0.92)
Health & Medicine > Health Care Technology (0.67)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
(3 more...)

Add feedback

Audiopedia: Audio QA with Knowledge

Penamakuri, Abhirama Subramanyam, Chhatre, Kiran, Jain, Akshat

arXiv.org Artificial IntelligenceDec-29-2024

In this paper, we introduce Audiopedia, a novel task called Audio Question Answering with Knowledge, which requires both audio comprehension and external knowledge reasoning. Unlike traditional Audio Question Answering (AQA) benchmarks that focus on simple queries answerable from audio alone, Audiopedia targets knowledge-intensive questions. We define three sub-tasks: (i) Single Audio Question Answering (s-AQA), where questions are answered based on a single audio sample, (ii) Multi-Audio Question Answering (m-AQA), which requires reasoning over multiple audio samples, and (iii) Retrieval-Augmented Audio Question Answering (r-AQA), which involves retrieving relevant audio to answer the question. We benchmark large audio language models (LALMs) on these sub-tasks and observe suboptimal performance. To address this, we propose a generic framework that can be adapted to any LALM, equipping them with knowledge reasoning capabilities. Our framework has two components: (i) Audio Entity Linking (AEL) and (ii) Knowledge-Augmented Audio Large Multimodal Model (KA2LM), which together improve performance on knowledge-intensive AQA tasks. To our knowledge, this is the first work to address advanced audio understanding via knowledge-intensive tasks like Audiopedia.

artificial intelligence, natural language, question answering, (18 more...)

arXiv.org Artificial Intelligence

2412.20619

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)

Add feedback

Unmasking the Limits of Large Language Models: A Systematic Evaluation of Masked Text Processing Ability through MskQA and MskCal

Matsuzaki, Fuka, Sato, Haru-Tada

arXiv.org Artificial IntelligenceNov-8-2024

This paper sheds light on the limitations of Large Language Models (LLMs) by rigorously evaluating their ability to process masked text. We introduce two novel tasks: MskQA, measuring reasoning on masked question-answering datasets like RealtimeQA, and MskCal, assessing numerical reasoning on masked arithmetic problems.Testing GPT-4o and 4o-mini reveals that while LLMs exhibit some resilience to masked text, their performance is highly contingent on masking rates and semantic cues. Specifically, "solid masking," where semantic clues are entirely absent, leads to a significant performance drop compared to "partial lifting," where some semantic information is retained, indicating LLMs' reliance on surface-level patterns. Interestingly, GPT-4o consistently outperforms 4o-mini, particularly in MskCal, demonstrating a greater ability to handle numerical reasoning with masked text. This underscores the crucial role of semantic cues in the reasoning process of LLMs. Our study illuminates the interplay between background knowledge and reasoning ability in masked text processing, paving the way for a deeper understanding of LLM capabilities and limitations, and highlighting the need for more robust evaluation methods to accurately assess their true comprehension abilities.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.05665

Country:

North America > Dominican Republic (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.82)

Industry: Law (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.74)

Add feedback

AQA: Adaptive Question Answering in a Society of LLMs via Contextual Multi-Armed Bandit

Hoveyda, Mohanna, de Vries, Arjen P., de Rijke, Maarten, Oosterhuis, Harrie, Hasibi, Faegheh

arXiv.org Artificial IntelligenceSep-23-2024

In question answering (QA), different questions can be effectively addressed with different answering strategies. Some require a simple lookup, while others need complex, multi-step reasoning to be answered adequately. This observation motivates the development of a dynamic method that adaptively selects the most suitable QA strategy for each question, enabling more efficient and effective systems capable of addressing a broader range of question types. To this aim, we build on recent advances in the orchestration of multiple large language models (LLMs) and formulate adaptive QA as a dynamic orchestration challenge. We define this as a contextual multi-armed bandit problem, where the context is defined by the characteristics of the incoming question and the action space consists of potential communication graph configurations among the LLM agents. We then train a linear upper confidence bound model to learn an optimal mapping between different question types and their corresponding optimal multi-LLM communication graph representation. Our experiments show that the proposed solution is viable for adaptive orchestration of a QA system with multiple modules, as it combines the superior performance of more complex strategies while avoiding their costs when simpler strategies suffice.

agent, graph, orchestration, (15 more...)

arXiv.org Artificial Intelligence

2409.13447

Country:

Europe > Austria > Vienna (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(7 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Enhancing Temporal Understanding in Audio Question Answering for Large Audio Language Models

Sridhar, Arvind Krishna, Guo, Yinyi, Visser, Erik

arXiv.org Artificial IntelligenceSep-12-2024

The Audio Question Answering task includes audio event classification, audio captioning, and open ended reasoning. Recently, Audio Question Answering has garnered attention due to the advent of Large Audio Language Models. Current literature focuses on constructing LALMs by integrating audio encoders with text only Large Language Models through a projection module. While Large Audio Language Models excel in general audio understanding, they are limited in temporal reasoning which may hinder their commercial applications and on device deployment. This paper addresses these challenges and limitations in audio temporal reasoning. First, we introduce a data augmentation technique for generating reliable audio temporal questions and answers using an LLM. Second, we propose a continued finetuning curriculum learning strategy to specialize in temporal reasoning without compromising performance on finetuned tasks. Finally, we develop a reliable and transparent automated metric, assisted by an LLM, to measure the correlation between Large Audio Language Model responses and ground truth data intelligently. We demonstrate the effectiveness of our proposed techniques using SOTA LALMs on public audio benchmark datasets.

lalm, reasoning, temporal reasoning, (16 more...)

arXiv.org Artificial Intelligence

2409.06223

Country:

North America > United States > California > San Diego County > San Diego (0.05)
Asia > Thailand > Bangkok > Bangkok (0.05)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report (0.41)

Industry: Information Technology (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

Joint Audio and Speech Understanding

Gong, Yuan, Liu, Alexander H., Luo, Hongyin, Karlinsky, Leonid, Glass, James

arXiv.org Artificial IntelligenceDec-10-2023

Humans are surrounded by audio signals that include both speech and non-speech sounds. The recognition and understanding of speech and non-speech audio events, along with a profound comprehension of the relationship between them, constitute fundamental cognitive capabilities. For the first time, we build a machine learning model, called LTU-AS, that has a conceptually similar universal audio perception and advanced reasoning ability. Specifically, by integrating Whisper as a perception module and LLaMA as a reasoning module, LTU-AS can simultaneously recognize and jointly understand spoken text, speech paralinguistics, and non-speech audio events - almost everything perceivable from audio signals.

dataset, information, ltu-as, (13 more...)

arXiv.org Artificial Intelligence

2309.14405

Country: North America > United States (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Basketball (0.46)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
(2 more...)

Add feedback

Aesthetically Relevant Image Captioning

Zhong, Zhipeng, Zhou, Fei, Qiu, Guoping

arXiv.org Artificial IntelligenceNov-25-2022

Image aesthetic quality assessment (AQA) aims to assign numerical aesthetic ratings to images whilst image aesthetic captioning (IAC) aims to generate textual descriptions of the aesthetic aspects of images. In this paper, we study image AQA and IAC together and present a new IAC method termed Aesthetically Relevant Image Captioning (ARIC). Based on the observation that most textual comments of an image are about objects and their interactions rather than aspects of aesthetics, we first introduce the concept of Aesthetic Relevance Score (ARS) of a sentence and have developed a model to automatically label a sentence with its ARS. We then use the ARS to design the ARIC model which includes an ARS weighted IAC loss function and an ARS based diverse aesthetic caption selector (DACS). We present extensive experimental results to show the soundness of the ARS concept and the effectiveness of the ARIC model by demonstrating that texts with higher ARS's can predict the aesthetic ratings more accurately and that the new ARIC model can generate more accurate, aesthetically more relevant and more diverse image captions. Furthermore, a large new research database containing 510K images with over 5 million comments and 350K aesthetic scores, and code for implementing ARIC are available at https://github.com/PengZai/ARIC.

caption, data mining, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2211.15378

Country:

Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.14)
Asia > China > Guangdong Province > Shenzhen (0.05)
North America > Canada > Newfoundland and Labrador > Labrador (0.04)
(5 more...)

Genre: Research Report (0.64)

Industry:

Transportation (1.00)
Leisure & Entertainment > Sports (1.00)
Media > Photography (0.67)
Consumer Products & Services (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
Information Technology > Data Science > Data Mining (0.67)

Add feedback

Google Trains Reinforcement Learning Agents to Ask the Right Questions

#artificialintelligenceMay-25-2021, 14:05:06 GMT

I recently started an AI-focused educational newsletter, that already has over 80,000 subscribers. TheSequence is a no-BS (meaning no hype, no news etc) ML-oriented newsletter that takes 5 minutes to read. The goal is to keep you up to date with machine learning projects, research papers and concepts. That paradigm assumes that the target knowledge is already embedded in the dataset and doesn't require any further clarifications but that rarely resembles how humans learn. When presented with a new subject, we are constantly forced to ask questions and clarifications about it.

answer selection model, aqa, google train reinforcement learning agent, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.88)

Add feedback