AITopics | aqe

aqe

Quantifying Self-Awareness of Knowledge in Large Language Models

Seo, Yeongbin, Lee, Dongha, Yeo, Jinyoung

arXiv.org Artificial Intelligence, Sep-22-2025

Hallucination prediction in large language models (LLMs) is often interpreted as a sign of self-awareness. However, we argue that such performance can arise from question-side shortcuts rather than true model-side introspection. To disentangle these factors, we propose the Approximate Question-side Effect (AQE), which quantifies the contribution of question-awareness. Our analysis across multiple datasets reveals that much of the reported success stems from exploiting superficial patterns in questions. We further introduce SCAO (Semantic Compression by Answering in One word), a method that enhances the use of model-side signals. Experiments show that SCAO achieves strong and consistent performance, particularly in settings with reduced question-side cues, highlighting its effectiveness in fostering genuine self-awareness in LLMs.

large language model, machine learning, natural language, (16 more...)

arXiv.org Artificial Intelligence

2509.15339

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment (0.93)
Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Aggressive Q-Learning with Ensembles: Achieving Both High Sample Efficiency and High Asymptotic Performance

Wu, Yanqiu, Chen, Xinyue, Wang, Che, Zhang, Yiming, Zhou, Zijian, Ross, Keith W.

arXiv.org Artificial Intelligence, Nov-17-2021

Recently, Truncated Quantile Critics (TQC), using distributional representation of critics, was shown to provide state-of-the-art asymptotic training performance on all environments from the MuJoCo continuous control benchmark suite. Also recently, Randomized Ensemble Double Q-Learning (REDQ), using a high update-to-data ratio and target randomization, was shown to achieve high sample efficiency that is competitive with state-of-the-art model-based methods. In this paper, we propose a novel model-free algorithm, Aggressive Q-Learning with Ensembles (AQE), which improves the sample-efficiency performance of REDQ and the asymptotic performance of TQC, thereby providing overall state-of-the-art performance during all stages of training. Moreover, AQE is very simple, requiring neither distributional representation of critics nor target randomization.

Off-policy Deep Reinforcement Learning algorithms aim to improve sample efficiency by reusing past experience. A number of off-policy Deep RL algorithms have been proposed for control tasks with continuous state and action spaces, including Deep Deterministic Policy Gradient (DDPG), Twin Delayed DDPG (TD3) and Soft Actor Critic (SAC) (Lillicrap et al., 2016; Fujimoto et al., 2018; Haarnoja et al., 2018a;b). TD3 introduced clipped double-Q learning, and was shown to be significantly more sample efficient than popular on-policy methods for a wide range of MuJoCo benchmarks. SAC has a similar off-policy structure with clipped double-Q learning, but it also employs maximum entropy reinforcement learning. SAC was shown to provide excellent sample efficiency and asymptotic performance in a wide range of MuJoCo environments, including the high-dimensional Humanoid environment for which both DDPG and TD3 perform poorly.
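The clipped double-Q learning mentioned above can be illustrated with a minimal sketch of the bootstrapped target for a single transition. This is the standard TD3-style target (with an optional SAC-style entropy term), not the AQE algorithm itself, whose details are not given in this listing. The function name, parameter names, and default values are assumptions chosen for illustration.

```python
def clipped_double_q_target(
    reward: float,
    done: bool,
    next_q1: float,
    next_q2: float,
    gamma: float = 0.99,        # discount factor (illustrative default)
    alpha: float = 0.0,         # entropy temperature; 0.0 gives the TD3-style target
    next_log_prob: float = 0.0, # log pi(a'|s') for the sampled next action (SAC only)
) -> float:
    """Return the bootstrapped target for one transition.

    Taking the minimum of two target critics counteracts the
    overestimation bias of a single Q-function. With alpha > 0 the
    entropy bonus used in maximum entropy RL is added to the value.
    """
    # Clipped double-Q: use the more pessimistic of the two estimates.
    min_next_q = min(next_q1, next_q2)
    # Soft value: subtracting alpha * log pi rewards high-entropy policies.
    soft_value = min_next_q - alpha * next_log_prob
    # No bootstrapping past a terminal state.
    not_done = 0.0 if done else 1.0
    return reward + gamma * not_done * soft_value


# Example: the smaller critic estimate (8.0) is the one that is bootstrapped from.
target = clipped_double_q_target(1.0, False, 10.0, 8.0, gamma=0.5)
```

In practice the same computation is applied to whole mini-batches of tensors; the scalar form here only shows which of the two critic estimates enters the target.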

algorithm, aqe, normalized bias, (15 more...)

arXiv.org Artificial Intelligence

2111.09159

Country:

North America > United States > New York (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre: Research Report > Promising Solution (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
