Rethinking RL Evaluation: Can Benchmarks Truly Reveal Failures of RL Methods?

Chen, Zihan, Zhang, Yiming, Zhou, Hengguang, Ding, Zenghui, Sun, Yining, Hsieh, Cho-Jui

arXiv.org Artificial Intelligence

Reinforcement Learning (RL) has emerged as a powerful paradigm for post-training Large Language Models (LLMs), significantly enhancing their capabilities on complex, multi-step reasoning tasks (Ouyang et al., 2022). Methods based on Reinforcement Learning from Human Feedback (RLHF) and Direct Preference Optimization (DPO) (Rafailov et al., 2023) have become standard practice for aligning LLMs. These paradigms are often powered by foundational algorithms like Proximal Policy Optimization (PPO) (Schulman et al., 2017), with state-of-the-art variants such as Group Relative Policy Optimization (GRPO) (Shao et al., 2024) pushing models to achieve remarkable performance on benchmarks like GSM8K (Cobbe et al., 2021) and MATH (Hendrycks et al., 2021). These successes, often marked by state-of-the-art results (Lewkowycz et al., 2022; Lightman et al., 2023), are widely interpreted as a significant leap forward, suggesting that RL-based alignment is a key pathway toward developing more general and robust machine reasoning systems. Despite impressive reported gains, a key question is whether current benchmarks still meaningfully assess generalization. Our findings suggest that the traditional assumption underlying benchmark design, that a model's ability to perform well on unseen test examples is sufficient to measure generalization, no longer holds for RL. We find that RL-based reasoning models trained on the training split achieve nearly the same performance as those trained directly on the test split, indicating that "unseen-ness" alone is no longer a challenging or discriminative criterion. This calls for a rethinking of evaluation: rather than relying solely on disjoint train/test splits, future benchmarks must incorporate settings that remain sensitive to deeper forms of generalization and can reveal weaknesses that simple data separation fails to expose.
To systematically investigate this phenomenon, we introduce a multi-faceted empirical framework designed not merely to measure performance, but to deconstruct it.


A Maslow-Inspired Hierarchy of Engagement with AI Model

Ogot, Madara

arXiv.org Artificial Intelligence

The rapid proliferation of artificial intelligence (AI) across industry, government, and education highlights the urgent need for robust frameworks to conceptualise and guide engagement. This paper introduces the Hierarchy of Engagement with AI model, a novel maturity framework inspired by Maslow's hierarchy of needs. The model conceptualises AI adoption as a progression through eight levels, beginning with initial exposure and basic understanding and culminating in ecosystem collaboration and societal impact. Each level integrates technical, organisational, and ethical dimensions, emphasising that AI maturity is not only a matter of infrastructure and capability but also of trust, governance, and responsibility. Initial validation of the model using four diverse case studies (General Motors, the Government of Estonia, the University of Texas System, and the African Union AI Strategy) demonstrates the model's contextual flexibility across various sectors. The model provides scholars with a framework for analysing AI maturity and offers practitioners and policymakers a diagnostic and strategic planning tool to guide responsible and sustainable AI engagement. The proposed model demonstrates that AI maturity progression is multi-dimensional, requiring technological capability, ethical integrity, organisational resilience, and ecosystem collaboration.


Evaluating Hierarchical Clinical Document Classification Using Reasoning-Based LLMs

Mustafa, Akram, Naseem, Usman, Azghadi, Mostafa Rahimi

arXiv.org Artificial Intelligence

Background: Clinical coding, particularly the classification of hierarchical ICD-10 codes from unstructured discharge summaries, is essential for healthcare operations, but remains a labor-intensive and error-prone task. Automated approaches using Large Language Models (LLMs) offer the potential to augment or replace human coders, yet their reliability and reasoning capabilities, which are needed to ensure accurate, explainable code assignments, are not well understood. Objective: This study aims to benchmark a diverse set of LLMs, both reasoning and non-reasoning models, on their ability to classify hierarchical ICD-10 codes from discharge summaries and to evaluate the effect of structured reasoning on model performance. Methods: Using the MIMIC-IV dataset, the study selected 1,500 discharge summaries labeled with the top 10 most frequent ICD-10 codes, balancing dataset size with the high computational and financial cost of using LLMs. We first preprocessed the data to extract clinically relevant tokens before feeding it to the LLMs. Specifically, we used cTAKES, a clinical NLP tool, to identify medical concepts. Each summary was encoded and submitted to 11 LLMs using a standardized, structured prompt simulating a clinical coder. Models were evaluated using the F1 score across three ICD-10 levels for both primary and all-diagnoses classification tasks. Results: Reasoning models on average outperformed non-reasoning models. The Gemini 2.5 Pro model demonstrated the highest performance across tasks.
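Evaluating F1 "across three ICD-10 levels" amounts to truncating predicted and gold codes to a given hierarchy depth before scoring. A minimal sketch of that idea follows; the specific level definitions and the micro-averaged F1 are illustrative assumptions, not the paper's exact protocol.

```python
def truncate(code: str, level: int) -> str:
    """Truncate an ICD-10 code to an assumed hierarchy level:
    level 1 keeps the 3-character category (e.g. 'E11'),
    level 2 adds the first decimal digit, level 3 keeps the full code."""
    head, _, tail = code.partition(".")
    if level == 1:
        return head
    if level == 2:
        return f"{head}.{tail[:1]}" if tail else head
    return code

def micro_f1(gold: list[set[str]], pred: list[set[str]], level: int) -> float:
    """Micro-averaged F1 over multi-label code sets at a given level."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        g = {truncate(c, level) for c in g}
        p = {truncate(c, level) for c in p}
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A prediction of `E11.2` against a gold `E11.9` scores a perfect match at level 1 (both truncate to `E11`) but a miss at level 3, which is exactly the discrimination a per-level evaluation provides.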


Learning to Stop Overthinking at Test Time

Bao, Hieu Tran, Dat, Nguyen Cong, Anh, Nguyen Duc, Thanh-Tung, Hoang

arXiv.org Artificial Intelligence

Test-time scaling is currently one of the most active research areas, showing promise now that training-time scaling has reached its limits. Deep-thinking (DT) models are a class of recurrent models that can perform easy-to-hard generalization by assigning more compute to harder test samples. However, because they cannot determine the complexity of a test sample, DT models have to use a large amount of computation for both easy and hard test samples. Excessive test-time computation is wasteful and can cause the ``overthinking'' problem, where more test-time computation leads to worse results. In this paper, we introduce a test-time training method for determining the optimal amount of computation needed for each sample. We also propose Conv-LiGRU, a novel recurrent architecture for efficient and robust visual reasoning. Extensive experiments demonstrate that Conv-LiGRU is more stable than DT, effectively mitigates the ``overthinking'' phenomenon, and achieves superior accuracy.
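The core idea of per-sample adaptive compute can be sketched as iterating a recurrent step only until the state stops changing. The generic `step` function and fixed convergence tolerance below are hypothetical stand-ins for the paper's learned stopping rule, shown only to illustrate why easy inputs should halt early.

```python
import numpy as np

def run_with_halting(step, x, max_iters=50, tol=1e-4):
    """Iterate a recurrent update until the state stabilizes.
    `step` is any state-update function; iteration stops early when
    successive states differ by less than `tol`, so easy inputs use
    few steps and hard inputs use more (avoiding 'overthinking')."""
    state = x
    for i in range(1, max_iters + 1):
        new_state = step(state)
        if np.max(np.abs(new_state - state)) < tol:
            return new_state, i
        state = new_state
    return state, max_iters
```

With a fast-converging (contractive) step, the loop halts well before `max_iters`; a fixed-depth model would spend the full budget on the same input.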


OODFace: Benchmarking Robustness of Face Recognition under Common Corruptions and Appearance Variations

Kang, Caixin, Chen, Yubo, Ruan, Shouwei, Zhao, Shiji, Zhang, Ruochen, Wang, Jiayi, Fu, Shan, Wei, Xingxing

arXiv.org Artificial Intelligence

With the rise of deep learning, facial recognition technology has seen extensive research and rapid development. Although facial recognition is considered a mature technology, we find that existing open-source models and commercial algorithms lack robustness in certain real-world Out-of-Distribution (OOD) scenarios, raising concerns about the reliability of these systems. In this paper, we introduce OODFace, which explores the OOD challenges faced by facial recognition models from two perspectives: common corruptions and appearance variations. We systematically design 30 OOD scenarios across 9 major categories tailored for facial recognition. By simulating these challenges on public datasets, we establish three robustness benchmarks: LFW-C/V, CFP-FP-C/V, and YTF-C/V. We then conduct extensive experiments on 19 different facial recognition models and 3 commercial APIs, along with extended experiments on face masks, Vision-Language Models (VLMs), and defense strategies to assess their robustness. Based on the results, we draw several key insights, highlighting the vulnerability of facial recognition systems to OOD data and suggesting possible solutions. Additionally, we offer a unified toolkit that includes all corruption and variation types, easily extendable to other datasets. We hope that our benchmarks and findings can provide guidance for future improvements in facial recognition model robustness.
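Building a corrupted "-C" benchmark variant means applying each corruption to clean test images at graded severities. Here is a minimal sketch for one common corruption, Gaussian noise; the sigma schedule and fixed seed are assumed examples, not OODFace's exact recipe.

```python
import numpy as np

def gaussian_noise(image: np.ndarray, severity: int = 1) -> np.ndarray:
    """Corrupt a uint8 image with additive Gaussian noise.
    Severity 1-5 selects an escalating (illustrative) sigma schedule,
    mirroring how common-corruption benchmarks grade difficulty."""
    sigma = [0.04, 0.06, 0.08, 0.09, 0.10][severity - 1] * 255
    rng = np.random.default_rng(0)  # fixed seed for a reproducible benchmark
    noisy = image.astype(np.float64) + rng.normal(0, sigma, image.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)
```

Sweeping every image in LFW, CFP-FP, or YTF through each corruption at each severity, and re-running verification, yields the robustness curves such a benchmark reports.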


Lost in Translation: Latent Concept Misalignment in Text-to-Image Diffusion Models

Zhao, Juntu, Deng, Junyu, Ye, Yixin, Li, Chongxuan, Deng, Zhijie, Wang, Dequan

arXiv.org Artificial Intelligence

Advancements in text-to-image diffusion models have enabled a broad range of downstream practical applications, but such models often encounter misalignment issues between text and image. Taking the generation of a combination of two disentangled concepts as an example, given the prompt "a tea cup of iced coke", existing models usually generate a glass cup of iced coke, because iced coke usually co-occurs with a glass cup rather than a tea cup during model training. The root of such misalignment lies in the confusion in the latent semantic space of text-to-image diffusion models, and hence we refer to the "a tea cup of iced coke" phenomenon as Latent Concept Misalignment (LC-Mis). We leverage large language models (LLMs) to thoroughly investigate the scope of LC-Mis, and develop an automated pipeline for aligning the latent semantics of diffusion models to text prompts. Empirical assessments confirm the effectiveness of our approach, substantially reducing LC-Mis errors and enhancing the robustness and versatility of text-to-image diffusion models. Our code and dataset are available online.


Text classification in shipping industry using unsupervised models and Transformer based supervised models

Xie, Ying, Song, Dongping

arXiv.org Artificial Intelligence

Obtaining labelled data in a particular context can be expensive and time-consuming. Although different algorithms, including unsupervised learning, semi-supervised learning, and self-learning, have been adopted, the performance of text classification varies with context. Given the lack of a labelled dataset, we propose a novel and simple unsupervised text classification model to classify cargo content in the international shipping industry using the Standard International Trade Classification (SITC) codes. Our method represents words using pretrained GloVe word embeddings and finds the most likely label using cosine similarity. To compare the unsupervised text classification model with supervised classification, we also applied several Transformer models to classify cargo content. Due to the lack of training data, the SITC numerical codes and the corresponding textual descriptions were used as training data. A small number of manually labelled cargo content records was used to evaluate the classification performance of the unsupervised model and the Transformer-based supervised models. The comparison reveals that unsupervised classification significantly outperforms Transformer-based supervised classification, even after increasing the size of the training dataset by 30%. A lack of training data is a key bottleneck that prevents deep learning models (such as Transformers) from succeeding in practical applications. Unsupervised classification provides an efficient and effective alternative for classifying text when training data is scarce.
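The embed-and-match scheme described above (average pretrained word vectors, pick the label whose description is most cosine-similar) can be sketched in a few lines. The toy vectors and label names below are made-up illustrations, not real GloVe embeddings or actual SITC descriptions.

```python
import numpy as np

def embed(text: str, glove: dict[str, np.ndarray]) -> np.ndarray:
    """Average the word vectors of in-vocabulary tokens; OOV tokens are skipped."""
    vecs = [glove[t] for t in text.lower().split() if t in glove]
    dim = len(next(iter(glove.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

def classify(text: str, label_texts: dict[str, str],
             glove: dict[str, np.ndarray]) -> str:
    """Zero-shot classification: return the label whose textual
    description is most cosine-similar to the input text."""
    v = embed(text, glove)

    def cos(a, b):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        return float(a @ b / denom) if denom else 0.0

    return max(label_texts, key=lambda lbl: cos(v, embed(label_texts[lbl], glove)))
```

Because the "training data" is just each code's textual description, no labelled examples are needed at all, which is exactly what makes the approach attractive when annotation is scarce.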


The different levels of autonomous vehicles - TechHQ

#artificialintelligence

We live in a fast-moving world that not long ago would have been considered science fiction. One aspect of technology that has driven us from fantasy into reality is the emergence of autonomous vehicles. Also known as self-driving cars, autonomous vehicles still confuse many people. How does the technology work? And what do we actually mean by work in the context of self-driving cars?


Motion Style Transfer: Modular Low-Rank Adaptation for Deep Motion Forecasting

Kothari, Parth, Li, Danya, Liu, Yuejiang, Alahi, Alexandre

arXiv.org Artificial Intelligence

Motion forecasting is an essential pillar for the successful deployment of autonomous systems in environments comprising various heterogeneous agents. It presents the challenges of modeling (i) universal etiquette (e.g., goal-directed behaviors, avoiding collisions) that governs the general motion dynamics of all agents; and (ii) social norms (e.g., minimum separation distance, preferred speed) that influence the navigation styles of different agents across different locations. Owing to the success of deep neural networks on large-scale datasets, learning prediction models in a data-driven manner has become a de facto approach for motion forecasting and has shown impressive results [1, 2, 3, 4]. However, existing deep forecasting models suffer from inferior performance when they encounter novel scenarios [5, 6, 7, 8]. For instance, a network trained with large-scale data for pedestrian forecasting struggles to generalize directly to cyclists. Some recent methods propose to incorporate strong priors robust to the underlying distribution shifts [9, 10, 11]. Yet, these priors often make strong assumptions about the distribution shifts, which may not hold in practice.
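The low-rank adaptation named in the title can be sketched as a frozen backbone weight plus a small trainable rank-r update per style. This generic LoRA-style layer is an illustration of the general technique, not the authors' specific modular design.

```python
import numpy as np

class LowRankAdapter:
    """Wrap a frozen weight matrix W with a trainable low-rank update:
    y = (W + B @ A) x. Only A and B (rank r << min(d_out, d_in)) are
    trained, so a tiny per-style module can specialize a shared backbone
    (e.g., pedestrian backbone adapted to cyclists)."""

    def __init__(self, W: np.ndarray, rank: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                   # frozen backbone weight
        self.A = rng.normal(0, 0.02, (rank, d_in))   # trainable down-projection
        self.B = np.zeros((d_out, rank))             # trainable up-projection,
                                                     # zero-init so the adapter
                                                     # starts as a no-op

    def __call__(self, x: np.ndarray) -> np.ndarray:
        return self.W @ x + self.B @ (self.A @ x)
```

Zero-initializing `B` means the adapted model reproduces the backbone exactly before any style-specific training, a standard LoRA design choice that makes per-location or per-agent modules cheap to swap.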


Will cars ever be fully autonomous?

#artificialintelligence

Self-driving cars or autonomous vehicles are classified into various levels based on the level of automation built into them. Instead of a self-driving car, why not take the bus, you might ask. As you likely know, automated connected systems are no longer restricted to factories. They continue to percolate and expand in the daily thoroughfare of our lives. Gone are the days when owning and driving a car was a matter of privilege afforded by a select few.