AITopics

Large Language Models (LLMs) have gained enormous attention in recent years due to their capability of understanding and generating natural languages. With the rapid development and wild-range applications (e.g., Agents, Embodied Intelligence), the robustness of LLMs has received increased attention. As the core brain of many AI applications, the robustness of LLMs requires that models should not only generate consistent contents, but also ensure the correctness and stability of generated content when dealing with unexpeted application scenarios (e.g., toxic prompts, limited noise domain data, outof-distribution (OOD) applications, etc). In this survey paper, we conduct a thorough review of the robustness of LLMs, aiming to provide a comprehensive terminology of concepts and methods around this field and facilitate the community. Specifically, we first give a formal definition of LLM robustness and present the collection protocol of this survey paper. Then, based on the types of perturbated inputs, we organize this survey from the following perspectives: 1) Adversarial Robustness: tackling the problem that prompts are manipulated intentionally, such as noise prompts, long context, data attack, etc; 2) OOD Robustness: dealing with the unexpected real-world application scenarios, such as OOD detection, zero-shot transferring, hallucinations, etc; 3) Evaluation of Robustness: summarizing the new evaluation datasets, metrics, and tools for verifying the robustness of LLMs. After reviewing the representative work from each perspective, we discuss and highlight future opportunities and research directions in this field. Meanwhile, we also organize related works and provide an easy-to-search project (https://github.com/zhangkunzk/Awesome-LLM-Robustness-papers) to support the community.

large language model, machine learning, natural language, (15 more...)

2506.11111

Country: Asia > China > Anhui Province (0.14)

Genre: Overview (1.00)

Industry:

Education (1.00)
Information Technology > Security & Privacy (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Thu, Ye Kyaw, Aung, Thura, Oo, Thazin Myint, Supnithi, Thepchai

KAConvText: Novel Approach to Burmese Sentence Classification using Kolmogorov-Arnold Convolution

This paper presents the first application of Kolmogorov-Arnold Convolution for Text (KAConvText) in sentence classification, addressing three tasks: imbalanced binary hate speech detection, balanced multiclass news classification, and imbalanced multiclass ethnic language identification. We investigate various embedding configurations, comparing random to fastText embeddings in both static and fine-tuned settings, with embedding dimensions of 100 and 300 using CBOW and Skip-gram models. Baselines include standard CNNs and CNNs augmented with a Kolmogorov-Arnold Network (CNN-KAN). In addition, we investigated KAConvText with different classification heads - MLP and KAN, where using KAN head supports enhanced interpretability. Results show that KAConvText-MLP with fine-tuned fastText embeddings achieves the best performance of 91.23% accuracy (F1-score = 0.9109) for hate speech detection, 92.66% accuracy (F1-score = 0.9267) for news classification, and 99.82% accuracy (F1-score = 0.9982) for language identification.

f1-score, machine learning, natural language, (18 more...)

2507.06753

Country: Asia > Myanmar (0.16)

Genre:

Research Report > New Finding (0.48)
Research Report > Promising Solution (0.40)
Overview > Innovation (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Goal-Oriented Skill Abstraction for Offline Multi-Task Reinforcement Learning

He, Jinmin, Li, Kai, Zang, Yifan, Fu, Haobo, Fu, Qiang, Xing, Junliang, Cheng, Jian

Offline multi-task reinforcement learning aims to learn a unified policy capable of solving multiple tasks using only pre-collected task-mixed datasets, without requiring any online interaction with the environment. However, it faces significant challenges in effectively sharing knowledge across tasks. Inspired by the efficient knowledge abstraction observed in human learning, we propose Goal-Oriented Skill Abstraction (GO-Skill), a novel approach designed to extract and utilize reusable skills to enhance knowledge transfer and task performance. Our approach uncovers reusable skills through a goal-oriented skill extraction process and leverages vector quantization to construct a discrete skill library. To mitigate class imbalances between broadly applicable and task-specific skills, we introduce a skill enhancement phase to refine the extracted skills. Furthermore, we integrate these skills using hierarchical policy learning, enabling the construction of a high-level policy that dynamically orchestrates discrete skills to accomplish specific tasks. Extensive experiments on diverse robotic manipulation tasks within the MetaWorld benchmark demonstrate the effectiveness and versatility of GO-Skill.

artificial intelligence, machine learning, reinforcement learning, (11 more...)

2507.06628

Genre:

Research Report > Promising Solution (0.66)
Overview > Innovation (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Dehkordi, Ali Safarpoor, Zehmakan, Ahad N.

Graph-based Fake Account Detection: A Survey

In recent years, there has been a growing effort to develop effective and efficient algorithms for fake account detection in online social networks. This survey comprehensively reviews existing methods, with a focus on graph-based techniques that utilise topological features of social graphs (in addition to account information, such as their shared contents and profile data) to distinguish between fake and real accounts. We provide several categorisations of these methods (for example, based on techniques used, input data, and detection time), discuss their strengths and limitations, and explain how these methods connect in the broader context. We also investigate the available datasets, including both real-world data and synthesised models. We conclude the paper by proposing several potential avenues for future research.

fake account, large language model, machine learning, (20 more...)

2507.06541

Country:

Asia (1.00)
North America > United States (0.27)

Genre:

Research Report (1.00)
Overview (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Government > Voting & Elections (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
(8 more...)

Taylor, Russell, Herbert, Benjamin, Sana, Michael

Pun Intended: Multi-Agent Translation of Wordplay with Contrastive Learning and Phonetic-Semantic Embeddings

Translating wordplay across languages presents unique challenges that have long confounded both professional human translators and machine translation systems. This research proposes a novel approach for translating puns from English to French by combining state-of-the-art large language models with specialized techniques for wordplay generation. Our methodology employs a three-stage approach. First, we establish a baseline using multiple frontier large language models with feedback based on a new contrastive learning dataset. Second, we implement a guided chain-of-thought pipeline with combined phonetic-semantic embeddings. Third, we implement a multi-agent generator-discriminator framework for evaluating and regenerating puns with feedback. Moving beyond the limitations of literal translation, our methodology's primary objective is to capture the linguistic creativity and humor of the source text wordplay, rather than simply duplicating its vocabulary. Our best runs earned first and second place in the CLEF JOKER 2025 Task 2 competition where they were evaluated manually by expert native French speakers. This research addresses a gap between translation studies and computational linguistics by implementing linguistically-informed techniques for wordplay translation, advancing our understanding of how language models can be leveraged to handle the complex interplay between semantic ambiguity, phonetic similarity, and the implicit cultural and linguistic awareness needed for successful humor.

large language model, machine learning, natural language, (20 more...)

2507.06506

Country: North America > United States (0.28)

Genre:

Research Report > Promising Solution (0.48)
Overview > Innovation (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.95)

Joaquin, Ayrton San, Gipiškis, Rokas, Staufer, Leon, Gil, Ariel

Deprecating Benchmarks: Criteria and Framework

As frontier artificial intelligence (AI) models rapidly advance, benchmarks are integral to comparing different models and measuring their progress in different task-specific domains. However, there is a lack of guidance on when and how benchmarks should be deprecated once they cease to effectively perform their purpose. This risks benchmark scores over-valuing model capabilities, or worse, obscuring capabilities and safety-washing. Based on a review of benchmarking practices, we propose criteria to decide when to fully or partially deprecate benchmarks, and a framework for deprecating benchmarks. Our work aims to advance the state of benchmarking towards rigorous and quality evaluations, especially for frontier models, and our recommendations are aimed to benefit benchmark developers, benchmark users, AI governance actors (across governments, academia, and industry panels), and policy makers.

benchmark, large language model, machine learning, (20 more...)

2507.06434

Country:

Europe (1.00)
North America > United States (0.46)

Genre:

Overview (0.68)
Research Report (0.64)

Industry: Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.68)
Information Technology > Artificial Intelligence > Issues (0.68)

Kennington, Casey, Schlangen, David

Could the Road to Grounded, Neuro-symbolic AI be Paved with Words-as-Classifiers?

Formal, Distributional, and Grounded theories of computational semantics each have their uses and their drawbacks. There has been a shift to ground models of language by adding visual knowledge, and there has been a call to enrich models of language with symbolic methods to gain the benefits from formal, distributional, and grounded theories. In this paper, we attempt to make the case that one potential path forward in unifying all three semantic fields is paved with the words-as-classifier model, a model of word-level grounded semantics that has been incorporated into formalisms and distributional language models in the literature, and it has been well-tested within interactive dialogue settings. We review that literature, motivate the words-as-classifiers model with an appeal to recent work in cognitive science, and describe a small experiment. Finally, we sketch a model of semantics unified through words-as-classifiers.

classifier, machine learning, natural language, (21 more...)

2507.06335

Country:

Europe (0.68)
North America > United States > California (0.28)

Genre:

Overview (1.00)
Research Report (0.83)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
(3 more...)

Rassul, Yassin Hussein, Ahmed, Aram M., Fattah, Polla, Hassan, Bryar A., Abdulkareem, Arwaa W., Rashid, Tarik A., Lu, Joan

Advancing Offline Handwritten Text Recognition: A Systematic Review of Data Augmentation and Generation Techniques

Offline Handwritten Text Recognition (HTR) systems play a crucial role in applications such as historical document digitization, automatic form processing, and biometric authentication. However, their performance is often hindered by the limited availability of annotated training data, particularly for low-resource languages and complex scripts. This paper presents a comprehensive survey of offline handwritten data augmentation and generation techniques designed to improve the accuracy and robustness of HTR systems. We systematically examine traditional augmentation methods alongside recent advances in deep learning, including Generative Adversarial Networks (GANs), diffusion models, and transformer-based approaches. Furthermore, we explore the challenges associated with generating diverse and realistic handwriting samples, particularly in preserving script authenticity and addressing data scarcity. This survey follows the PRISMA methodology, ensuring a structured and rigorous selection process. Our analysis began with 1,302 primary studies, which were filtered down to 848 after removing duplicates, drawing from key academic sources such as IEEE Digital Library, Springer Link, Science Direct, and ACM Digital Library. By evaluating existing datasets, assessment metrics, and state-of-the-art methodologies, this survey identifies key research gaps and proposes future directions to advance the field of handwritten text generation across diverse linguistic and stylistic landscapes.

artificial intelligence, handwriting recognition, machine learning, (13 more...)

2507.06275

Country:

Europe (1.00)
Asia > Middle East > Iraq (0.46)
Asia > India (0.46)
Asia > China (0.28)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Vision > Handwriting Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Artificial IntelligenceJul-9-2025

Motion Generation: A Survey of Generative Approaches and Benchmarks

Khani, Aliasghar, Rampini, Arianna, Roy, Bruno, Nadela, Larasika, Kaplan, Noa, Atherton, Evan, Cheung, Derek, Bibliowicz, Jacky

Motion generation, the task of synthesizing realistic motion sequences from various conditioning inputs, has become a central problem in computer vision, computer graphics, and robotics, with applications ranging from animation and virtual agents to human-robot interaction. As the field has rapidly progressed with the introduction of diverse modeling paradigms including GANs, autoencoders, autoregressive models, and diffusion-based techniques, each approach brings its own advantages and limitations. This growing diversity has created a need for a comprehensive and structured review that specifically examines recent developments from the perspective of the generative approach employed. In this survey, we provide an in-depth categorization of motion generation methods based on their underlying generative strategies. Our main focus is on papers published in top-tier venues since 2023, reflecting the most recent advancements in the field. In addition, we analyze architectural principles, conditioning mechanisms, and generation settings, and compile a detailed overview of the evaluation metrics and datasets used across the literature. Our objective is to enable clearer comparisons and identify open challenges, thereby offering a timely and foundational reference for researchers and practitioners navigating the rapidly evolving landscape of motion generation.

computer vision, large language model, machine learning, (19 more...)

2507.05419

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Saudi Arabia > Northern Borders Province > Arar (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)

Genre: Overview (1.00)

Industry:

Leisure & Entertainment (0.67)
Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

arXiv.org Artificial IntelligenceJul-9-2025

Dynamic Context-Aware Prompt Recommendation for Domain-Specific AI Applications

Tang, Xinye, Zhai, Haijun, Belwal, Chaitanya, Thayanithi, Vineeth, Baumann, Philip, Roy, Yogesh K

LLM-powered applications are highly susceptible to the quality of user prompts, and crafting high-quality prompts can often be challenging especially for domain-specific applications. This paper presents a novel dynamic context-aware prompt recommendation system for domain-specific AI applications. Our solution combines contextual query analysis, retrieval-augmented knowledge grounding, hierarchical skill organization, and adaptive skill ranking to generate relevant and actionable prompt suggestions. The system leverages behavioral telemetry and a two-stage hierarchical reasoning process to dynamically select and rank relevant skills, and synthesizes prompts using both predefined and adaptive templates enhanced with few-shot learning. Experiments on real-world datasets demonstrate that our approach achieves high usefulness and relevance, as validated by both automated and expert evaluations.

data mining, large language model, machine learning, (18 more...)

2506.20815

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > Canada > Quebec > Montreal (0.04)
(4 more...)

Genre:

Overview (0.93)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Data Science > Data Mining (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.51)