AITopics

2509.15517

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.45)

Dhakal, Abhiyan, Paudel, Kausik, Sigdel, Sanjog

An Artificial Intelligence Driven Semantic Similarity-Based Pipeline for Rapid Literature

arXiv.org Artificial Intelligence, Sep-22-2025

We propose an automated pipeline for performing literature reviews using semantic similarity. Unlike traditional systematic review systems or optimization-based methods, this work emphasizes minimal overhead and high relevance by using transformer-based embeddings and cosine similarity. Given a paper title and abstract, the pipeline generates relevant keywords, fetches relevant papers from an open-access repository, and ranks them by their semantic closeness to the input. Three embedding models were evaluated. A statistical thresholding approach is then applied to filter relevant papers, enabling an effective literature review pipeline. Despite the absence of heuristic feedback or ground truth relevance labels, the proposed system shows promise as a scalable and practical tool for preliminary research and exploratory analysis.
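The ranking and filtering steps described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy three-dimensional vectors stand in for transformer embeddings, the function names are hypothetical, and the threshold rule (mean score plus `k` sample standard deviations) is an assumption, since the abstract does not say which statistical threshold is used.

```python
import math
from statistics import mean, stdev


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as unrelated to everything
    return dot / (norm_a * norm_b)


def rank_and_filter(query_vec, candidates, k=1.0):
    """Rank candidates by similarity to the query, then keep the top outliers.

    candidates maps a paper title to its embedding vector.
    ASSUMPTION: threshold = mean(scores) + k * stdev(scores); the source
    abstract only says "a statistical thresholding approach".
    """
    ranked = sorted(
        ((title, cosine_similarity(query_vec, vec))
         for title, vec in candidates.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    scores = [score for _, score in ranked]
    # stdev needs at least two values; with fewer, keep everything.
    threshold = mean(scores) + k * stdev(scores) if len(scores) > 1 else float("-inf")
    kept = [(title, score) for title, score in ranked if score >= threshold]
    return ranked, threshold, kept


# Toy data: hypothetical 3-dimensional "embeddings".
query = [1.0, 0.0, 0.0]
papers = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.1, 0.0, 0.9],
    "paper_d": [0.2, 0.9, 0.1],
}
ranked, threshold, kept = rank_and_filter(query, papers, k=1.0)
for title, score in ranked:
    print(f"{title}: {score:.3f}")
print("kept:", [title for title, _ in kept])
```

In a real pipeline the vectors would come from an embedding model, and `k` would be tuned to trade recall against the number of papers a reader is willing to screen.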

large language model, machine learning, natural language, (18 more...)

2509.15292

Country:

North America > United States (0.28)
Asia > Middle East > UAE (0.28)

Genre:

Overview (1.00)
Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Lindes, Peter, Skiker, Kaoutar

Using Natural Language for Human-Robot Collaboration in the Real World

arXiv.org Artificial Intelligence, Sep-22-2025

We have a vision of a day when autonomous robots can collaborate with humans as assistants in performing complex tasks in the physical world. This vision includes that the robots will have the ability to communicate with their human collaborators using language that is natural to the humans. Traditional Interactive Task Learning (ITL) systems have some of this ability, but the language they can understand is very limited. The advent of large language models (LLMs) provides an opportunity to greatly improve the language understanding of robots, yet integrating the language abilities of LLMs with robots that operate in the real physical world is a challenging problem. In this chapter we first review briefly a few commercial robot products that work closely with humans, and discuss how they could be much better collaborators with robust language abilities. We then explore how an AI system with a cognitive agent that controls a physical robot at its core, interacts with both a human and an LLM, and accumulates situational knowledge through its experiences, can be a possible approach to reach that vision. We focus on three specific challenges of having the robot understand natural language, and present a simple proof-of-concept experiment using ChatGPT for each. Finally, we discuss what it will take to turn these simple experiments into an operational system where LLM-assisted language understanding is a part of an integrated robotic assistant that uses language to collaborate with humans.

large language model, machine learning, natural language, (20 more...)

2508.11759

Country: North America > United States (0.46)

Genre:

Research Report (1.00)
Overview (0.87)

Industry: Health & Medicine > Therapeutic Area (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)

Communications of the ACM, Sep-19-2025, 14:04:06 GMT

The Landscape of Arabic Large Language Models

The emergence of ChatGPT marked a transformative milestone for artificial intelligence (AI), showcasing the remarkable potential of large language models (LLMs) to generate human-like text. This wave of innovation has revolutionized how we interact with technology, seamlessly integrating LLMs into everyday tasks such as vacation planning, email drafting, and content creation. While English-speaking users have significantly benefited from these advancements, the Arabic world faces distinct challenges in developing Arabic-specific LLMs. Arabic, one of the languages spoken most widely around the world, serves more than 422 million native speakers in 27 countries and is deeply rooted in a rich linguistic and cultural heritage. Developing Arabic LLMs (ALLMs) presents an unparalleled opportunity to bridge technological gaps and empower communities. The journey of ALLMs has been both fascinating and complex, evolving from rudimentary text-processing systems to sophisticated AI-driven models. This article explores the trajectory of ALLMs, from their inception to the present day, highlighting the efforts to evaluate these models through benchmarks and public leaderboards.

allm, benchmark, computational linguistic, (14 more...)

Communications of the ACM

Country:

Asia > Middle East > Qatar (0.05)
Asia > Southeast Asia (0.04)
Asia > Middle East > Saudi Arabia (0.04)
(4 more...)

Genre: Overview (0.46)

Industry: Education (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

A Taxonomy of Prompt Defects in LLM Systems

Tian, Haoye, Wang, Chong, Yang, BoYang, Zhang, Lyuye, Liu, Yang

Large Language Models (LLMs) have become key components of modern software, with prompts acting as their de facto programming interface. However, prompt design remains largely empirical, and small mistakes can cascade into unreliable, insecure, or inefficient behavior. This paper presents the first systematic survey and taxonomy of prompt defects: recurring ways that prompts fail to elicit their intended behavior from LLMs. We organize defects along six dimensions: (1) Specification and Intent, (2) Input and Content, (3) Structure and Formatting, (4) Context and Memory, (5) Performance and Efficiency, and (6) Maintainability and Engineering. Each dimension is refined into fine-grained subtypes, illustrated with concrete examples and root cause analysis. Grounded in software engineering principles, we show how these defects surface in real development workflows and examine their downstream effects. For every subtype, we distill mitigation strategies that span emerging prompt engineering patterns, automated guardrails, testing harnesses, and evaluation frameworks. We then summarize these strategies in a master taxonomy that links defect, impact, and remedy. We conclude with open research challenges and a call for rigorous engineering-oriented methodologies to ensure that LLM-driven systems are dependable by design.

arxiv preprint arxiv, large language model, machine learning, (13 more...)

2509.14404

Country: Asia > China (0.14)

Genre: Overview (1.00)

Industry: Information Technology > Security & Privacy (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)

Monitoring Machine Learning Systems: A Multivocal Literature Review

Naveed, Hira, Barnett, Scott, Arora, Chetan, Grundy, John, Khalajzadeh, Hourieh, Haggag, Omar

Context: Dynamic production environments make it challenging to maintain reliable machine learning (ML) systems. Runtime issues that degrade model performance, such as changes in data patterns or operating contexts, are a common occurrence in production settings. Monitoring enables early detection and mitigation of these runtime issues, helping maintain users' trust and prevent unwanted consequences for organizations. Aim: This study aims to provide a comprehensive overview of the ML monitoring literature. Method: We conducted a multivocal literature review (MLR) following the well-established guidelines by Garousi to investigate various aspects of ML monitoring approaches in 136 papers. Results: We analyzed selected studies based on four key areas: (1) the motivations, goals, and context; (2) the monitored aspects, specific techniques, metrics, and tools; (3) the contributions and benefits; and (4) the current limitations. We also discuss several insights found in the studies, their implications, and recommendations for future research and practice. Conclusion: Our MLR identifies and summarizes ML monitoring practices and gaps, emphasizing similarities and disconnects between formal and gray literature. Our study is valuable for both academics and practitioners, as it helps select appropriate solutions, highlights limitations in current approaches, and provides future directions for research and tool development.

data mining, machine learning, reinforcement learning, (16 more...)

2509.14294

Country: Europe (1.00)

Genre:

Research Report > New Finding (1.00)
Overview (1.00)

Industry:

Information Technology > Services (1.00)
Information Technology > Security & Privacy (1.00)
Education (1.00)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
(2 more...)

Huber, Thomas, Niklaus, Christina

CLEAR: A Comprehensive Linguistic Evaluation of Argument Rewriting by Large Language Models

While LLMs have been extensively studied on general text generation tasks, there is less research on text rewriting, a related task, and particularly on the behavior of models on this task. In this paper we analyze what changes LLMs make in a text rewriting setting. We focus specifically on argumentative texts and their improvement, a task named Argument Improvement (ArgImp). We present CLEAR: an evaluation pipeline consisting of 57 metrics mapped to four linguistic levels: lexical, syntactic, semantic and pragmatic. This pipeline is used to examine the qualities of LLM-rewritten arguments on a broad set of argumentation corpora and to compare and analyze the behavior of different LLMs on this task in terms of linguistic levels. By taking all four linguistic levels into consideration, we find that the models perform ArgImp by shortening the texts while simultaneously increasing average word length and merging sentences. Overall we note an increase in the persuasion and coherence dimensions.

computational linguistic, large language model, machine learning, (20 more...)

2509.15027

Country:

North America > United States (1.00)
Europe (1.00)
Asia > Middle East > UAE (0.28)

Genre:

Overview (0.93)
Research Report > New Finding (0.93)
Research Report > Experimental Study (0.68)

Industry: Education > Educational Setting > K-12 Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

SimCoachCorpus: A naturalistic dataset with language and trajectories for embodied teaching

Sumner, Emily, Gopinath, Deepak E., Dees, Laporsha, Gomez, Patricio Reyes, Cui, Xiongyi, Silva, Andrew, Costa, Jean, Morgan, Allison, Schrum, Mariah, Chen, Tiffany L., Balachandran, Avinash, Rosman, Guy

Curated datasets are essential for training and evaluating AI approaches, but are often lacking in domains where language and physical action are deeply intertwined. In particular, few datasets capture how people acquire embodied skills through verbal instruction over time. To address this gap, we introduce SimCoachCorpus: a unique dataset of race car simulator driving that allows for the investigation of rich interactive phenomena during guided and unguided motor skill acquisition. In this dataset, 29 humans were asked to drive in a simulator around a race track for approximately ninety minutes. Fifteen participants were given personalized one-on-one instruction from a professional performance driving coach, and 14 participants drove without coaching. SimCoachCorpus includes embodied features such as vehicle state and inputs, map (track boundaries and raceline), and cone landmarks. These are synchronized with concurrent verbal coaching from a professional coach and additional feedback at the end of each lap. We further provide annotations of coaching categories for each concurrent feedback utterance, ratings on students' compliance with coaching advice, and self-reported cognitive load and emotional state of participants (gathered from surveys during the study). The dataset includes over 20,000 concurrent feedback utterances, over 400 terminal feedback utterances, and over 40 hours of vehicle driving data. Our naturalistic dataset can be used for investigating motor learning dynamics, exploring linguistic phenomena, and training computational models of teaching. We demonstrate applications of this dataset for in-context learning, imitation learning, and topic modeling. The dataset introduced in this work will be released publicly upon publication of the peer-reviewed version of this paper. Researchers interested in early access may register at https://tinyurl.com/SimCoachCorpusForm.

artificial intelligence, machine learning, natural language, (20 more...)

2509.14548

Country: North America > United States (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Sports > Motorsports (1.00)
Education > Educational Setting (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Rationality Check! Benchmarking the Rationality of Large Language Models

Zhou, Zhilun, Wang, Jing Yi, Sukiennik, Nicholas, Gao, Chen, Xu, Fengli, Li, Yong, Evans, James

Large language models (LLMs), a recent advance in deep learning and machine intelligence, have manifested astonishing capacities and are now considered among the most promising approaches toward artificial general intelligence. With human-like capabilities, LLMs have been used to simulate humans and serve as AI assistants across many applications. As a result, great concern has arisen about whether and under what circumstances LLMs think and behave like real human agents. Rationality is among the most important concepts in assessing human behavior, both in thinking (i.e., theoretical rationality) and in taking action (i.e., practical rationality). In this work, we propose the first benchmark for evaluating the omnibus rationality of LLMs, covering a wide range of domains and LLMs. The benchmark includes an easy-to-use toolkit, extensive experimental results, and analysis that illuminates where LLMs converge with and diverge from idealized human rationality. We believe the benchmark can serve as a foundational tool for both developers and users of LLMs.

large language model, machine learning, rationality, (18 more...)

2509.14546

Country: North America > United States (0.46)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (1.00)
Overview (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Education (0.93)
Health & Medicine (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Statistical Methods in Generative AI

Dobriban, Edgar

Artificial Intelligence, and more specifically Generative AI, is emerging as an important technology. Over the past few years a number of prominent generative AI technologies have been developed and have received widespread attention, ranging from text generation via large language models (ChatGPT, Claude, Llama, Gemini, DeepSeek, Qwen, etc.) and image generation via diffusion models (Dall-E, Stable Diffusion, etc.) to scientific generative AI techniques used for protein generation (e.g., Watson et al. 2023) and DNA sequence editing (e.g., Ruffolo et al. 2025), among others. Such methods have been quickly adopted by end users and institutions, both via direct usage and through integration in other tools such as code assistants and web search agents. The scientific community has shown significant interest in using generative AI models, achieving a number of breakthrough results (see e.g., Davies et al. 2021, Hayes et al. 2025), culminating in a 2024 Nobel Prize in Chemistry awarded in part for work with a significant component in protein structure design and generation (The Royal Swedish Academy of Sciences 2024). Yet the adoption of generative AI (GenAI) methods more generally is hindered by their lack of reliability (see e.g., Farquhar et al. 2024, Strauss et al. 2025, Manduchi et al. 2025).

artificial intelligence, machine learning, natural language, (17 more...)

2509.07054

Country: North America > United States > Pennsylvania (0.28)

Genre:

Research Report (1.00)
Overview (1.00)
Personal > Honors (0.54)

Industry:

Education (0.93)
Health & Medicine > Pharmaceuticals & Biotechnology (0.86)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)