Overview
Real-Time Imitation of Human Head Motions, Blinks and Emotions by Nao Robot: A Closed-Loop Approach
Rayati, Keyhan, Feizi, Amirhossein, Beigy, Alireza, Shahverdi, Pourya, Masouleh, Mehdi Tale, Kalhor, Ahmad
--This paper introduces a novel approach for enabling real-time imitation of human head motion by a Nao robot, with a primary focus on elevating human-robot interactions. By using the robust capabilities of the MediaPipe as a computer vision library and the DeepFace as an emotion recognition library, this research endeavors to capture the subtleties of human head motion, including blink actions and emotional expressions, and seamlessly incorporate these indicators into the robot's responses. The result is a comprehensive framework which facilitates precise head imitation within human-robot interactions, utilizing a closed-loop approach that involves gathering real-time feedback from the robot's imitation performance. This feedback loop ensures a high degree of accuracy in modeling head motion, as evidenced by an impressive R2 score of 96.3 for pitch and 98.9 for yaw. Notably, the proposed approach holds promise in improving communication for children with autism, offering them a valuable tool for more effective interaction. In essence, proposed work explores the integration of real-time head imitation and real-time emotion recognition to enhance human-robot interactions, with potential benefits for individuals with unique communication needs. The field of robotics has come a long way in recent years, with significant advancements in the development of humanoid robots.
Taming the Titans: A Survey of Efficient LLM Inference Serving
Zhen, Ranran, Li, Juntao, Ji, Yixin, Yang, Zhenlin, Liu, Tong, Xia, Qingrong, Duan, Xinyu, Wang, Zhefeng, Huai, Baoxing, Zhang, Min
Large Language Models (LLMs) for Generative AI have achieved remarkable progress, evolving into sophisticated and versatile tools widely adopted across various domains and applications. However, the substantial memory overhead caused by their vast number of parameters, combined with the high computational demands of the attention mechanism, poses significant challenges in achieving low latency and high throughput for LLM inference services. Recent advancements, driven by groundbreaking research, have significantly accelerated progress in this field. This paper provides a comprehensive survey of these methods, covering fundamental instance-level approaches, in-depth cluster-level strategies, emerging scenario directions, and other miscellaneous but important areas. At the instance level, we review model placement, request scheduling, decoding length prediction, storage management, and the disaggregation paradigm. At the cluster level, we explore GPU cluster deployment, multi-instance load balancing, and cloud service solutions. For emerging scenarios, we organize the discussion around specific tasks, modules, and auxiliary methods. To ensure a holistic overview, we also highlight several niche yet critical areas. Finally, we outline potential research directions to further advance the field of LLM inference serving.
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
Ferrag, Mohamed Amine, Tihanyi, Norbert, Debbah, Merouane
Large language models and autonomous AI agents have evolved rapidly, resulting in a diverse array of evaluation benchmarks, frameworks, and collaboration protocols. However, the landscape remains fragmented and lacks a unified taxonomy or comprehensive survey. Therefore, we present a side-by-side comparison of benchmarks developed between 2019 and 2025 that evaluate these models and agents across multiple domains. In addition, we propose a taxonomy of approximately 60 benchmarks that cover general and academic knowledge reasoning, mathematical problem-solving, code generation and software engineering, factual grounding and retrieval, domain-specific evaluations, multimodal and embodied tasks, task orchestration, and interactive assessments. Furthermore, we review AI-agent frameworks introduced between 2023 and 2025 that integrate large language models with modular toolkits to enable autonomous decision-making and multi-step reasoning. Moreover, we present real-world applications of autonomous AI agents in materials science, biomedical research, academic ideation, software engineering, synthetic data generation, chemical reasoning, mathematical problem-solving, geographic information systems, multimedia, healthcare, and finance. We then survey key agent-to-agent collaboration protocols, namely the Agent Communication Protocol (ACP), the Model Context Protocol (MCP), and the Agent-to-Agent Protocol (A2A). Finally, we discuss recommendations for future research, focusing on advanced reasoning strategies, failure modes in multi-agent LLM systems, automated scientific discovery, dynamic tool integration via reinforcement learning, integrated search capabilities, and security vulnerabilities in agent protocols.
Generative AI in Education: Student Skills and Lecturer Roles
Krause, Stefanie, Dalvi, Ashish, Zaidi, Syed Khubaib
Generative Artificial Intelligence (GenAI) tools such as ChatGPT are emerging as a revolutionary tool in education that brings both positive aspects and challenges for educators and students, reshaping how learning and teaching are approached. This study aims to identify and evaluate the key competencies students need to effectively engage with GenAI in education and to provide strategies for lecturers to integrate GenAI into teaching practices. The study applied a mixed method approach with a combination of a literature review and a quantitative survey involving 130 students from South Asia and Europe to obtain its findings. The literature review identified 14 essential student skills for GenAI engagement, with AI literacy, critical thinking, and ethical AI practices emerging as the most critical. The student survey revealed gaps in prompt engineering, bias awareness, and AI output management. In our study of lecturer strategies, we identified six key areas, with GenAI Integration and Curriculum Design being the most emphasised. Our findings highlight the importance of incorporating GenAI into education. While literature prioritized ethics and policy development, students favour hands-on, project-based learning and practical AI applications. To foster inclusive and responsible GenAI adoption, institutions should ensure equitable access to GenAI tools, establish clear academic integrity policies, and advocate for global GenAI research initiatives.
GAN-SLAM: Real-Time GAN Aided Floor Plan Creation Through SLAM
Davies, Leon, Li, Baihua, Saada, Mohamad, Sรธlvsten, Simon, Meng, Qinggang
SLAM is a fundamental component of modern autonomous systems, providing robots and their operators with a deeper understanding of their environment. SLAM systems often encounter challenges due to the dynamic nature of robotic motion, leading to inaccuracies in mapping quality, particularly in 2D representations such as Occupancy Grid Maps. These errors can significantly degrade map quality, hindering the effectiveness of specific downstream tasks such as floor plan creation. To address this challenge, we introduce our novel 'GAN-SLAM', a new SLAM approach that leverages Generative Adversarial Networks to clean and complete occupancy grids during the SLAM process, reducing the impact of noise and inaccuracies introduced on the output map. We adapt and integrate accurate pose estimation techniques typically used for 3D SLAM into a 2D form. This enables the quality improvement 3D LiDAR-odometry has seen in recent years to be effective for 2D representations. Our results demonstrate substantial improvements in map fidelity and quality, with minimal noise and errors, affirming the effectiveness of GAN-SLAM for real-world mapping applications within large-scale complex environments. We validate our approach on real-world data operating in real-time, and on famous examples of 2D maps. The improved quality of the output map enables new downstream tasks, such as floor plan drafting, further enhancing the capabilities of autonomous systems. Our novel approach to SLAM offers a significant step forward in the field, improving the usability for SLAM in mapping-based tasks, and offers insight into the usage of GANs for OGM error correction.
Conflicts in Texts: Data, Implications and Challenges
As NLP models become increasingly integrated into real-world applications, it becomes clear that there is a need to address the fact that models often rely on and generate conflicting information. Conflicts could reflect the complexity of situations, changes that need to be explained and dealt with, difficulties in data annotation, and mistakes in generated outputs. In all cases, disregarding the conflicts in data could result in undesired behaviors of models and undermine NLP models' reliability and trustworthiness. This survey categorizes these conflicts into three key areas: (1) natural texts on the web, where factual inconsistencies, subjective biases, and multiple perspectives introduce contradictions; (2) human-annotated data, where annotator disagreements, mistakes, and societal biases impact model training; and (3) model interactions, where hallucinations and knowledge conflicts emerge during deployment. While prior work has addressed some of these conflicts in isolation, we unify them under the broader concept of conflicting information, analyze their implications, and discuss mitigation strategies. We highlight key challenges and future directions for developing conflict-aware NLP systems that can reason over and reconcile conflicting information more effectively.
Generative Adversarial Network based Voice Conversion: Techniques, Challenges, and Recent Advancements
Dhar, Sandipan, Jana, Nanda Dulal, Das, Swagatam
Voice conversion (VC) stands as a crucial research area in speech synthesis, enabling the transformation of a speaker's vocal characteristics to resemble another while preserving the linguistic content. This technology has broad applications, including automated movie dubbing, speech-to-singing conversion, and assistive devices for pathological speech rehabilitation. With the increasing demand for high-quality and natural-sounding synthetic voices, researchers have developed a wide range of VC techniques. Among these, generative adversarial network (GAN)-based approaches have drawn considerable attention for their powerful feature-mapping capabilities and potential to produce highly realistic speech. Despite notable advancements, challenges such as ensuring training stability, maintaining linguistic consistency, and achieving perceptual naturalness continue to hinder progress in GAN-based VC systems. This systematic review presents a comprehensive analysis of the voice conversion landscape, highlighting key techniques, key challenges, and the transformative impact of GANs in the field. The survey categorizes existing methods, examines technical obstacles, and critically evaluates recent developments in GAN-based VC. By consolidating and synthesizing research findings scattered across the literature, this review provides a structured understanding of the strengths and limitations of different approaches. The significance of this survey lies in its ability to guide future research by identifying existing gaps, proposing potential directions, and offering insights for building more robust and efficient VC systems. Overall, this work serves as an essential resource for researchers, developers, and practitioners aiming to advance the state-of-the-art (SOTA) in voice conversion technology.
Generative AI for Character Animation: A Comprehensive Survey of Techniques, Applications, and Future Directions
Abootorabi, Mohammad Mahdi, Ghahroodi, Omid, Zahraei, Pardis Sadat, Behzadasl, Hossein, Mirrokni, Alireza, Salimipanah, Mobina, Rasouli, Arash, Behzadipour, Bahar, Azarnoush, Sara, Maleki, Benyamin, Sadraiye, Erfan, Feriz, Kiarash Kiani, Nahad, Mahdi Teymouri, Moghadasi, Ali, Abianeh, Abolfazl Eshagh, Nazar, Nizi, Rabiee, Hamid R., Baghshah, Mahdieh Soleymani, Ahmadi, Meisam, Asgari, Ehsaneddin
Generative AI is reshaping art, gaming, and most notably animation. Recent breakthroughs in foundation and diffusion models have reduced the time and cost of producing animated content. Characters are central animation components, involving motion, emotions, gestures, and facial expressions. The pace and breadth of advances in recent months make it difficult to maintain a coherent view of the field, motivating the need for an integrative review. Unlike earlier overviews that treat avatars, gestures, or facial animation in isolation, this survey offers a single, comprehensive perspective on all the main generative AI applications for character animation. We begin by examining the state-of-the-art in facial animation, expression rendering, image synthesis, avatar creation, gesture modeling, motion synthesis, object generation, and texture synthesis. We highlight leading research, practical deployments, commonly used datasets, and emerging trends for each area. To support newcomers, we also provide a comprehensive background section that introduces foundational models and evaluation metrics, equipping readers with the knowledge needed to enter the field. We discuss open challenges and map future research directions, providing a roadmap to advance AI-driven character-animation technologies. This survey is intended as a resource for researchers and developers entering the field of generative AI animation or adjacent fields. Resources are available at: https://github.com/llm-lab-org/Generative-AI-for-Character-Animation-Survey.
Atlantes: A system of GPS transformers for global-scale real-time maritime intelligence
Herzog, Henry, Hansen, Joshua, Zhang, Yawen, Beukema, Patrick
Unsustainable exploitation of the oceans exacerbated by global warming is threatening coastal communities worldwide. Accurate and timely monitoring of maritime activity is an essential step to effective governance and to inform future policy. In support of this complex global-scale effort, we built Atlantes, a deep learning based system that provides the first-ever real-time view of vessel behavior at global scale. Atlantes leverages a series of bespoke transformers to distill a high volume, continuous stream of GPS messages emitted by hundreds of thousands of vessels into easily quantifiable behaviors. The combination of low latency and high performance enables operationally relevant decision-making and successful interventions on the high seas where illegal and exploitative activity is too common. Atlantes is already in use by hundreds of organizations worldwide. Here we provide an overview of the model and infrastructure that enables this system to function efficiently and cost-effectively at global-scale and in real-time.
Advanced Longitudinal Control and Collision Avoidance for High-Risk Edge Cases in Autonomous Driving
Chen, Dianwei, Gong, Yaobang, Yang, Xianfeng
Advanced Longitudinal Control and Collision Avoidance for High-Risk Edge Cases in Autonomous Driving Dianwei Chen a, Yaobang Gong a and Xianfeng Terry Yang a, a Department of Civil and Environmental Engineering, University of Maryland, College Park, Maryland, 20740, United StatesA R T I C L E I N F OKeywords: Advanced Driver-Assistance Systems Collision avoidance Reinforcement learning Edge Cases Trajectory calibration model Automatic Emergency Braking A B S T R A C T Advanced Driver-Assistance Systems (ADAS) and Advanced Driving Systems (ADS) are key to improving road safety, yet most existing implementations focus primarily on the vehicle ahead, neglecting the behavior of following vehicles. This shortfall often leads to chain-reaction collisions in high-speed, densely spaced traffic--particularly when a middle vehicle suddenly brakes and trailing vehicles cannot respond in time. To address this critical gap, we propose a novel longitudinal control and collision avoidance algorithm that integrates adaptive cruising with emergency braking. Leveraging deep reinforcement learning, our method simultaneously accounts for both leading and following vehicles. Through a data preprocessing framework that calibrates real-world sensor data, we enhance the robustness and reliability of the training process, ensuring the learned policy can handle diverse driving conditions. In simulated high-risk scenarios (e.g., emergency braking in dense traffic), the algorithm effectively prevents potential pile-up collisions, even in situations involving heavy-duty vehicles. Furthermore, in typical highway scenarios where three vehicles decelerate, the proposed DRL approach achieves a 99% success rate--far surpassing the standard Federal Highway Administration speed concepts guide, which reaches only 36.77% success under the same conditions.1. Introduction Advanced Driver-Assistance Systems (ADASs) and Automated Driving Systems (ADSs) are pivotal technologies in modern vehicles, sharing the overarching goal of improving road safety and paving the way toward fully autonomous driving. ADASs primarily function as semi-automated features that monitor the vehicle's environment, intervening when drivers do not respond adequately (Galvani, 2019; Kukkala et al., 2018). Collectively, these advancements have significantly enhanced road safety and occupant comfort, with many vehicles now including one or more ADAS features as standard equipment. In parallel, efforts in autonomous driving--encompassing levels of automation from partial (Level 3) to fully autonomous (Level 4 or 5)--have led to the development of ADSs (Leiman, 2021). These systems aim to replace or minimize human input in vehicle operation, leveraging advanced sensing, computing, and control technologies to handle dynamic road conditions (Okuda et al., 2014).