Goto

Collaborating Authors

 Overview


Spotting tell-tale visual artifacts in face swapping videos: strengths and pitfalls of CNN detectors

arXiv.org Artificial Intelligence

This work has been submitted to the IEEE for possible publication. Abstract Face swapping manipulations in video streams represents an increasing threat in remote video communications, due to advances in automated and real-time tools. Recent literature proposes to characterize and exploit visual artifacts introduced in video frames by swapping algorithms when dealing with challenging physical scenes, such as face occlusions. This paper investigates the effectiveness of this approach by benchmarking CNN-based data-driven models on two data corpora (including a newly collected one) and analyzing generalization capabilities with respect to different acquisition sources and swapping algorithms. The results confirm excellent performance of general-purpose CNN architectures when operating within the same data source, but a significant difficulty in robustly characterizing occlusion-based visual cues across datasets. This highlights the need for specialized detection strategies to deal with such artifacts. The synthesis and manipulation of facial images and videos have achieved increasingly hyper-realistic results in recent years [4], leading to numerous research efforts for the automated identification of non-genuine visual data [25] [27].


Data-Driven Policy Mapping for Safe RL-based Energy Management Systems

arXiv.org Artificial Intelligence

Increasing global energy demand and renewable integration complexity have placed buildings at the center of sustainable energy management. We present a three-step reinforcement learning(RL)-based Building Energy Management System (BEMS) that combines clustering, forecasting, and constrained policy learning to address scalability, adaptability, and safety challenges. First, we cluster non-shiftable load profiles to identify common consumption patterns, enabling policy generalization and transfer without retraining for each new building. Next, we integrate an LSTM based forecasting module to anticipate future states, improving the RL agents' responsiveness to dynamic conditions. Lastly, domain-informed action masking ensures safe exploration and operation, preventing harmful decisions. Evaluated on real-world data, our approach reduces operating costs by up to 15% for certain building types, maintains stable environmental performance, and quickly classifies and optimizes new buildings with limited data. It also adapts to stochastic tariff changes without retraining. Overall, this framework delivers scalable, robust, and cost-effective building energy management.


Artificial Intelligence for Atmospheric Sciences: A Research Roadmap

arXiv.org Artificial Intelligence

Atmospheric sciences are crucial for understanding environmental phenomena ranging from air quality to extreme weather events, and climate change. Recent breakthroughs in sensing, communication, computing, and Artificial Intelligence (AI) have significantly advanced atmospheric sciences, enabling the generation of vast amounts of data through long-term Earth observations and providing powerful tools for analyzing atmospheric phenomena and predicting natural disasters. This paper contributes a critical interdisciplinary overview that bridges the fields of atmospheric science and computer science, highlighting the transformative potential of AI in atmospheric research. We identify key challenges associated with integrating AI into atmospheric research, including issues related to big data and infrastructure, and provide a detailed research roadmap that addresses both current and emerging challenges.


Advancing atomic electron tomography with neural networks

arXiv.org Artificial Intelligence

Accurate determination of three-dimensional (3D) atomic structures is crucial for understanding and controlling the properties of nanomaterials. Atomic electron tomography (AET) offers non-destructive atomic imaging with picometer-level precision, enabling the resolution of defects, interfaces, and strain fields in 3D, as well as the observation of dynamic structural evolution. However, reconstruction artifacts arising from geometric limitations and electron dose constraints can hinder reliable atomic structure determination. Recent progress has integrated deep learning, especially convolutional neural networks, into AET workflows to improve reconstruction fidelity. This review highlights recent advances in neural network-assisted AET, emphasizing its role in overcoming persistent challenges in 3D atomic imaging. By significantly enhancing the accuracy of both surface and bulk structural characterization, these methods are advancing the frontiers of nanoscience and enabling new opportunities in materials research and technology.


The Memory Paradox: Why Our Brains Need Knowledge in an Age of AI

arXiv.org Artificial Intelligence

In the age of generative AI and ubiquitous digital tools, human cognition faces a structural paradox: as external aids become more capable, internal memory systems risk atrophy. Drawing on neuroscience and cognitive psychology, this paper examines how heavy reliance on AI systems and discovery-based pedagogies may impair the consolidation of declarative and procedural memory -- systems essential for expertise, critical thinking, and long-term retention. We review how tools like ChatGPT and calculators can short-circuit the retrieval, error correction, and schema-building processes necessary for robust neural encoding. Notably, we highlight striking parallels between deep learning phenomena such as "grokking" and the neuroscience of overlearning and intuition. Empirical studies are discussed showing how premature reliance on AI during learning inhibits proceduralization and intuitive mastery. We argue that effective human-AI interaction depends on strong internal models -- biological "schemata" and neural manifolds -- that enable users to evaluate, refine, and guide AI output. The paper concludes with policy implications for education and workforce training in the age of large language models.


Cyberbullying Detection in Hinglish Text Using MURIL and Explainable AI

arXiv.org Artificial Intelligence

The growth of digital communication platforms has led to increased cyberbullying incidents worldwide, creating a need for automated detection systems to protect users. The rise of code-mixed Hindi-English (Hinglish) communication on digital platforms poses challenges for existing cyberbullying detection systems, which were designed primarily for monolingual text. This paper presents a framework for cyberbullying detection in Hinglish text using the Multilingual Representations for Indian Languages (MURIL) architecture to address limitations in current approaches. Evaluation across six benchmark datasets -- Bohra \textit{et al.}, BullyExplain, BullySentemo, Kumar \textit{et al.}, HASOC 2021, and Mendeley Indo-HateSpeech -- shows that the MURIL-based approach outperforms existing multilingual models including RoBERTa and IndicBERT, with improvements of 1.36 to 13.07 percentage points and accuracies of 86.97\% on Bohra, 84.62\% on BullyExplain, 86.03\% on BullySentemo, 75.41\% on Kumar datasets, 83.92\% on HASOC 2021, and 94.63\% on Mendeley dataset. The framework includes explainability features through attribution analysis and cross-linguistic pattern recognition. Ablation studies show that selective layer freezing, appropriate classification head design, and specialized preprocessing for code-mixed content improve detection performance, while failure analysis identifies challenges including context-dependent interpretation, cultural understanding, and cross-linguistic sarcasm detection, providing directions for future research in multilingual cyberbullying detection.


Human-Centered Shared Autonomy for Motor Planning, Learning, and Control Applications

arXiv.org Artificial Intelligence

With recent advancements in AI and computational tools, intelligent paradigms have emerged to enhance fields like shared autonomy and human-machine teaming in healthcare. Advanced AI algorithms (e.g., reinforcement learning) can autonomously make decisions to achieve planning and motion goals. However, in healthcare, where human intent is crucial, fully independent machine decisions may not be ideal. This chapter presents a comprehensive review of human-centered shared autonomy AI frameworks, focusing on upper limb biosignal-based machine interfaces and associated motor control systems, including computer cursors, robotic arms, and planar platforms. We examine motor planning, learning (rehabilitation), and control, covering conceptual foundations of human-machine teaming in reach-and-grasp tasks and analyzing both theoretical and practical implementations. Each section explores how human and machine inputs can be blended for shared autonomy in healthcare applications. Topics include human factors, biosignal processing for intent detection, shared autonomy in brain-computer interfaces (BCI), rehabilitation, assistive robotics, and Large Language Models (LLMs) as the next frontier. We propose adaptive shared autonomy AI as a high-performance paradigm for collaborative human-AI systems, identify key implementation challenges, and outline future directions, particularly regarding AI reasoning agents. This analysis aims to bridge neuroscientific insights with robotics to create more intuitive, effective, and ethical human-machine teaming frameworks.


Bridging Brain with Foundation Models through Self-Supervised Learning

arXiv.org Artificial Intelligence

Foundation models (FMs), powered by self-supervised learning (SSL), have redefined the capabilities of artificial intelligence, demonstrating exceptional performance in domains like natural language processing and computer vision. These advances present a transformative opportunity for brain signal analysis. Unlike traditional supervised learning, which is limited by the scarcity of labeled neural data, SSL offers a promising solution by enabling models to learn meaningful representations from unlabeled data. This is particularly valuable in addressing the unique challenges of brain signals, including high noise levels, inter-subject variability, and low signal-to-noise ratios. This survey systematically reviews the emerging field of bridging brain signals with foundation models through the innovative application of SSL. It explores key SSL techniques, the development of brain-specific foundation models, their adaptation to downstream tasks, and the integration of brain signals with other modalities in multimodal SSL frameworks. The review also covers commonly used evaluation metrics and benchmark datasets that support comparative analysis. Finally, it highlights key challenges and outlines future research directions. This work aims to provide researchers with a structured understanding of this rapidly evolving field and a roadmap for developing generalizable brain foundation models powered by self-supervision.


Advancing Autonomous Racing: A Comprehensive Survey of the RoboRacer (F1TENTH) Platform

arXiv.org Artificial Intelligence

--The RoboRacer (F1TENTH) platform has emerged as a leading testbed for advancing autonomous driving research, offering a scalable, cost-effective, and community-driven environment for experimentation. This paper presents a comprehensive survey of the platform, analyzing its modular hardware and software architecture, diverse research applications, and role in autonomous systems education. We examine critical aspects such as bridging the simulation-to-reality (Sim2Real) gap, integration with simulation environments, and the availability of standardized datasets and benchmarks. Furthermore, the survey highlights advancements in perception, planning, and control algorithms, as well as insights from global competitions and collaborative research efforts. The findings underscore the platform's significance in driving forward developments in autonomous racing and robotics.


Finance Language Model Evaluation (FLaME)

arXiv.org Artificial Intelligence

Language Models (LMs) have demonstrated impressive capabilities with core Natural Language Processing (NLP) tasks. The effectiveness of LMs for highly specialized knowledge-intensive tasks in finance remains difficult to assess due to major gaps in the methodologies of existing evaluation frameworks, which have caused an erroneous belief in a far lower bound of LMs' performance on common Finance NLP (FinNLP) tasks. To demonstrate the potential of LMs for these FinNLP tasks, we present the first holistic benchmarking suite for Financial Language Model Evaluation (FLaME). We are the first research paper to comprehensively study LMs against 'reasoning-reinforced' LMs, with an empirical study of 23 foundation LMs over 20 core NLP tasks in finance. We open-source our framework software along with all data and results.