Goto

Collaborating Authors

 Overview


VDC-Agent: When Video Detailed Captioners Evolve Themselves via Agentic Self-Reflection

arXiv.org Artificial Intelligence

W e present VDC-Agent, a self-evolving framework for Video Detailed Captioning that requires neither human annotations nor larger teacher models. The agent forms a closed loop of caption generation, principle-guided scoring (score and textual suggestions), and prompt refinement. When caption quality regresses, a self-reflection path leverages the previous chain-of-thought to amend the update. Running this process on unlabeled videos produces trajectories of (caption, score) pairs. W e convert the trajectories into preference tuples and filter out samples with JSON parsing errors, resulting in VDC-Agent-19K, which contains 18,886 automatically constructed pairs.


LLM-Driven Stationarity-Aware Expert Demonstrations for Multi-Agent Reinforcement Learning in Mobile Systems

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning (MARL) has been increasingly adopted in many real-world applications. While MARL enables decentralized deployment on resource-constrained edge devices, it suffers from severe non-stationarity due to the synchronous updates of agent policies. This non stationarity results in unstable training and poor policy con vergence, especially as the number of agents increases. In this paper, we propose RELED, a scalable MARL framework that integrates large language model (LLM)-driven expert demonstrations with autonomous agent exploration. RELED incorporates a Stationarity-Aware Expert Demonstration module, which leverages theoretical non-stationarity bounds to enhance the quality of LLM-generated expert trajectories, thus providing high reward and training-stable samples for each agent. Moreover, a Hybrid Expert-Agent Policy Optimization module adaptively balances each agent's learning from both expert-generated and agent-generated trajectories, accelerating policy convergence and improving generalization. Extensive experiments with real city networks based on OpenStreetMap demonstrate that RELED achieves superior performance compared to state-of-the-art MARL methods.


Unboxing the Black Box: Mechanistic Interpretability for Algorithmic Understanding of Neural Networks

arXiv.org Artificial Intelligence

Artificial intelligence (AI) is increasingly assisting us in a wide range of tasks, from everyday applications like recommendation systems to high-risk domains such as bio-metric recognition, autonomous vehicles, and medical diagnosis [1]. In particular, the rise of transformer-based models, such as those used in natural language processing (NLP), has significantly accelerated AI's adoption and visibility in society, enabling breakthroughs in fields like text generation, translation, and image understanding [2]. The size, complexity, and opacity of deep learning models are growing exponentially, further outpacing the ability of researchers to understand the black box. As deep neural networks are increasingly deployed in real-world applications with more advanced use cases, the impact of AI continues to grow. This growing influence, coupled with the often opaque, black-box nature of most AI systems, has led to a heightened demand for AI models that are both faithful and explainable. The validation of AI's decisions is especially critical in high-risks areas, such as law or medicine [3, 4]. As a result, Explainable AI (XAI) emerged as a direct response to companies' and researchers' demands to interpret, explain and validate neural networks to make AI systems trustworthy. XAI encompasses all methods, approaches and efforts to uncover the reasoning and behavior of artificial intelligence systems [1]. Thus, it is important to establish an understanding of common terms used in the XAI literature, despite the lack of universally accepted definitions.


AI Consciousness and Existential Risk

arXiv.org Artificial Intelligence

In AI, the existential risk denotes the hypothetical threat posed by an artificial system that would possess both the capability and the objective, either directly or indirectly, to eradicate humanity. This issue is gaining prominence in scientific debate due to recent technical advancements and increased media coverage. In parallel, AI progress has sparked speculation and studies about the potential emergence of artificial consciousness. The two questions, AI consciousness and existential risk, are sometimes conflated, as if the former entailed the latter. Here, I explain that this view stems from a common confusion between consciousness and intelligence. Yet these two properties are empirically and theoretically distinct. Arguably, while intelligence is a direct predictor of an AI system's existential threat, consciousness is not. There are, however, certain incidental scenarios in which consciousness could influence existential risk, in either direction. Consciousness could be viewed as a means towards AI alignment, thereby lowering existential risk; or, it could be a precondition for reaching certain capabilities or levels of intelligence, and thus positively related to existential risk. Recognizing these distinctions can help AI safety researchers and public policymakers focus on the most pressing issues.


Edge-Based Predictive Data Reduction for Smart Agriculture: A Lightweight Approach to Efficient IoT Communication

arXiv.org Artificial Intelligence

The rapid growth of IoT devices has led to an enormous amount of sensor data that requires transmission to cloud servers for processing, resulting in excessive network congestion, increased latency and high energy consumption. This is particularly problematic in resource-constrained and remote environments where bandwidth is limited, and battery-dependent devices further emphasize the problem. Moreover, in domains such as agriculture, consecutive sensor readings often have minimal variation, making continuous data transmission inefficient and unnecessarily resource intensive. To overcome these challenges, we propose an analytical prediction algorithm designed for edge computing environments and validated through simulation. The proposed solution utilizes a predictive filter at the network edge that forecasts the next sensor data point and triggers data transmission only when the deviation from the predicted value exceeds a predefined tolerance. A complementary cloud-based model ensures data integrity and overall system consistency. This dual-model strategy effectively reduces communication overhead and demonstrates potential for improving energy efficiency by minimizing redundant transmissions. In addition to reducing communication load, our approach leverages both in situ and satellite observations from the same locations to enhance model robustness. It also supports cross-site generalization, enabling models trained in one region to be effectively deployed elsewhere without retraining. This makes our solution highly scalable, energy-aware, and well-suited for optimizing sensor data transmission in remote and bandwidth-constrained IoT environments.


An Analysis of Constraint-Based Multi-Agent Pathfinding Algorithms

arXiv.org Artificial Intelligence

This study informs the design of future multi-agent pathfinding (MAPF) and multi-robot motion planning (MRMP) algorithms by guiding choices based on constraint classification for constraint-based search algorithms. We categorize constraints as conservative or aggressive and provide insights into their search behavior, focusing specifically on vanilla Conflict-Based Search (CBS) and Conflict-Based Search with Priorities (CBSw/P). Under a hybrid grid-roadmap representation with varying resolution, we observe that aggressive (priority constraint) formulations tend to solve more instances as agent count or resolution increases, whereas conservative (motion constraint) formulations yield stronger solution quality when both succeed. Findings are synthesized in a decision flowchart, aiding users in selecting suitable constraints. Recommendations extend to Multi-Robot Motion Planning (MRMP), emphasizing the importance of considering topological features alongside problem, solution, and representation features. A comprehensive exploration of the study, including raw data and map performance, is available in our public GitHub Repository: https://GitHub.com/hannahjmlee/constraint-mapf-analysis


From Simulations to Surveys: Domain Adaptation for Galaxy Observations

arXiv.org Artificial Intelligence

Large photometric surveys will image billions of galaxies, but we currently lack quick, reliable automated ways to infer their physical properties like morphology, stellar mass, and star formation rates. Simulations provide galaxy images with ground-truth physical labels, but domain shifts in PSF, noise, backgrounds, selection, and label priors degrade transfer to real surveys. We present a preliminary domain adaptation pipeline that trains on simulated TNG50 galaxies and evaluates on real SDSS galaxies with morphology labels (elliptical/spiral/irregular). We train three backbones (CNN, $E(2)$-steerable CNN, ResNet-18) with focal loss and effective-number class weighting, and a feature-level domain loss $L_D$ built from GeomLoss (entropic Sinkhorn OT, energy distance, Gaussian MMD, and related metrics). We show that a combination of these losses with an OT-based "top_$k$ soft matching" loss that focuses $L_D$ on the worst-matched source-target pairs can further enhance domain alignment. With Euclidean distance, scheduled alignment weights, and top-$k$ matching, target accuracy (macro F1) rises from $\sim$46% ($\sim$30%) at no adaptation to $\sim$87% ($\sim$62.6%), with a domain AUC near 0.5, indicating strong latent-space mixing.


Foundations of Artificial Intelligence Frameworks: Notion and Limits of AGI

arXiv.org Artificial Intelligence

Within the limited scope of this paper, we argue that artificial general intelligence cannot emerge from current neural network paradigms regardless of scale, nor is such an approach healthy for the field at present. Drawing on various notions, discussions, present-day developments and observations, current debates and critiques, experiments, and so on in between philosophy, including the Chinese Room Argument and Gรถdelian argument, neuroscientific ideas, computer science, the theoretical consideration of artificial intelligence, and learning theory, we address conceptually that neural networks are architecturally insufficient for genuine understanding. They operate as static function approximators of a limited encoding framework - a 'sophisticated sponge' exhibiting complex behaviours without structural richness that constitute intelligence. We critique the theoretical foundations the field relies on and created of recent times; for example, an interesting heuristic as neural scaling law (as an example, arXiv:2001.08361 ) made prominent in a wrong way of interpretation, The Universal Approximation Theorem addresses the wrong level of abstraction and, in parts, partially, the question of current architectures lacking dynamic restructuring capabilities. We propose a framework distinguishing existential facilities (computational substrate) from architectural organization (interpretive structures), and outline principles for what genuine machine intelligence would require, and furthermore, a conceptual method of structuralizing the richer framework on which the principle of neural network system takes hold.


Real-Time Personalized Content Adaptation through Matrix Factorization and Context-Aware Federated Learning

arXiv.org Artificial Intelligence

Our study presents a multifaceted approach to enhancing user interaction and content relevance in social media platforms through a federated learning framework. We introduce personalized LLM Federated Learning and Context-based Social Media models. In our framework, multiple client entities receive a foundational GPT model, which is fine-tuned using locally collected social media data while ensuring data privacy through federated aggregation. Key modules focus on categorizing user-generated content, computing user persona scores, and identifying relevant posts from friends networks. By integrating a sophisticated social engagement quantification method with matrix factorization techniques, our system delivers real-time personalized content suggestions tailored to individual preferences. Furthermore, an adaptive feedback loop, alongside a robust readability scoring algorithm, significantly enhances the quality and relevance of the content presented to users. This comprehensive solution not only addresses the challenges of content filtering and recommendation but also fosters a more engaging social media experience while safeguarding user privacy, setting a new standard for personalized interactions in digital platforms.


The Workflow as Medium: A Framework for Navigating Human-AI Co-Creation

arXiv.org Artificial Intelligence

This paper introduces the Creative Intelligence Loop (CIL), a novel socio-technical framework for responsible human-AI co-creation. Rooted in the 'Workflow as Medium' paradigm, the CIL proposes a disciplined structure for dynamic human-AI collaboration, guiding the strategic integration of diverse AI teammates who function as collaborators while the human remains the final arbiter for ethical alignment and creative integrity. The CIL was empirically demonstrated through the practice-led creation of two graphic novellas, investigating how AI could serve as an effective creative colleague within a subjective medium lacking objective metrics. The process required navigating multifaceted challenges including AI's 'jagged frontier' of capabilities, sycophancy, and attention-scarce feedback environments. This prompted iterative refinement of teaming practices, yielding emergent strategies: a multi-faceted critique system integrating adversarial AI roles to counter sycophancy, and prioritizing 'feedback-ready' concrete artifacts to elicit essential human critique. The resulting graphic novellas analyze distinct socio-technical governance failures: 'The Steward' examines benevolent AI paternalism in smart cities, illustrating how algorithmic hubris can erode freedom; 'Fork the Vote' probes democratic legitimacy by comparing centralized AI opacity with emergent collusion in federated networks. This work contributes a self-improving framework for responsible human-AI co-creation and two graphic novellas designed to foster AI literacy and dialogue through accessible narrative analysis of AI's societal implications.