Goto

Collaborating Authors

 Generative AI


Prompt Informed Reinforcement Learning for Visual Coverage Path Planning

arXiv.org Artificial Intelligence

Visual coverage path planning with unmanned aerial vehicles (UAVs) requires agents to strategically coordinate UAV motion and camera control to maximize coverage, minimize redundancy, and maintain battery efficiency. Traditional reinforcement learning (RL) methods rely on environment-specific reward formulations that lack semantic adaptability. This study proposes Prompt-Informed Reinforcement Learning (PIRL), a novel approach that integrates the zero-shot reasoning ability and in-context learning capability of large language models with curiosity-driven RL. PIRL leverages semantic feedback from an LLM, GPT-3.5, to dynamically shape the reward function of the Proximal Policy Optimization (PPO) RL policy guiding the agent in position and camera adjustments for optimal visual coverage. The PIRL agent is trained using OpenAI Gym and evaluated in various environments. Furthermore, the sim-to-real-like ability and zero-shot generalization of the agent are tested by operating the agent in Webots simulator which introduces realistic physical dynamics. Results show that PIRL outperforms multiple learning-based baselines such as PPO with static rewards, PPO with exploratory weight initialization, imitation learning, and an LLM-only controller. Across different environments, PIRL outperforms the best-performing baseline by achieving up to 14% higher visual coverage in OpenAI Gym and 27% higher in Webots, up to 25% higher battery efficiency, and up to 18\% lower redundancy, depending on the environment. The results highlight the effectiveness of LLM-guided reward shaping in complex spatial exploration tasks and suggest a promising direction for integrating natural language priors into RL for robotics.


An Introduction to Flow Matching and Diffusion Models

arXiv.org Artificial Intelligence

Diffusion and flow-based models have become the state of the art for generative AI across a wide range of data modalities, including images, videos, shapes, molecules, music, and more. This tutorial provides a self-contained introduction to diffusion and flow-based generative models from first principles. We systematically develop the necessary mathematical background in ordinary and stochastic differential equations and derive the core algorithms of flow matching and denoising diffusion models. We then provide a step-by-step guide to building image and video generators, including training methods, guidance, and architectural design. This tutorial is ideal for machine learning researchers who want to develop a principled understanding of the theory and practice of generative AI.


Scene-Aware Conversational ADAS with Generative AI for Real-Time Driver Assistance

arXiv.org Artificial Intelligence

While autonomous driving technologies continue to advance, current Advanced Driver Assistance Systems (ADAS) remain limited in their ability to interpret scene context or engage with drivers through natural language. These systems typically rely on predefined logic and lack support for dialogue-based interaction, making them inflexible in dynamic environments or when adapting to driver intent. This paper presents Scene-Aware Conversational ADAS (SC-ADAS), a modular framework that integrates Generative AI components including large language models, vision-to-text interpretation, and structured function calling to enable real-time, interpretable, and adaptive driver assistance. SC-ADAS supports multi-turn dialogue grounded in visual and sensor context, allowing natural language recommendations and driver-confirmed ADAS control. Implemented in the CARLA simulator with cloud-based Generative AI, the system executes confirmed user intents as structured ADAS commands without requiring model fine-tuning. We evaluate SC-ADAS across scene-aware, conversational, and revisited multi-turn interactions, highlighting trade-offs such as increased latency from vision-based context retrieval and token growth from accumulated dialogue history. These results demonstrate the feasibility of combining conversational reasoning, scene perception, and modular ADAS control to support the next generation of intelligent driver assistance.


Do we need equivariant models for molecule generation?

arXiv.org Artificial Intelligence

Deep generative models are increasingly used for molecular discovery, with most recent approaches relying on equivariant graph neural networks (GNNs) under the assumption that explicit equivariance is essential for generating high-quality 3D molecules. However, these models are complex, difficult to train, and scale poorly. We investigate whether non-equivariant convolutional neural networks (CNNs) trained with rotation augmentations can learn equivariance and match the performance of equivariant models. We derive a loss decomposition that separates prediction error from equivariance error, and evaluate how model size, dataset size, and training duration affect performance across denoising, molecule generation, and property prediction. To our knowledge, this is the first study to analyze learned equivariance in generative tasks.


eSapiens: A Platform for Secure and Auditable Retrieval-Augmented Generation

arXiv.org Artificial Intelligence

We present eSapiens, an AI-as-a-Service (AIaaS) platform engineered around a business-oriented trifecta: proprietary data, operational workflows, and any major agnostic Large Language Model (LLM). eSapiens gives businesses full control over their AI assets, keeping everything in-house for AI knowledge retention and data security. eSapiens AI Agents (Sapiens) empower your team by providing valuable insights and automating repetitive tasks, enabling them to focus on high-impact work and drive better business outcomes. The system integrates structured document ingestion, hybrid vector retrieval, and no-code orchestration via LangChain, and supports top LLMs including OpenAI, Claude, Gemini, and DeepSeek. A key component is the THOR Agent, which handles structured SQL-style queries and generates actionable insights over enterprise databases. To evaluate the system, we conduct two experiments. First, a retrieval benchmark on legal corpora reveals that a chunk size of 512 tokens yields the highest retrieval precision (Top-3 accuracy: 91.3%). Second, a generation quality test using TRACe metrics across five LLMs shows that eSapiens delivers more context-consistent outputs with up to a 23% improvement in factual alignment. These results demonstrate the effectiveness of eSapiens in enabling trustworthy, auditable AI workflows for high-stakes domains like legal and finance.


GenAI-based Multi-Agent Reinforcement Learning towards Distributed Agent Intelligence: A Generative-RL Agent Perspective

arXiv.org Artificial Intelligence

Multi-agent reinforcement learning faces fundamental challenges that conventional approaches have failed to overcome: exponentially growing joint action spaces, non-stationary environments where simultaneous learning creates moving targets, and partial observability that constrains coordination. Current methods remain reactive, employing stimulus-response mechanisms that fail when facing novel scenarios. We argue for a transformative paradigm shift from reactive to proactive multi-agent intelligence through generative AI-based reinforcement learning. This position advocates reconceptualizing agents not as isolated policy optimizers, but as sophisticated generative models capable of synthesizing complex multi-agent dynamics and making anticipatory decisions based on predictive understanding of future interactions. Rather than responding to immediate observations, generative-RL agents can model environment evolution, predict other agents' behaviors, generate coordinated action sequences, and engage in strategic reasoning accounting for long-term dynamics. This approach leverages pattern recognition and generation capabilities of generative AI to enable proactive decision-making, seamless coordination through enhanced communication, and dynamic adaptation to evolving scenarios. We envision this paradigm shift will unlock unprecedented possibilities for distributed intelligence, moving beyond individual optimization toward emergent collective behaviors representing genuine collaborative intelligence. The implications extend across autonomous systems, robotics, and human-AI collaboration, promising solutions to coordination challenges intractable under traditional reactive frameworks.


Knowledge Conceptualization Impacts RAG Efficacy

arXiv.org Artificial Intelligence

Explainability and interpretability are cornerstones of frontier and next-generation artificial intelligence (AI) systems. This is especially true in recent systems, such as large language models (LLMs), and more broadly, generative AI. On the other hand, adaptability to new domains, contexts, or scenarios is also an important aspect for a successful system. As such, we are particularly interested in how we can merge these two efforts, that is, investigating the design of transferable and interpretable neurosymbolic AI systems. Specifically, we focus on a class of systems referred to as ''Agentic Retrieval-Augmented Generation'' systems, which actively select, interpret, and query knowledge sources in response to natural language prompts. In this paper, we systematically evaluate how different conceptualizations and representations of knowledge, particularly the structure and complexity, impact an AI agent (in this case, an LLM) in effectively querying a triplestore. We report our results, which show that there are impacts from both approaches, and we discuss their impact and implications.


Multi-Actor Generative Artificial Intelligence as a Game Engine

arXiv.org Artificial Intelligence

Generative AI can be used in multi-actor environments with purposes ranging from social science modeling to interactive narrative and AI evaluation. Supporting this diversity of use cases -- which we classify as Simulationist, Dramatist, and Evaluationist -- demands a flexible scenario definition framework. We argue here that a good approach is to take inspiration from tabletop role-playing games (TTRPGs), where a Game Master (GM) is responsible for the environment and generates all parts of the story not directly determined by the voluntary actions of player characters. We argue that the Entity-Component architectural pattern is useful here. In such a system, the GM is not a hardcoded computer game but is itself a configurable entity, composed of components just like any other actor. By design, the approach allows for a separation between the underlying implementation details handled by an engineer, the creation of reusable components, and their composition and configuration managed by a designer who constructs entities from the components. This separation of concerns is instrumental for achieving rapid iteration, maintaining modularity, and ultimately to ensure scalability. We describe the ongoing evolution of the Concordia library in terms of this philosophy, demonstrating how it allows users to effectively configure scenarios that align with their specific goals.


New AI voice tool trained to copy British regional accents

BBC News

There is concern that languages and dialects are being lost in the digital era. "Among the over seven thousand languages that still exist today, almost half are endangered according to UNESCO; about a third have some online presence; less than 2 percent are supported by Google Translate; and according to OpenAI's own testing, only fifteen, or 0.2 percent are supported by GPT-4 [an OpenAI model] above an 80 percent accuracy," writes Karen Hao in the book Empire of AI. "Language models are homogenising speech," agrees AI expert Henry Ajder, who advises governments and tech firms, including Synthesia. However, the better these products become, the more effective they will also be in the hands of scammers. Synthesia's product will not be free when it is released in the coming weeks, and will have guardrails around hate speech and explicit material. But there are already many free, open-source voice-cloning tools which are easily accessible and less protected.


OpenAI plans to launch its own browser with built-in ChatGPT soon

PCWorld

According to information provided to Reuters, ChatGPT developer OpenAI plans to launch its own web browser in the coming weeks. The new browser will have ChatGPT built in, allowing users to interact directly with the AI chatbot without having to visit the official website. OpenAI is also considering integrating its so-called Operator Agents, which are AI-powered assistants that can autonomously perform various tasks on your behalf, like booking tickets or filling out forms. The upcoming browser will be built on Google's Chromium codebase, the same one that powers Chrome, Microsoft Edge, and others. It's not yet clear whether it will require a paid subscription, like Perplexity's recently launched AI web browser Comet.