Energy
Accelerating Visual-Policy Learning through Parallel Differentiable Simulation
You, Haoxiang, Liu, Yilang, Abraham, Ian
In this work, we propose a computationally efficient algorithm for visual policy learning that leverages differentiable simulation and first-order analytical policy gradients. Our approach decouple the rendering process from the computation graph, enabling seamless integration with existing differentiable simulation ecosystems without the need for specialized differentiable rendering software. This decoupling not only reduces computational and memory overhead but also effectively attenuates the policy gradient norm, leading to more stable and smoother optimization. We evaluate our method on standard visual control benchmarks using modern GPU-accelerated simulation. Experiments show that our approach significantly reduces wall-clock training time and consistently outperforms all baseline methods in terms of final returns. Notably, on complex tasks such as humanoid locomotion, our method achieves a $4\times$ improvement in final return, and successfully learns a humanoid running policy within 4 hours on a single GPU.
Towards High Resolution Probabilistic Coastal Inundation Forecasting from Sparse Observations
Islam, Kazi Ashik, Mehrab, Zakaria, Halappanavar, Mahantesh, Mortveit, Henning, Katragadda, Sridhar, Loftis, Jon Derek, Hoops, Stefan, Marathe, Madhav
Coastal flooding poses increasing threats to communities worldwide, necessitating accurate and hyper-local inundation forecasting for effective emergency response. However, real-world deployment of forecasting systems is often constrained by sparse sensor networks, where only a limited subset of locations may have sensors due to budget constraints. To approach this challenge, we present DIFF -SPARSE, a masked conditional diffusion model designed for probabilistic coastal inundation forecasting from sparse sensor observations. DIFF -SPARSE primarily utilizes the inundation history of a location and its neighboring locations from a context time window as spatiotemporal context. The fundamental challenge of spatiotemporal prediction based on sparse observations in the context window is addressed by introducing a novel masking strategy during training. Digital elevation data and temporal co-variates are utilized as additional spatial and temporal contexts, respectively. A convolutional neural network and a conditional UNet architecture with cross-attention mechanism are employed to capture the spatiotemporal dynamics in the data. We trained and tested DIFF -SPARSE on coastal inundation data from the Eastern Shore of Virginia and systematically assessed the performance of DIFF -SPARSE across different sparsity levels 0%, 50%, 95% missing observations. Our experiment results show that DIFF -SPARSE achieves upto 62% improvement in terms of two forecasting performance metrics compared to existing methods, at 95% sparsity level. Moreover, our ablation studies reveal that digital elevation data becomes more useful at high sparsity levels compared to temporal co-variates.
Revealing the Hidden Third Dimension of Point Defects in Two-Dimensional MXenes
Guinan, Grace, Smeaton, Michelle A., Wyatt, Brian C., Goldy, Steven, Egan, Hilary, Glaws, Andrew, Tucker, Garritt J., Anasori, Babak, Spurgeon, Steven R.
Point defects govern many important functional properties of two - dimensional ( 2D) materials. However, resolving the three - dimensional (3D) arrangement of these defects in multi - layer 2D materials remains a fundamental challenge, hindering rational defect engineering . Our approach reconstructs the 3D coordinates of vacancies across hundreds of thousands of lattice sites, generating robust statistical insight into their dist ribution that can be correlated with specinullic synthesis pathways. This large - scale data enables us to classify a hierarchy of defect structures -- from isolated vacancies to nanopores -- revealing their preferred formation and interaction mechanisms, as corroborated by molecular dynamics simulations . This work provides a generalizable framework for understanding and ultimately controlling point defects across large volumes, paving the way for the rational design of defect - engineered functional 2D materials. Keywords: 2D materials, point defects, autonomous materials science, electron microscopy, machine learning 2 Two - dimensional (2D) materials have become a major nullield of modern research in materials science after the discovery of graphene in 2004 . The challenge of characterizing point defects is signinullicantly amplinullied in few - layered 2D materials. For instance, MXenes -- a class of 2D transition metal carbides, carbonitrides, and nitrides -- consist of nanosheets containing two to nullive layers of metal ato ms, which complicates defect analysis compared to single - layer materials .
Learning Omnidirectional Locomotion for a Salamander-Like Quadruped Robot
Liu, Zhiang, Liu, Yang, Fang, Yongchun, Guo, Xian
Salamander-like quadruped robots are designed inspired by the skeletal structure of their biological counterparts. However, existing controllers cannot fully exploit these morphological features and largely rely on predefined gait patterns or joint trajectories, which prevents the generation of diverse and flexible locomotion and limits their applicability in real-world scenarios. In this paper, we propose a learning framework that enables the robot to acquire a diverse repertoire of omnidirectional gaits without reference motions. Each body part is controlled by a phase variable capable of forward and backward evolution, with a phase coverage reward to promote the exploration of the leg phase space. Additionally, morphological symmetry of the robot is incorporated via data augmentation, improving sample efficiency and enforcing both motion-level and task-level symmetry in learned behaviors. Extensive experiments show that the robot successfully acquires 22 omnidirectional gaits exhibiting both dynamic and symmetric movements, demonstrating the effectiveness of the proposed learning framework.
Multi-Agent GraphRAG: A Text-to-Cypher Framework for Labeled Property Graphs
Gusarov, Anton, Volkova, Anastasia, Khrulkov, Valentin, Kuznetsov, Andrey, Maslov, Evgenii, Oseledets, Ivan
While Retrieval-Augmented Generation (RAG) methods commonly draw information from unstructured documents, the emerging paradigm of GraphRAG aims to leverage structured data such as knowledge graphs. Most existing GraphRAG efforts focus on Resource Description Framework (RDF) knowledge graphs, relying on triple representations and SP ARQL queries. However, the potential of Cypher and Labeled Property Graph (LPG) databases to serve as scalable and effective reasoning engines within GraphRAG pipelines remains underexplored in current research literature. To fill this gap, we propose Multi-Agent GraphRAG, a modular LLM agentic system for text-to-Cypher query generation serving as a natural language interface to LPG-based graph data. Our proof-of-concept system features an LLMbased workflow for automated Cypher queries generation and execution, using Memgraph as the graph database backend. Iterative content-aware correction and normalization, reinforced by an aggregated feedback loop, ensures both semantic and syntactic refinement of generated queries. We evaluate our system on the CypherBench graph dataset covering several general domains with diverse types of queries. In addition, we demonstrate performance of the proposed workflow on a property graph derived from the IFC (Industry Foundation Classes) data, representing a digital twin of a building. This highlights how such an approach can bridge AI with real-world applications at scale, enabling industrial digital automation use cases.
ParliaBench: An Evaluation and Benchmarking Framework for LLM-Generated Parliamentary Speech
Koniaris, Marios, Tsipi, Argyro, Tsanakas, Panayiotis
Parliamentary speech generation presents specific challenges for large language models beyond standard text generation tasks. Unlike general text generation, parliamentary speeches require not only linguistic quality but also political authenticity and ideological consistency. Current language models lack specialized training for parliamentary contexts, and existing evaluation methods focus on standard NLP metrics rather than political authenticity. To address this, we present ParliaBench, a benchmark for parliamentary speech generation. We constructed a dataset of speeches from UK Parliament to enable systematic model training. We introduce an evaluation framework combining computational metrics with LLM-as-a-judge assessments for measuring generation quality across three dimensions: linguistic quality, semantic coherence, and political authenticity. We propose two novel embedding-based metrics, Political Spectrum Alignment and Party Alignment, to quantify ideological positioning. We fine-tuned five large language models (LLMs), generated 28k speeches, and evaluated them using our framework, comparing baseline and fine-tuned models. Results show that fine-tuning produces statistically significant improvements across the majority of metrics and our novel metrics demonstrate strong discriminative power for political dimensions.
Multivariate Time series Anomaly Detection:A Framework of Hidden Markov Models
Li, Jinbo, Pedrycz, Witold, Jamal, Iqbal
In this study, we develop an approach to multivariate time series anomaly detection focused on the transformation of multivariate time series to univariate time series. Several transformation techniques involving Fuzzy C-Means (FCM) clustering and fuzzy integral are studied. In the sequel, a Hidden Markov Model (HMM), one of the commonly encountered statistical methods, is engaged here to detect anomalies in multivariate time series. We construct HMM-based anomaly detectors and in this context compare several transformation methods. A suite of experimental studies along with some comparative analysis is reported.
Data Descriptions from Large Language Models with Influence Estimation
Kim, Chaeri, Bae, Jaeyeon, Kim, Taehwan
Deep learning models have been successful in many areas but understanding their behaviors still remains a black-box. Most prior explainable AI (XAI) approaches have focused on interpreting and explaining how models make predictions. In contrast, we would like to understand how data can be explained with deep learning model training and propose a novel approach to understand the data via one of the most common media - language - so that humans can easily understand. Our approach proposes a pipeline to generate textual descriptions that can explain the data with large language models by incorporating external knowledge bases. However, generated data descriptions may still include irrelevant information, so we introduce to exploit influence estimation to choose the most informative textual descriptions, along with the CLIP score. Furthermore, based on the phenomenon of cross-modal transferability, we propose a novel benchmark task named cross-modal transfer classification to examine the effectiveness of our textual descriptions. In the experiment of zero-shot setting, we show that our textual descriptions are more effective than other baseline descriptions, and furthermore, we successfully boost the performance of the model trained only on images across all nine image classification datasets. These results are further supported by evaluation using GPT-4o. Through our approach, we may gain insights into the inherent interpretability of the decision-making process of the model.
A QP Framework for Improving Data Collection: Quantifying Device-Controller Performance in Robot Teleoperation
Zhao, Yuxuan, Tang, Yuanchen, Zhang, Jindi, Yu, Hongyu
Robot learning empowers the robot system with human brain-like intelligence to autonomously acquire and adapt skills through experience, enhancing flexibility and adaptability in various environments. Aimed at achieving a similar level of capability in large language models (LLMs) for embodied intelligence, data quality plays a crucial role in training a foundational model with diverse robot skills. In this study, we investigate the collection of data for manipulation tasks using teleoperation devices. Different devices yield varying effects when paired with corresponding controller strategies, including position-based inverse kinematics (IK) control, torque-based inverse dynamics (ID) control, and optimization-based compliance control. In this paper, we develop a teleoperation pipeline that is compatible with different teleoperation devices and manipulator controllers. Within the pipeline, we construct the optimal QP formulation with the dynamic nullspace and the impedance tracking as the novel optimal controller to achieve compliant pose tracking and singularity avoidance. Regarding the optimal controller, it adaptively adjusts the weights assignment depending on the robot joint manipulability that reflects the state of joint configuration for the pose tracking in the form of impedance control and singularity avoidance with nullspace tracking. Analysis of quantitative experimental results suggests the quality of the teleoperated trajectory data, including tracking error, occurrence of singularity, and the smoothness of the joints' trajectory, with different combinations of teleoperation interface and the motion controller.
CAVER: Curious Audiovisual Exploring Robot
Macesanu, Luca, Folefack, Boueny, Singh, Samik, Ray, Ruchira, Abbatematteo, Ben, Martín-Martín, Roberto
Abstract-- Multimodal audiovisual perception can enable new avenues for robotic manipulation, from better material classification to the imitation of demonstrations for which only audio signals are available (e.g., playing a tune by ear). However, to unlock such multimodal potential, robots need to learn the correlations between an object's visual appearance and the sound it generates when they interact with it. Such an active sensorimotor experience requires new interaction capabilities, representations, and exploration methods to guide the robot in efficiently building increasingly rich audiovisual knowledge. In this work, we present CA VER, a novel robot that builds and utilizes rich audiovisual representations of objects. CA VER includes three novel contributions: 1) a novel 3D printed end-effector, attachable to parallel grippers, that excites objects' audio responses, 2) an audiovisual representation that combines local and global appearance information with sound features, and 3) an exploration algorithm that uses and builds the audiovisual representation in a curiosity-driven manner that prioritizes interacting with high uncertainty objects to obtain good coverage of surprising audio with fewer interactions. We demonstrate that CA VER builds rich representations in different scenarios more efficiently than several exploration baselines, and that the learned audiovisual representation leads to significant improvements in material classification and the imitation of audio-only human demonstrations. Humans learn and exploit multimodal audiovisual cues in everyday life to obtain a more complete understanding of their environment and broader manipulation capabilities. We routinely fuse audio and vision to understand materials and reproduce behaviors: tapping a mug reveals glass vs. ceramic, and hearing a melody lets a musician find the right key. Building similar capabilities in robots would increase their robustness and autonomy, but requires a representation that couples how things look with how they sound when interacted with, and a way to acquire that representation efficiently through interaction.