Energy
Follow the Energy, Find the Path: Riemannian Metrics from Energy-Based Models
Béthune, Louis, Vigouroux, David, Du, Yilun, VanRullen, Rufin, Serre, Thomas, Boutin, Victor
What is the shortest path between two data points lying in a high-dimensional space? While the answer is trivial in Euclidean geometry, it becomes significantly more complex when the data lies on a curved manifold -- requiring a Riemannian metric to describe the space's local curvature. Estimating such a metric, however, remains a major challenge in high dimensions. In this work, we propose a method for deriving Riemannian metrics directly from pretrained Energy-Based Models (EBMs) -- a class of generative models that assign low energy to high-density regions. These metrics define spatially varying distances, enabling the computation of geodesics -- shortest paths that follow the data manifold's intrinsic geometry. We introduce two novel metrics derived from EBMs and show that they produce geodesics that remain closer to the data manifold and exhibit lower curvature distortion, as measured by alignment with ground-truth trajectories. We evaluate our approach on increasingly complex datasets: synthetic datasets with known data density, rotated character images with interpretable geometry, and high-resolution natural images embedded in a pretrained VAE latent space. Our results show that EBM-derived metrics consistently outperform established baselines, especially in high-dimensional settings. Our work is the first to derive Riemannian metrics from EBMs, enabling data-aware geodesics and unlocking scalable, geometry-driven learning for generative modeling and simulation.
Transfer learning discovery of molecular modulators for perovskite solar cells
Yan, Haoming, Chen, Xinyu, Wang, Yanran, Luo, Zhengchao, Huang, Weizheng, Wang, Hongshuai, Chen, Peng, Zhang, Yuzhi, Sun, Weijie, Wang, Jinzhuo, Gong, Qihuang, Zhu, Rui, Zhao, Lichen
The discovery of effective molecular modulators is essential for advancing perovskite solar cells (PSCs), but the research process is hindered by the vastness of chemical space and the time-consuming and expensive trial-and-error experimental screening. Concurrently, machine learning (ML) offers significant potential for accelerating materials discovery. However, applying ML to PSCs remains a major challenge due to data scarcity and limitations of traditional quantitative structure-property relationship (QSPR) models. Here, we apply a chemical informed transfer learning framework based on pre-trained deep neural networks, which achieves high accuracy in predicting the molecular modulator's effect on the power conversion efficiency (PCE) of PSCs. This framework is established through systematical benchmarking of diverse molecular representations, enabling lowcost and high-throughput virtual screening over 79,043 commercially available molecules. Furthermore, we leverage interpretability techniques to visualize the learned chemical representation and experimentally characterize the resulting modulator-perovskite interactions. The top molecular modulators identified by the framework are subsequently validated experimentally, delivering a remarkably improved champion PCE of 26.91% in PSCs.
Generative Modeling Enables Molecular Structure Retrieval from Coulomb Explosion Imaging
Li, Xiang, Jahnke, Till, Boll, Rebecca, Han, Jiaqi, Xu, Minkai, Meyer, Michael, Piancastelli, Maria Novella, Rolles, Daniel, Rudenko, Artem, Trinter, Florian, Wolf, Thomas J. A., Thayer, Jana B., Cryan, James P., Ermon, Stefano, Ho, Phay J.
Capturing the structural changes that molecules undergo during chemical reactions in real space and time is a long-standing dream and an essential prerequisite for understanding and ultimately controlling femtochemistry. A key approach to tackle this challenging task is Coulomb explosion imaging, which benefited decisively from recently emerging high-repetition-rate X-ray free-electron laser sources. With this technique, information on the molecular structure is inferred from the momentum distributions of the ions produced by the rapid Coulomb explosion of molecules. Retrieving molecular structures from these distributions poses a highly non-linear inverse problem that remains unsolved for molecules consisting of more than a few atoms. Here, we address this challenge using a diffusion-based Transformer neural network. We show that the network reconstructs unknown molecular geometries from ion-momentum distributions with a mean absolute error below one Bohr radius, which is half the length of a typical chemical bond.
Deep recurrent-convolutional neural network learning and physics Kalman filtering comparison in dynamic load identification
The dynamic structural load identification capabilities of the gated recurrent unit, long short-term memory, and convolutional neural networks are examined herein. The examination is on realistic small dataset training conditions and on a comparative view to the physics-based residual Kalman filter (RKF). The dynamic load identification suffers from the uncertainty related to obtaining poor predictions when in civil engineering applications only a low number of tests are performed or are available, or when the structural model is unidentifiable. In considering the methods, first, a simulated structure is investigated under a shaker excitation at the top floor. Second, a building in California is investigated under seismic base excitation, which results in loading for all degrees of freedom. Finally, the International Association for Structural Control-American Society of Civil Engineers (IASC-ASCE) structural health monitoring benchmark problem is examined for impact and instant loading conditions. Importantly, the methods are shown to outperform each other on different loading scenarios, while the RKF is shown to outperform the networks in physically parametrized identifiable cases.
A generative adversarial network optimization method for damage detection and digital twinning by deep AI fault learning: Z24 Bridge structural health monitoring benchmark validation
Impraimakis, Marios, Palkanoglou, Evangelia Nektaria
The optimization-based damage detection and damage state digital twinning capabilities are examined here of a novel conditional-labeled generative adversarial network methodology. The framework outperforms current approaches for fault anomaly detection as no prior information is required for the health state of the system: a topic of high significance for real-world applications. Specifically, current artificial intelligence-based digital twinning approaches suffer from the uncertainty related to obtaining poor predictions when a low number of measurements is available, physics knowledge is missing, or when the damage state is unknown. To this end, an unsupervised framework is examined and validated rigorously on the benchmark structural health monitoring measurements of Z24 Bridge: a post-tensioned concrete highway bridge in Switzerland. In implementing the approach, firstly, different same damage-level measurements are used as inputs, while the model is forced to converge conditionally to two different damage states. Secondly, the process is repeated for a different group of measurements. Finally, the convergence scores are compared to identify which one belongs to a different damage state. The process for both healthy-to-healthy and damage-to-healthy input data creates, simultaneously, measurements for digital twinning purposes at different damage states, capable of pattern recognition and machine learning data generation. Further to this process, a support vector machine classifier and a principal component analysis procedure is developed to assess the generated and real measurements of each damage category, serving as a secondary new dynamics learning indicator in damage scenarios. Importantly, the approach is shown to capture accurately damage over healthy measurements, providing a powerful tool for vibration-based system-level monitoring and scalable infrastructure resilience.
World Simulation with Video Foundation Models for Physical AI
NVIDIA, null, :, null, Ali, Arslan, Bai, Junjie, Bala, Maciej, Balaji, Yogesh, Blakeman, Aaron, Cai, Tiffany, Cao, Jiaxin, Cao, Tianshi, Cha, Elizabeth, Chao, Yu-Wei, Chattopadhyay, Prithvijit, Chen, Mike, Chen, Yongxin, Chen, Yu, Cheng, Shuai, Cui, Yin, Diamond, Jenna, Ding, Yifan, Fan, Jiaojiao, Fan, Linxi, Feng, Liang, Ferroni, Francesco, Fidler, Sanja, Fu, Xiao, Gao, Ruiyuan, Ge, Yunhao, Gu, Jinwei, Gupta, Aryaman, Gururani, Siddharth, Hanafi, Imad El, Hassani, Ali, Hao, Zekun, Huffman, Jacob, Jang, Joel, Jannaty, Pooya, Kautz, Jan, Lam, Grace, Li, Xuan, Li, Zhaoshuo, Liao, Maosheng, Lin, Chen-Hsuan, Lin, Tsung-Yi, Lin, Yen-Chen, Ling, Huan, Liu, Ming-Yu, Liu, Xian, Lu, Yifan, Luo, Alice, Ma, Qianli, Mao, Hanzi, Mo, Kaichun, Nah, Seungjun, Narang, Yashraj, Panaskar, Abhijeet, Pavao, Lindsey, Pham, Trung, Ramezanali, Morteza, Reda, Fitsum, Reed, Scott, Ren, Xuanchi, Shao, Haonan, Shen, Yue, Shi, Stella, Song, Shuran, Stefaniak, Bartosz, Sun, Shangkun, Tang, Shitao, Tasmeen, Sameena, Tchapmi, Lyne, Tseng, Wei-Cheng, Varghese, Jibin, Wang, Andrew Z., Wang, Hao, Wang, Haoxiang, Wang, Heng, Wang, Ting-Chun, Wei, Fangyin, Xu, Jiashu, Yang, Dinghao, Yang, Xiaodong, Ye, Haotian, Ye, Seonghyeon, Zeng, Xiaohui, Zhang, Jing, Zhang, Qinsheng, Zheng, Kaiwen, Zhu, Andrew, Zhu, Yuke
We introduce [Cosmos-Predict2.5], the latest generation of the Cosmos World Foundation Models for Physical AI. Built on a flow-based architecture, [Cosmos-Predict2.5] unifies Text2World, Image2World, and Video2World generation in a single model and leverages [Cosmos-Reason1], a Physical AI vision-language model, to provide richer text grounding and finer control of world simulation. Trained on 200M curated video clips and refined with reinforcement learning-based post-training, [Cosmos-Predict2.5] achieves substantial improvements over [Cosmos-Predict1] in video quality and instruction alignment, with models released at 2B and 14B scales. These capabilities enable more reliable synthetic data generation, policy evaluation, and closed-loop simulation for robotics and autonomous systems. We further extend the family with [Cosmos-Transfer2.5], a control-net style framework for Sim2Real and Real2Real world translation. Despite being 3.5$\times$ smaller than [Cosmos-Transfer1], it delivers higher fidelity and robust long-horizon video generation. Together, these advances establish [Cosmos-Predict2.5] and [Cosmos-Transfer2.5] as versatile tools for scaling embodied intelligence. To accelerate research and deployment in Physical AI, we release source code, pretrained checkpoints, and curated benchmarks under the NVIDIA Open Model License at https://github.com/nvidia-cosmos/cosmos-predict2.5 and https://github.com/nvidia-cosmos/cosmos-transfer2.5. We hope these open resources lower the barrier to adoption and foster innovation in building the next generation of embodied intelligence.
AeroResQ: Edge-Accelerated UAV Framework for Scalable, Resilient and Collaborative Escape Route Planning in Wildfire Scenarios
Raj, Suman, Mittal, Radhika, Mayani, Rajiv, Zuk, Pawel, Mandal, Anirban, Zink, Michael, Simmhan, Yogesh, Deelman, Ewa
Drone fleets equipped with onboard cameras, computer vision, and Deep Neural Network (DNN) models present a powerful paradigm for real-time spatio-temporal decision-making. In wildfire response, such drones play a pivotal role in monitoring fire dynamics, supporting firefighter coordination, and facilitating safe evacuation. In this paper, we introduce AeroResQ, an edge-accelerated UAV framework designed for scalable, resilient, and collaborative escape route planning during wildfire scenarios. AeroResQ adopts a multi-layer orchestration architecture comprising service drones (SDs) and coordinator drones (CDs), each performing specialized roles. SDs survey fire-affected areas, detect stranded individuals using onboard edge accelerators running fire detection and human pose identification DNN models, and issue requests for assistance. CDs, equipped with lightweight data stores such as Apache IoTDB, dynamically generate optimal ground escape routes and monitor firefighter movements along these routes. The framework proposes a collaborative path-planning approach based on a weighted A* search algorithm, where CDs compute context-aware escape paths. AeroResQ further incorporates intelligent load-balancing and resilience mechanisms: CD failures trigger automated data redistribution across IoTDB replicas, while SD failures initiate geo-fenced re-partitioning and reassignment of spatial workloads to operational SDs. We evaluate AeroResQ using realistic wildfire emulated setup modeled on recent Southern California wildfires. Experimental results demonstrate that AeroResQ achieves a nominal end-to-end latency of <=500ms, much below the 2s request interval, while maintaining over 98% successful task reassignment and completion, underscoring its feasibility for real-time, on-field deployment in emergency response and firefighter safety operations.
Chitchat with AI: Understand the supply chain carbon disclosure of companies worldwide through Large Language Model
Hang, Haotian, Shen, Yueyang, Zhu, Vicky, Cruz, Jose, Li, Michelle
In the context of global sustainability mandates, corporate carbon disclosure has emerged as a critical mechanism for aligning business strategy with environmental responsibility. The Carbon Disclosure Project (CDP) hosts the world's largest longitudinal dataset of climate-related survey responses, combining structured indicators with open-ended narratives, but the heterogeneity and free-form nature of these disclosures present significant analytical challenges for benchmarking, compliance monitoring, and investment screening. This paper proposes a novel decision-support framework that leverages large language models (LLMs) to assess corporate climate disclosure quality at scale. It develops a master rubric that harmonizes narrative scoring across 11 years of CDP data (2010-2020), enabling cross-sector and cross-country benchmarking. By integrating rubric-guided scoring with percentile-based normalization, our method identifies temporal trends, strategic alignment patterns, and inconsistencies in disclosure across industries and regions. Results reveal that sectors such as technology and countries like Germany consistently demonstrate higher rubric alignment, while others exhibit volatility or superficial engagement, offering insights that inform key decision-making processes for investors, regulators, and corporate environmental, social, and governance (ESG) strategists. The proposed LLM-based approach transforms unstructured disclosures into quantifiable, interpretable, comparable, and actionable intelligence, advancing the capabilities of AI-enabled decision support systems (DSSs) in the domain of climate governance.
Deep Learning Models for Coral Bleaching Classification in Multi-Condition Underwater Image Datasets
Macrohon, Julio Jerison E., Hung, Gordon
Coral reefs support numerous marine organisms and are an important source of coastal protection from storms and floods, representing a major part of marine ecosystems. However coral reefs face increasing threats from pollution, ocean acidification, and sea temperature anomalies, making efficient protection and monitoring heavily urgent. Therefore, this study presents a novel machine-learning-based coral bleaching classification system based on a diverse global dataset with samples of healthy and bleached corals under varying environmental conditions, including deep seas, marshes, and coastal zones. We benchmarked and compared three state-of-the-art models: Residual Neural Network (ResNet), Vision Transformer (ViT), and Convolutional Neural Network (CNN). After comprehensive hyperparameter tuning, the CNN model achieved the highest accuracy of 88%, outperforming existing benchmarks. Our findings offer important insights into autonomous coral monitoring and present a comprehensive analysis of the most widely used computer vision models.
An Open-Access Benchmark of Statistical and Machine-Learning Anomaly Detection Methods for Battery Applications
Pang, Mei-Chin, Adhikari, Suraj, Kasahara, Takuma, Haba, Nagihiro, Ohno, Saneyuki
Battery safety is critical in applications ranging from consumer electronics to electric vehicles and aircraft, where undetected anomalies could trigger safety hazards or costly downtime. In this study, we present OSBAD as an open-source benchmark for anomaly detection frameworks in battery applications. By benchmarking 15 diverse algorithms encompassing statistical, distance-based, and unsupervised machine-learning methods, OSBAD enables a systematic comparison of anomaly detection methods across heterogeneous datasets. In addition, we demonstrate how a physics- and statistics-informed feature transformation workflow enhances anomaly separability by decomposing collective anomalies into point anomalies. To address a major bottleneck in unsupervised anomaly detection due to incomplete labels, we propose a Bayesian optimization pipeline that facilitates automated hyperparameter tuning based on transfer-learning and regression proxies. Through validation on datasets covering both liquid and solid-state chemistries, we further demonstrate the cross-chemistry generalization capability of OSBAD to identify irregularities across different electrochemical systems. By making benchmarking database with open-source reproducible anomaly detection workflows available to the community, OSBAD establishes a unified foundation for developing safe, scalable, and transferable anomaly detection tools in battery analytics. This research underscores the significance of physics- and statistics-informed feature engineering as well as model selection with probabilistic hyperparameter tuning, in advancing trustworthy, data-driven diagnostics for safety-critical energy systems.