 Materials


Multi-Modal Attention Networks for Enhanced Segmentation and Depth Estimation of Subsurface Defects in Pulse Thermography

arXiv.org Artificial Intelligence

AI-driven pulse thermography (PT) has become a crucial tool in non-destructive testing (NDT), enabling automatic detection of hidden anomalies in various industrial components. Current state-of-the-art techniques feed segmentation and depth estimation networks with PT sequences compressed using either Principal Component Analysis (PCA) or Thermographic Signal Reconstruction (TSR). However, treating these two modalities independently constrains the performance of PT inspection models, as these representations possess complementary semantic features. To address this limitation, this work proposes PT-Fusion, a multi-modal attention-based fusion network that fuses both PCA and TSR modalities for defect segmentation and depth estimation of subsurface defects in PT setups. PT-Fusion introduces two novel feature fusion modules, the Encoder Attention Fusion Gate (EAFG) and the Attention Enhanced Decoding Block (AEDB), to fuse PCA and TSR features. In addition, a novel data augmentation technique is proposed based on random data sampling from thermographic sequences to alleviate the scarcity of PT datasets. The proposed method is benchmarked against state-of-the-art PT inspection models, including U-Net, attention U-Net, and 3D-CNN, on the Université Laval IRT-PVC dataset. The results demonstrate that PT-Fusion outperforms the aforementioned models in defect segmentation and depth estimation accuracy by a margin of 10%.


Deployment of an Aerial Multi-agent System for Automated Task Execution in Large-scale Underground Mining Environments

arXiv.org Artificial Intelligence

In this article, we present a framework for deploying an aerial multi-agent system in large-scale subterranean environments with minimal infrastructure for supporting multi-agent operations. The multi-agent objective is to optimally and reactively allocate and execute inspection tasks in a mine, which are entered by a mine operator on-the-fly. The assignment of currently available tasks to the team of agents is accomplished through an auction-based system, where the agents bid for available tasks and a central auctioneer uses the bids to optimally assign tasks to agents. A mobile Wi-Fi mesh supports inter-agent communication and bi-directional communication between the agents and the task allocator, while the task execution is performed completely infrastructure-free. Given a task to be accomplished, a reliable and modular agent behavior is synthesized by generating behavior trees from a pool of agent capabilities, using a back-chaining approach. The auction system in the proposed framework is reactive and supports the addition of new operator-specified tasks on-the-go, at any point, through a user-friendly operator interface. The framework has been validated in a real underground mining environment using three aerial agents, with several inspection locations spread over an environment of almost 200 meters. The proposed framework can be utilized for missions involving rapid inspection, gas detection, distributed sensing, mapping, etc. in a subterranean environment. The proposed framework and its field deployment contribute towards furthering reliable automation in large-scale subterranean environments to offload both routine and dangerous tasks from human operators to autonomous aerial robots.

The use of autonomous robotic platforms in industrial production facilities is on the rise, both to increase profitability and to increase safety for human operators [1]. Specifically, in deep underground mining, where the fundamental risk of accidents is high, the industry is focusing on creating a safer environment for humans by deploying robotic systems to either execute dangerous tasks or verify the safety before authorizing human entry. Through efforts in the mining industry, human workers have already been moved to safer locations in several critical operations via, for instance, teleoperation of heavy machinery.
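
The auction-based allocation described in the abstract above can be illustrated with a minimal sketch. The abstract does not state how bids are computed or which solver the auctioneer uses, so everything here is an assumption: bids are treated as costs (lower is better), each agent takes at most one task per round, and the hypothetical function `assign_tasks` finds the cheapest assignment by brute force, which is only practical for small teams such as the three agents mentioned.

```python
from itertools import permutations

def assign_tasks(bids):
    """Pick the minimum-total-cost assignment of tasks to distinct agents.

    bids[agent][task] is the cost that agent bids for that task (assumed
    convention: lower is better). Assumes at least as many agents as tasks.
    Returns (task -> agent mapping, total cost).
    """
    agents = sorted(bids)
    tasks = sorted({task for agent_bids in bids.values() for task in agent_bids})
    best_mapping, best_cost = None, float("inf")
    # Try every way of giving each task to a different agent.
    for chosen in permutations(agents, len(tasks)):
        cost = sum(bids[agent][task] for agent, task in zip(chosen, tasks))
        if cost < best_cost:
            best_mapping, best_cost = dict(zip(tasks, chosen)), cost
    return best_mapping, best_cost

# Hypothetical bids from three aerial agents for two inspection tasks.
example_bids = {
    "uav1": {"A": 10, "B": 40},
    "uav2": {"A": 20, "B": 25},
    "uav3": {"A": 30, "B": 35},
}
assignment, total = assign_tasks(example_bids)
print(assignment, total)
```

When the operator adds a task, a reactive auctioneer in this style would simply collect fresh bids and rerun the assignment; for larger teams a polynomial-time assignment solver would replace the brute-force search.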


ForestProtector: An IoT Architecture Integrating Machine Vision and Deep Reinforcement Learning for Efficient Wildfire Monitoring

arXiv.org Artificial Intelligence

Early detection of forest fires is crucial to minimizing the environmental and socioeconomic damage they cause. Indeed, a fire's duration directly correlates with the difficulty and cost of extinguishing it. For instance, a fire burning for 1 minute might require 1 liter of water to extinguish, while a 2-minute fire could demand 100 liters, and a 10-minute fire might necessitate 1,000 liters. On the other hand, existing fire detection systems based on novel technologies (e.g., remote sensing, PTZ cameras, UAVs) are often expensive and require human intervention, making continuous monitoring of large areas impractical. To address this challenge, this work proposes a low-cost forest fire detection system that utilizes a central gateway device with computer vision capabilities to monitor a 360° field of view for smoke at long distances. A deep reinforcement learning agent enhances surveillance by dynamically controlling the camera's orientation, leveraging real-time sensor data (smoke levels, ambient temperature, and humidity) from distributed IoT devices. This approach enables automated wildfire monitoring across expansive areas while reducing false positives.


LLM360 K2: Building a 65B 360-Open-Source Large Language Model from Scratch

arXiv.org Artificial Intelligence

We detail the training of the LLM360 K2-65B model, scaling up our 360-degree OPEN SOURCE approach to the largest and most powerful models under project LLM360. While open-source LLMs continue to advance, the answer to "How are the largest LLMs trained?" remains unclear within the community. The implementation details for such high-capacity models are often protected due to business considerations associated with their high cost. This lack of transparency prevents LLM researchers from leveraging valuable insights from prior experience, e.g., "What are the best practices for addressing loss spikes?" The LLM360 K2 project addresses this gap by providing full transparency and access to resources accumulated during the training of LLMs at the largest scale. This report highlights key elements of the K2 project, including our first model, K2 DIAMOND, a 65 billion-parameter LLM that surpasses LLaMA-65B and rivals LLaMA2-70B, while requiring fewer FLOPs and tokens. We detail the implementation steps and present a longitudinal analysis of K2 DIAMOND's capabilities throughout its training process. We also outline ongoing projects such as TXT360, setting the stage for future models in the series. By offering previously unavailable resources, the K2 project also resonates with the 360-degree OPEN SOURCE principles of transparency, reproducibility, and accessibility, which we believe are vital in the era of resource-intensive AI research.


GRAPPA -- A Hybrid Graph Neural Network for Predicting Pure Component Vapor Pressures

arXiv.org Artificial Intelligence

Although the pure component vapor pressure is one of the most important properties for designing chemical processes, no broadly applicable, sufficiently accurate, and open-source prediction method has been available. To overcome this, we have developed GRAPPA - a hybrid graph neural network for predicting vapor pressures of pure components. GRAPPA enables the prediction of the vapor pressure curve of basically any organic molecule, requiring only the molecular structure as input. The new model consists of three parts: A graph attention network for the message passing step, a pooling function that captures long-range interactions, and a prediction head that yields the component-specific parameters of the Antoine equation, from which the vapor pressure can readily and consistently be calculated for any temperature. We have trained and evaluated GRAPPA on experimental vapor pressure data of almost 25,000 pure components. We found excellent prediction accuracy for unseen components, outperforming state-of-the-art group contribution methods and other machine learning approaches in applicability and accuracy. The trained model and its code are fully disclosed, and GRAPPA is directly applicable via the interactive website ml-prop.mv.rptu.de.
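
To make the last step of that pipeline concrete, here is a small sketch of evaluating the Antoine equation once its three parameters are known. It uses the common form log10(P) = A - B / (C + T); the exact form, units, and parameter conventions used by GRAPPA are not given in the abstract, so these are assumptions. The function name `antoine_pressure` is hypothetical, and the water constants are commonly tabulated illustrative values (pressure in mmHg, temperature in °C, roughly 1 to 100 °C), not model output.

```python
def antoine_pressure(a, b, c, temperature):
    """Vapor pressure from the Antoine equation: log10(P) = A - B / (C + T).

    Units follow whatever convention the parameters were fitted in
    (here assumed: P in mmHg, T in degrees Celsius).
    """
    return 10.0 ** (a - b / (c + temperature))

# Illustrative, commonly tabulated Antoine constants for water
# (mmHg, degrees Celsius); these are not taken from the GRAPPA paper.
WATER = (8.07131, 1730.63, 233.426)

p_boil = antoine_pressure(*WATER, temperature=100.0)
print(f"{p_boil:.1f} mmHg")  # close to atmospheric pressure at the boiling point
```

Because the network predicts only the three parameters, a single forward pass gives the whole vapor pressure curve, and the curve is smooth and monotonic in temperature by construction.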


Modeling Melt Pool Features and Spatter Using Symbolic Regression and Machine Learning

arXiv.org Artificial Intelligence

Additive manufacturing (AM) is a rapidly evolving technology that has attracted applications across a wide range of fields due to its ability to fabricate complex geometries. However, one of the key challenges in AM is achieving consistent print quality. This inconsistency is often attributed to uncontrolled melt pool dynamics, partly caused by spatter, which can lead to defects. Therefore, capturing and controlling the evolution of the melt pool is crucial for enhancing process stability and part quality. In this study, we developed a framework to support decision-making in AM operations, facilitating quality control and minimizing defects via machine learning (ML) and polynomial symbolic regression models. We implemented experimentally validated computational tools as a cost-effective approach to collect large datasets from laser powder bed fusion (LPBF) processes. For a dataset consisting of 281 process conditions, parameters such as melt pool dimensions (length, width, depth), melt pool geometry (area, volume), and volume indicated as spatter were extracted. Using ML and polynomial symbolic regression models, a high R² of over 95 % was achieved in predicting the melt pool dimensions and geometry features for both the training and testing datasets, with either process conditions (power and velocity) or melt pool dimensions as the model inputs. In the case of volume indicated as spatter, R² improved after logarithmically transforming the model inputs, which were either the process conditions or the melt pool dimensions. Among the investigated ML models, the ExtraTree model achieved the highest R² values of 96.7 % and 87.5 %.


Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures

arXiv.org Artificial Intelligence

Visual-Spatial Systems have become increasingly essential in concrete crack inspection. However, existing methods often lack adaptability to diverse scenarios, exhibit limited robustness in image-based approaches, and struggle with curved or complex geometries. To address these limitations, this study proposes an innovative framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and 3D automatic crack measurement by integrating computer vision technologies and multi-modal simultaneous localization and mapping (SLAM). Firstly, building on a base DeepLabv3+ segmentation model and incorporating specific refinements using the foundation model Segment Anything Model (SAM), we developed a crack segmentation method with strong generalization across unfamiliar scenarios, enabling the generation of precise 2D crack masks. To enhance the accuracy and robustness of 3D reconstruction, Light Detection and Ranging (LiDAR) point clouds were utilized together with image data and segmentation masks. By leveraging both image- and LiDAR-SLAM, we developed a multi-frame and multi-modal fusion framework that produces dense, colorized point clouds, effectively capturing crack semantics at a 3D real-world scale. Furthermore, the geometric attributes of the cracks were measured automatically and directly within the 3D dense point cloud space, surpassing the limitations of conventional 2D image-based measurements. This advancement makes the method suitable for structural components with curved and complex 3D geometries. Experimental results across various concrete structures highlight the significant improvements and unique advantages of the proposed method, demonstrating its effectiveness, accuracy, and robustness in real-world applications.


Leveraging Large Language Models as Knowledge-Driven Agents for Reliable Retrosynthesis Planning

arXiv.org Artificial Intelligence

Identifying reliable synthesis pathways in materials chemistry is a complex task, particularly in polymer science, due to the intricate and often non-unique nomenclature of macromolecules. To address this challenge, we propose an agent system that integrates large language models (LLMs) and knowledge graphs (KGs). By leveraging LLMs' powerful capabilities for extracting and recognizing chemical substance names, and storing the extracted data in a structured knowledge graph, our system fully automates the retrieval of relevant literature, extraction of reaction data, database querying, construction of retrosynthetic pathway trees, further expansion through the retrieval of additional literature, and recommendation of optimal reaction pathways. A novel Multi-branched Reaction Pathway Search (MBRPS) algorithm enables the exploration of all pathways, with a particular focus on multi-branched ones, helping LLMs overcome weak reasoning in multi-branched paths. This work represents the first attempt to develop a fully automated, LLM-powered retrosynthesis planning agent tailored specifically for macromolecules. Applied to polyimide synthesis, our new approach constructs a retrosynthetic pathway tree with hundreds of pathways and recommends optimized routes, including both known and novel pathways, demonstrating its effectiveness and potential for broader applications.


A Universal Catalyst for First-Order Optimization

Neural Information Processing Systems

We introduce a generic scheme for accelerating first-order optimization methods in the sense of Nesterov, which builds upon a new analysis of the accelerated proximal point algorithm. Our approach consists of minimizing a convex objective by approximately solving a sequence of well-chosen auxiliary problems, leading to faster convergence. This strategy applies to a large class of algorithms, including gradient descent, block coordinate descent, SAG, SAGA, SDCA, SVRG, Finito/MISO, and their proximal variants. For all of these methods, we provide acceleration and explicit support for non-strongly convex objectives. In addition to theoretical speed-up, we also show that acceleration is useful in practice, especially for ill-conditioned problems where we measure significant improvements.
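
The outer loop described above (approximately solve a regularized auxiliary problem, then extrapolate) can be sketched on a toy problem. This is an illustrative simplification, not the authors' implementation: the objective is an assumed diagonal quadratic f(x) = 0.5 * sum(a_i * x_i^2), the inner solver is plain gradient descent with a fixed iteration budget instead of an accuracy-based stopping rule, and the function names and the choice of `kappa` are hypothetical.

```python
import math

def objective(diag, x):
    """Toy objective f(x) = 0.5 * sum(a_i * x_i^2), minimized at x = 0."""
    return 0.5 * sum(a * xi * xi for a, xi in zip(diag, x))

def catalyst_quadratic(diag, x0, kappa, outer_iters=100, inner_iters=50):
    """Accelerated inexact proximal point loop on the toy quadratic."""
    mu, lipschitz = min(diag), max(diag)   # strong convexity / smoothness
    q = mu / (mu + kappa)
    alpha = math.sqrt(q)                   # initial extrapolation parameter
    step = 1.0 / (lipschitz + kappa)       # step size for the inner solver
    x_prev, y = list(x0), list(x0)
    for _ in range(outer_iters):
        # Auxiliary problem: min_x f(x) + kappa/2 * ||x - y||^2,
        # solved approximately by gradient descent warm-started at x_prev.
        x = list(x_prev)
        for _ in range(inner_iters):
            x = [xi - step * (a * xi + kappa * (xi - yi))
                 for a, xi, yi in zip(diag, x, y)]
        # Next alpha is the positive root of
        # alpha_next^2 = (1 - alpha_next) * alpha^2 + q * alpha_next.
        b = alpha * alpha - q
        alpha_next = 0.5 * (-b + math.sqrt(b * b + 4.0 * alpha * alpha))
        beta = alpha * (1.0 - alpha) / (alpha * alpha + alpha_next)
        # Nesterov-style extrapolation of the prox center.
        y = [xi + beta * (xi - xpi) for xi, xpi in zip(x, x_prev)]
        x_prev, alpha = x, alpha_next
    return x_prev

# Ill-conditioned example: condition number 100.
diag = [1.0, 100.0]
x_final = catalyst_quadratic(diag, x0=[1.0, 1.0], kappa=10.0)
print(objective(diag, x_final))
```

The added term `kappa/2 * ||x - y||^2` makes each auxiliary problem better conditioned than the original one, which is why a simple inner method can solve it quickly; the extrapolation step then recovers the accelerated rate in the outer loop.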


Scientists explain why BepiColombo's mission to Mercury is so tricky

Popular Science

It seems like it should be pretty easy to get to Mercury. The little rocky planet is so much closer to Earth than distant destinations like Jupiter, where we've successfully sent multiple spacecraft. Plus, it doesn't have a crushing atmosphere like our nearest neighbor Venus. But, in fact, it's actually really difficult to reach the innermost planet of our solar system--which makes it that much more impressive that the ESA and JAXA's BepiColombo mission has almost reached Mercury, recently completing its final flyby of the planet before entering orbit next year. Reaching Mercury is such a challenge because "the gravitational pull of the Sun is very strong near Mercury, which makes it difficult for spacecraft to slow down enough to enter orbit around the planet," explains Lina Hadid, staff scientist at CNRS in France and principal investigator of one of BepiColombo's instruments.