Electrical Industrial Apparatus


A Navigation System for ROV's inspection on Fish Net Cage

arXiv.org Artificial Intelligence

In this paper, we modify an off-the-shelf ROV, the BlueROV2, into a ROS-based framework and develop a localization module, a path planning system, and a control framework. For real-time, local localization, we employ the open-source TagSLAM library. Additionally, we propose a control strategy based on a Nominal Feedback Controller (NFC) to achieve precise trajectory tracking. The proposed system has been implemented and validated through experiments in a controlled laboratory environment, demonstrating its effectiveness for real-world applications.


NANOGPT: A Query-Driven Large Language Model Retrieval-Augmented Generation System for Nanotechnology Research

arXiv.org Artificial Intelligence

This paper presents the development and application of a Large Language Model Retrieval-Augmented Generation (LLM-RAG) system tailored for nanotechnology research. The system leverages the capabilities of a sophisticated language model to serve as an intelligent research assistant, enhancing the efficiency and comprehensiveness of literature reviews in the nanotechnology domain. Central to this LLM-RAG system is its advanced query backend retrieval mechanism, which integrates data from multiple reputable sources. The system retrieves relevant literature by using Google Scholar's advanced search and by scraping open-access papers from Elsevier, Springer Nature, and ACS Publications. This multifaceted approach ensures a broad and diverse collection of up-to-date scholarly articles and papers. The proposed system demonstrates significant potential in aiding researchers by providing a streamlined, accurate, and exhaustive literature retrieval process, thereby accelerating research advancements in nanotechnology. The effectiveness of the LLM-RAG system is validated through rigorous testing, illustrating its capability to significantly reduce the time and effort required for comprehensive literature reviews while maintaining high accuracy and query relevance and outperforming standard, publicly available LLMs.


Bridging the PLC Binary Analysis Gap: A Cross-Compiler Dataset and Neural Framework for Industrial Control Systems

arXiv.org Artificial Intelligence

Industrial Control Systems (ICS) rely heavily on Programmable Logic Controllers (PLCs) to manage critical infrastructure, yet analyzing PLC executables remains challenging due to diverse proprietary compilers and limited access to source code. To bridge this gap, we introduce PLC-BEAD, a comprehensive dataset containing 2431 compiled binaries from 700+ PLC programs across four major industrial compilers (CoDeSys, GEB, OpenPLC-V2, OpenPLC-V3). We demonstrate the dataset's utility through PLCEmbed, a transformer-based framework for binary code analysis that achieves 93% accuracy in compiler provenance identification and 42% accuracy in fine-grained functionality classification across 22 industrial control categories. Through comprehensive ablation studies, we analyze how compiler optimization levels, code patterns, and class distributions influence model performance. We provide detailed documentation of the dataset creation process, labeling taxonomy, and benchmark protocols to ensure reproducibility. Both PLC-BEAD and PLCEmbed are released as open-source resources to foster research in PLC security, reverse engineering, and ICS forensics, establishing new baselines for data-driven approaches to industrial cybersecurity. Industrial Control Systems (ICS) rely heavily on Programmable Logic Controllers (PLCs) to manage critical infrastructure such as manufacturing, power generation, and transportation [1], [2]. Despite the advent of newer systems, many industrial sites continue to operate legacy PLCs that lack up-to-date documentation and source code [3]. This creates significant challenges for security analysis and maintenance, particularly in facilities that must remain operational around the clock [4], [5], [6]. High-profile incidents like Stuxnet and Triton demonstrate how attackers can target the PLC layer to disrupt physical processes with severe real-world consequences [7], [8]. In these cases, threat actors exploited vulnerabilities in the toolchain or the deployed PLC program. Such attacks underscore the urgent need for methods to inspect and analyze PLC executables even when source code is unavailable [7], [8], [5], [3].


Dynamic Classification: Leveraging Self-Supervised Classification to Enhance Prediction Performance

arXiv.org Artificial Intelligence

In this paper, we propose an innovative dynamic classification algorithm designed to achieve the objective of zero missed detections and minimal false positives. The algorithm partitions the data into N equivalent training subsets and N prediction subsets using a supervised model, followed by independent predictions from N separate predictive models. This enables each predictive model to operate within a smaller data range, thereby improving overall accuracy. Additionally, the algorithm leverages data generated through supervised learning to further refine prediction results, filtering out predictions that do not meet accuracy requirements without the need to introduce additional models. Experimental results demonstrate that, when data partitioning errors are minimal, the dynamic classification algorithm achieves exceptional performance with zero missed detections and minimal false positives, significantly outperforming existing model ensembles. Even in cases where classification errors are larger, the algorithm remains comparable to state-of-the-art models. The key innovations of this study include self-supervised classification learning, the use of small-range subset predictions, and the direct rejection of substandard predictions. While the current algorithm still has room for improvement in automatic parameter tuning and classification model efficiency, it has demonstrated outstanding performance across multiple datasets. Future research will focus on optimizing the classification component to further enhance the algorithm's robustness and adaptability.
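
The partition, predict, and reject steps described in this abstract can be illustrated with a toy sketch. It is not the authors' algorithm: the quantile-based router, the per-subset mean predictor, the `max_error` threshold, and every function name below are assumptions chosen only to make the three steps concrete on a one-dimensional feature.

```python
from statistics import mean


def fit_partition(xs, n_parts):
    """Return n_parts - 1 cut points splitting the sorted feature values into equal-size bins.

    Hypothetical stand-in for the partitioning model; assumes distinct feature values.
    """
    s = sorted(xs)
    return [s[len(s) * k // n_parts] for k in range(1, n_parts)]


def route(x, cuts):
    """Index of the subset that x falls into (number of cut points at or below x)."""
    return sum(x >= c for c in cuts)


def fit_experts(xs, ys, cuts):
    """Fit one trivial predictor per subset: the subset mean and its worst training error."""
    groups = [[] for _ in range(len(cuts) + 1)]
    for x, y in zip(xs, ys):
        groups[route(x, cuts)].append(y)
    experts = []
    for g in groups:
        m = mean(g)
        experts.append((m, max(abs(y - m) for y in g)))
    return experts


def predict(x, cuts, experts, max_error):
    """Predict with the routed expert, or return None to reject a substandard prediction."""
    m, err = experts[route(x, cuts)]
    return m if err <= max_error else None


xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [1.0, 1.1, 0.9, 1.0, 5.0, 9.0, 5.0, 9.0]
cuts = fit_partition(xs, 2)
experts = fit_experts(xs, ys, cuts)
accepted = predict(2, cuts, experts, max_error=0.5)  # low-error subset: a value is returned
rejected = predict(6, cuts, experts, max_error=0.5)  # high-error subset: rejected as None
```

In this sketch, rejection reuses the training error already computed for each subset, which mirrors the abstract's point that filtering needs no additional model.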


BatteryLife: A Comprehensive Dataset and Benchmark for Battery Life Prediction

arXiv.org Artificial Intelligence

Battery Life Prediction (BLP), which relies on time series data produced by battery degradation tests, is crucial for battery utilization, optimization, and production. Despite impressive advancements, this research area faces three key challenges. Firstly, the limited size of existing datasets impedes insights into modern battery life data. Secondly, most datasets are restricted to small-capacity lithium-ion batteries tested under a narrow range of laboratory conditions, raising concerns about the generalizability of findings. Thirdly, inconsistent and limited benchmarks across studies obscure the effectiveness of baselines and leave it unclear whether models popular in other time series fields are effective for BLP. To address these challenges, we propose BatteryLife, a comprehensive dataset and benchmark for BLP. BatteryLife integrates 16 datasets, offering 2.4 times the sample size of the previous largest dataset, and provides the most diverse battery life resource with batteries from 8 formats, 80 chemical systems, 12 operating temperatures, and 646 charge/discharge protocols, including both laboratory and industrial tests. Notably, BatteryLife is the first to release battery life datasets of zinc-ion batteries, sodium-ion batteries, and industry-tested large-capacity lithium-ion batteries. With the comprehensive dataset, we revisit the effectiveness of baselines popular in this and other time series fields. Furthermore, we propose CyclePatch, a plug-in technique that can be employed in a series of neural networks. Extensive benchmarking of 18 methods reveals that models popular in other time series fields can be unsuitable for BLP, and CyclePatch consistently improves model performance, establishing state-of-the-art benchmarks. Moreover, BatteryLife evaluates model performance across aging conditions and domains. BatteryLife is available at https://github.com/Ruifeng-Tan/BatteryLife.


Functional Bayesian Additive Regression Trees with Shape Constraints

arXiv.org Machine Learning

Motivated by the great success of Bayesian additive regression trees (BART) on regression, we propose a nonparametric Bayesian approach for the function-on-scalar regression problem, termed Functional BART (FBART). Utilizing a spline-based function representation and a tree-based domain partition model, FBART offers great flexibility in characterizing the complex and heterogeneous relationship between the response curve and scalar covariates. We devise a tailored Bayesian backfitting algorithm for estimating the parameters in the FBART model. Furthermore, we introduce an FBART model with shape constraints on the response curve, enhancing estimation and prediction performance when prior shape information of response curves is available. By incorporating a shape-constrained prior, we ensure that the posterior samples of the response curve satisfy the required shape constraints (e.g., monotonicity and/or convexity). Our proposed FBART model and its shape-constrained version are new advances of BART models for functional data. Under certain regularity conditions, we derive the posterior convergence results for both FBART and its shape-constrained version. Finally, the superiority of the proposed methods over other competitive counterparts is validated through simulation experiments under various settings and analyses of two real datasets.


OBELiX: A Curated Dataset of Crystal Structures and Experimentally Measured Ionic Conductivities for Lithium Solid-State Electrolytes

arXiv.org Artificial Intelligence

Solid-state electrolyte batteries are expected to replace liquid electrolyte lithium-ion batteries in the near future thanks to their higher theoretical energy density and improved safety. However, their adoption is currently hindered by their lower effective ionic conductivity, a quantity that governs charge and discharge rates. Identifying highly ion-conductive materials using conventional theoretical calculations and experimental validation is both time-consuming and resource-intensive. While machine learning holds the promise to expedite this process, relevant ionic conductivity and structural data are scarce. Here, we present OBELiX, a domain-expert-curated database of ~600 synthesized solid electrolyte materials and their experimentally measured room-temperature ionic conductivities gathered from the literature. Each material is described by its measured composition, space group, and lattice parameters. A full-crystal description in the form of a crystallographic information file (CIF) is provided for ~320 structures for which atomic positions were available. We discuss various statistics and features of the dataset and provide training and testing splits that avoid data leakage. Finally, we benchmark seven existing ML models on the task of predicting ionic conductivity and discuss their performance. The goal of this work is to facilitate the use of machine learning for solid-state electrolyte materials discovery.


Learning the P2D Model for Lithium-Ion Batteries with SOH Detection

arXiv.org Artificial Intelligence

Lithium-ion batteries are widely used in many applications. Battery management systems control their optimal use and charging and predict when the battery will cease to deliver the required output on a planned duty or driving cycle. Such systems use a simulation of a mathematical model of battery performance. These models can be electrochemical or data-driven. Electrochemical models for batteries running at high currents are mathematically and computationally complex. In this work, we show that a well-regarded electrochemical model, the Pseudo Two Dimensional (P2D) model, can be replaced by a computationally efficient Convolutional Neural Network (CNN) surrogate model fit to accurately simulated data from a class of random driving cycles. We demonstrate that a CNN is an ideal choice for accurately capturing lithium-ion concentration profiles. Additionally, we show how the neural network model can be adjusted to correspond to battery changes in State of Health (SOH).


Towards Effective Extraction and Evaluation of Factual Claims

arXiv.org Artificial Intelligence

A common strategy for fact-checking long-form content generated by Large Language Models (LLMs) is extracting simple claims that can be verified independently. Since inaccurate or incomplete claims compromise fact-checking results, ensuring claim quality is critical. However, the lack of a standardized evaluation framework impedes assessment and comparison of claim extraction methods. To address this gap, we propose a framework for evaluating claim extraction in the context of fact-checking along with automated, scalable, and replicable methods for applying this framework, including novel approaches for measuring coverage and decontextualization. We also introduce Claimify, an LLM-based claim extraction method, and demonstrate that it outperforms existing methods under our evaluation framework. A key feature of Claimify is its ability to handle ambiguity and extract claims only when there is high confidence in the correct interpretation of the source text.


Fine-Tuning Hard-to-Simulate Objectives for Quadruped Locomotion: A Case Study on Total Power Saving

arXiv.org Artificial Intelligence

Legged locomotion is not just about mobility; it also encompasses crucial objectives such as energy efficiency, safety, and user experience, which are vital for real-world applications. However, key factors such as battery power consumption and stepping noise are often inaccurately modeled or missing in common simulators, leaving these aspects poorly optimized or unaddressed by current sim-to-real methods. Hand-designed proxies, such as mechanical power and foot contact forces, have been used to address these challenges but are often problem-specific and inaccurate. In this paper, we propose a data-driven framework for fine-tuning locomotion policies, targeting these hard-to-simulate objectives. Our framework leverages real-world data to model these objectives and incorporates the learned model into simulation for policy improvement. We demonstrate the effectiveness of our framework on power saving for quadruped locomotion, achieving a significant 24-28% net reduction in total power consumption from the battery pack at various speeds. In essence, our approach offers a versatile solution for optimizing hard-to-simulate objectives in quadruped locomotion, providing an easy-to-adapt paradigm for continual improvement with real-world knowledge. Project page: https://hard-to-sim.github.io/.
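
To make the contrast between a hand-designed proxy and a model learned from real-world data concrete, here is a minimal sketch. It is not the paper's method: the mechanical-power proxy is the usual sum of |torque × joint velocity|, while the one-variable least-squares fit is only a simple stand-in for a learned power model. All function names and the example numbers are hypothetical.

```python
def mechanical_power(torques, velocities):
    """Hand-designed proxy: sum over joints of |torque * angular velocity| (watts)."""
    return sum(abs(t * w) for t, w in zip(torques, velocities))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept (closed form, one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predicted_battery_power(torques, velocities, slope, intercept):
    """Map the proxy to estimated battery power using the fitted (assumed linear) model."""
    return slope * mechanical_power(torques, velocities) + intercept


# Illustrative, made-up log of (proxy power, measured battery power) pairs.
proxy_log = [0.0, 1.0, 2.0]
measured_log = [10.0, 12.0, 14.0]
slope, intercept = fit_line(proxy_log, measured_log)

# The fitted estimate could then serve as a penalty term inside a simulated reward.
estimate = predicted_battery_power([1.0, -2.0], [3.0, 4.0], slope, intercept)
```

The nonzero intercept in the made-up data stands for power drawn even when the joints do no mechanical work, which is one way a pure mechanical-power proxy can be inaccurate.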