Materials


Data-driven development of cycle prediction models for lithium metal batteries using multi modal mining

arXiv.org Artificial Intelligence

These authors contributed equally: J. Lee, J. Woo. *Corresponding authors. Email: Jihankim@kaist.ac.kr (Jihan Kim), stevepark@kaist.ac.kr (Steve Park), heetak.kim@kaist.ac.kr (Hee-Tak Kim)

Abstract

Recent advances in data-driven research have shown great potential for understanding the intricate relationships between materials and their performance. Herein, we introduce a novel multimodal data-driven approach employing an Automatic Battery data Collector (ABC) that integrates a large language model (LLM) with an automatic graph mining tool, Material Graph Digitizer (MatGD). This platform enables state-of-the-art accuracy in extracting battery material data and cyclability performance metrics from diverse textual and graphical data sources. From the database derived through the ABC platform, we developed machine learning models that can accurately predict the capacity and stability of lithium metal batteries; these are the first models developed to achieve such predictions. Our models were also experimentally validated, confirming the practical applicability and reliability of our data-driven approach.

INTRODUCTION

Lithium metal batteries (LMBs) are a promising next-generation technology that can achieve high capacity by using lithium metal as the anode, owing to its exceptionally low density (0.534 g cm⁻³) … Therefore, these studies lack sufficient information to discern the comprehensive effect of different components on battery performance. Additionally, previous mining research focused not on entire battery cells but on the characteristics of individual battery components. Moreover, these studies were limited by the small number of entities considered and did not extract quantitative information such as concentrations or ratios. Furthermore, the absence of automatic graph mining tools made it difficult to obtain performance data, such as specific capacity and cycle stability, from graphs.
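Graph mining of the kind MatGD performs turns a cycling plot into numbers, from which a cyclability metric can be computed. As a minimal sketch, assuming the digitizer outputs a list of (cycle number, specific capacity) points (the actual MatGD output format is not described here), capacity retention at a given cycle is the interpolated capacity divided by the capacity at a reference cycle. The function names `interpolate_capacity` and `capacity_retention` are hypothetical.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical input format: digitized (cycle number, specific capacity in mAh/g) points.
Points = List[Tuple[float, float]]


def interpolate_capacity(points: Points, cycle: float) -> float:
    """Linearly interpolate the capacity at `cycle` between digitized points."""
    pts = sorted(points)
    cycles = [c for c, _ in pts]
    if not cycles[0] <= cycle <= cycles[-1]:
        raise ValueError("cycle lies outside the digitized range")
    i = bisect_left(cycles, cycle)
    if cycles[i] == cycle:
        return pts[i][1]
    (c0, q0), (c1, q1) = pts[i - 1], pts[i]
    return q0 + (q1 - q0) * (cycle - c0) / (c1 - c0)


def capacity_retention(points: Points, cycle: float,
                       reference_cycle: Optional[float] = None) -> float:
    """Capacity at `cycle` as a fraction of the capacity at `reference_cycle`.

    Defaults to the first digitized cycle as the reference (an assumption;
    some papers normalize to a later formation cycle instead).
    """
    if reference_cycle is None:
        reference_cycle = min(c for c, _ in points)
    return interpolate_capacity(points, cycle) / interpolate_capacity(points, reference_cycle)


if __name__ == "__main__":
    digitized = [(1, 200.0), (50, 190.0), (100, 170.0)]  # illustrative values only
    print(f"{capacity_retention(digitized, 100):.2f}")  # 0.85
```

With the same illustrative points, retention at cycle 75 interpolates between cycles 50 and 100 (180 mAh/g), giving 0.90.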


Material synthesis through simulations guided by machine learning: a position paper

arXiv.org Artificial Intelligence

In this position paper, we propose an approach for sustainable data collection in the field of optimal mix design for marble sludge reuse. Marble sludge, a calcium-rich residue from stone-cutting processes, can be repurposed by mixing it with various ingredients. However, determining the optimal mix design is challenging due to the variability in sludge composition and the costly, time-consuming nature of experimental data collection. We also investigate the use of machine learning models, with meta-learning as an optimization tool, to estimate the correct quantity of stone-cutting sludge to use in aggregates so that the resulting mix design has specific mechanical properties and can be used successfully in the building industry. Our approach offers two key advantages: (i) simulations can generate a large dataset, saving time and money during the data collection phase, and (ii) machine learning models, whose performance is enhanced through hyperparameter optimization via meta-learning, can estimate optimal mix designs, reducing the need for extensive manual experimentation, lowering costs, minimizing environmental impact, and accelerating the processing of quarry sludge. Our idea promises to streamline the marble sludge reuse process by leveraging collective data and advanced machine learning, promoting sustainability and efficiency in the stone-cutting sector.
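The two advantages above can be illustrated end to end on a toy problem. This is a minimal sketch under loud assumptions: `simulate_strength` is a hypothetical stand-in for the simulator with an invented strength formula, k-nearest-neighbours regression replaces the paper's unspecified models, and a plain hold-out search over k stands in for meta-learned hyperparameter optimization, which is not reproduced here.

```python
from typing import List, Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (sludge mass fraction, compressive strength in MPa)


def simulate_strength(sludge_fraction: float) -> float:
    """Hypothetical simulator stand-in; the coefficients are invented for illustration."""
    return 40.0 - 30.0 * sludge_fraction - 20.0 * sludge_fraction ** 2


def knn_predict(train: Sequence[Sample], x: float, k: int) -> float:
    """Mean strength of the k training mixes with the closest sludge fraction."""
    nearest = sorted(train, key=lambda s: abs(s[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)


def select_k(train: Sequence[Sample], val: Sequence[Sample],
             candidates: Sequence[int] = (1, 2, 3, 4, 5)) -> int:
    """Pick the k with the lowest validation MSE (plain search, not meta-learning)."""
    def mse(k: int) -> float:
        return sum((knn_predict(train, x, k) - y) ** 2 for x, y in val) / len(val)
    return min(candidates, key=mse)


def max_sludge_for_target(train: Sequence[Sample], k: int, target_mpa: float,
                          grid: Sequence[float]) -> Optional[float]:
    """Largest sludge fraction on the grid whose predicted strength meets the target."""
    feasible = [f for f in grid if knn_predict(train, f, k) >= target_mpa]
    return max(feasible) if feasible else None


if __name__ == "__main__":
    # Advantage (i): the simulator generates the dataset instead of lab experiments.
    data: List[Sample] = [(i / 100, simulate_strength(i / 100)) for i in range(61)]
    train, val = data[0::2], data[1::2]
    # Advantage (ii): a tuned model proposes a mix design without further experiments.
    k = select_k(train, val)
    grid = [i / 1000 for i in range(601)]
    print("k =", k, "max sludge fraction:", max_sludge_for_target(train, k, 30.0, grid))
```

In this toy setting the true answer for a 30 MPa target is a sludge fraction of about 0.28, and the surrogate recovers it to within the resolution of the simulated dataset.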


What can LLM tell us about cities?

arXiv.org Artificial Intelligence

This study explores the capabilities of large language models (LLMs) in providing knowledge about cities and regions on a global scale. We employ two methods: directly querying the LLM for target variable values, and extracting explicit and implicit features from the LLM that are correlated with the target variable. Our experiments reveal that LLMs embed a broad but varying degree of knowledge across global cities, and that ML models trained on LLM-derived features consistently achieve improved predictive accuracy. Additionally, we observe that LLMs demonstrate a certain level of knowledge about cities on all continents, but it is evident when they lack knowledge, as they tend to generate generic or random outputs for unfamiliar tasks. These findings suggest that LLMs can offer new opportunities for data-driven decision-making in the study of cities.
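For the first method (directly querying the LLM for a target value), the free-text reply has to be turned into a number, and generic answers should be flagged rather than parsed. A minimal sketch, assuming the prompt asks for a single number; the marker phrases in `GENERIC_MARKERS` and the function name are hypothetical, and no particular LLM API is assumed.

```python
import re
from typing import Optional

# Hypothetical phrases that signal the model is guessing rather than answering.
GENERIC_MARKERS = ("i don't know", "i do not know", "not sure", "cannot provide", "no reliable data")

# An integer with optional thousands separators and an optional decimal part.
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def parse_numeric_answer(reply: str) -> Optional[float]:
    """Return the single number in an LLM reply, or None if it is generic or ambiguous."""
    lowered = reply.lower()
    if any(marker in lowered for marker in GENERIC_MARKERS):
        return None
    matches = NUMBER.findall(reply)
    if len(matches) != 1:  # no number, or several competing numbers
        return None
    return float(matches[0].replace(",", ""))


if __name__ == "__main__":
    for reply in ["Approximately 1,250,000 residents.", "I don't know.", "Between 10 and 20."]:
        print(repr(reply), "->", parse_numeric_answer(reply))
```

Returning `None` instead of a guessed value keeps the generic or random outputs described above out of downstream training data. Note that units such as "3.5 million" are not scaled, which is why the sketch assumes the prompt requests a bare number.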


Probing the limitations of multimodal language models for chemistry and materials research

arXiv.org Artificial Intelligence

Recent advancements in artificial intelligence have sparked interest in scientific assistants that could support researchers across the full spectrum of scientific workflows, from literature review to experimental design and data analysis. A key capability for such systems is the ability to process and reason about scientific information in both visual and textual forms, from interpreting spectroscopic data to understanding laboratory setups. Here, we introduce MaCBench, a comprehensive benchmark for evaluating how vision-language models handle real-world chemistry and materials science tasks across three core aspects: data extraction, experimental understanding, and results interpretation. Through a systematic evaluation of leading models, we find that while these systems show promising capabilities in basic perception tasks (achieving near-perfect performance in equipment identification and standardized data extraction), they exhibit fundamental limitations in spatial reasoning, cross-modal information synthesis, and multi-step logical inference. Our insights have important implications beyond chemistry and materials science, suggesting that developing reliable multimodal AI scientific assistants may require advances in curating suitable training data and in the approaches used to train those models.
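Reporting results per core aspect, rather than as one overall score, is what exposes the gap between perception and reasoning described above. A minimal sketch of such per-aspect scoring, assuming each benchmark item is recorded as (aspect, model answer, reference answer) and scored by case-insensitive exact match; MaCBench's actual data format and scoring rules may differ.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

ASPECTS = ("data extraction", "experimental understanding", "results interpretation")

# Hypothetical record: (aspect, model answer, reference answer).
Record = Tuple[str, str, str]


def accuracy_by_aspect(records: Iterable[Record]) -> Dict[str, float]:
    """Exact-match accuracy (case- and whitespace-insensitive) for each core aspect."""
    correct: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for aspect, answer, reference in records:
        if aspect not in ASPECTS:
            raise ValueError(f"unknown aspect: {aspect!r}")
        total[aspect] += 1
        correct[aspect] += int(answer.strip().lower() == reference.strip().lower())
    # Aspects with no scored items are omitted rather than reported as 0.
    return {a: correct[a] / total[a] for a in ASPECTS if total[a]}


if __name__ == "__main__":
    records = [
        ("data extraction", "B", "b"),
        ("data extraction", "A", "C"),
        ("results interpretation", "x", " X "),
    ]
    print(accuracy_by_aspect(records))
```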


Advancing Transformative Education: Generative AI as a Catalyst for Equity and Innovation

arXiv.org Artificial Intelligence

Generative AI is transforming education by enabling personalized learning, enhancing administrative efficiency, and fostering creative engagement. This paper explores the opportunities and challenges these tools bring to pedagogy, proposing actionable frameworks to address existing equity gaps. Ethical considerations such as algorithmic bias, data privacy, and the role of AI in human-centric education are emphasized. The findings underscore the need for responsible AI integration that ensures accessibility, equity, and innovation in educational systems.


Federated Learning in Chemical Engineering: A Tutorial on a Framework for Privacy-Preserving Collaboration Across Distributed Data Sources

arXiv.org Artificial Intelligence

Federated Learning (FL) is a decentralized machine learning approach that has gained attention for its potential to enable collaborative model training across clients while protecting data privacy, making it an attractive solution for the chemical industry. This work aims to provide the chemical engineering community with an accessible introduction to the discipline. Supported by a hands-on tutorial and a comprehensive collection of examples, it explores the application of FL to tasks such as manufacturing optimization, multimodal data integration, and drug discovery, while addressing the unique challenges of protecting proprietary information and managing distributed datasets. The tutorial was built using key frameworks such as Flower and TensorFlow Federated and was designed to give chemical engineers the right tools to adopt FL for their specific needs. We compare the performance of FL against centralized learning on three datasets relevant to chemical engineering applications, showing that FL often maintains or improves classification performance, particularly for complex and heterogeneous data. We conclude with an outlook on the open challenges in federated learning and on current approaches designed to address them and improve the framework.
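The core idea of FL, training locally and sharing only model parameters, can be sketched with federated averaging (FedAvg), a widely used aggregation rule. This is a generic illustration, not the tutorial's Flower or TensorFlow Federated code. Assumptions: two hypothetical clients hold disjoint samples of the linear relation y = 2x + 1, each runs a few epochs of full-batch gradient descent, and the server averages the returned parameters weighted by client sample counts.

```python
from typing import List, Sequence, Tuple

Dataset = List[Tuple[float, float]]  # (x, y) pairs held privately by one client


def local_train(w: float, b: float, data: Dataset,
                lr: float = 0.3, epochs: int = 5) -> Tuple[float, float]:
    """Full-batch gradient descent on mean squared error, run on one client."""
    n = len(data)
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b


def fedavg(clients: Sequence[Dataset], rounds: int = 50) -> Tuple[float, float]:
    """Server loop: broadcast the global model, collect local updates, average them."""
    w, b = 0.0, 0.0  # global model held by the server
    total = sum(len(d) for d in clients)
    for _ in range(rounds):
        # Only the updated parameters leave each client, never the raw data.
        updates = [local_train(w, b, d) for d in clients]
        w = sum(len(d) * uw for d, (uw, _) in zip(clients, updates)) / total
        b = sum(len(d) * ub for d, (_, ub) in zip(clients, updates)) / total
    return w, b


if __name__ == "__main__":
    client_a = [(i / 10, 2 * (i / 10) + 1) for i in range(0, 11, 2)]
    client_b = [(i / 10, 2 * (i / 10) + 1) for i in range(1, 10, 2)]
    w, b = fedavg([client_a, client_b])
    print(f"w={w:.3f}, b={b:.3f}")
```

Weighting by sample count keeps a client with more data from being diluted by smaller ones; with heterogeneous client data, the number of local epochs becomes a trade-off between communication cost and client drift.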


Comparison of Tiny Machine Learning Techniques for Embedded Acoustic Emission Analysis

arXiv.org Artificial Intelligence

This paper compares machine learning approaches with different input data formats for the classification of acoustic emission (AE) signals. AE monitoring is a promising technique in many structural health monitoring applications. Machine learning has been demonstrated to be an effective data analysis method, classifying AE signals according to the damage mechanism they represent. These classifications can be based on the entire AE waveform or on specific features extracted from it. However, it is currently unknown which of these approaches is preferable. With the goal of deploying models on resource-constrained embedded Internet of Things (IoT) systems, this work evaluates and compares both approaches in terms of classification accuracy, memory requirements, processing time, and energy consumption. To accomplish this, features are extracted and carefully selected, neural network models are designed and optimized for each input data scenario, and the models are deployed on a low-power IoT node. The comparative analysis reveals that all models can achieve classification accuracies above 99%, but that embedded feature extraction is computationally expensive. Consequently, models using the raw AE signal as input have the fastest processing speed and thus the lowest energy consumption, at the cost of a larger memory requirement.
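The feature-based approach requires computing descriptors from each waveform on the device, which is the step the comparison found computationally expensive. A minimal sketch of a few classic AE features (peak amplitude, energy, RMS, threshold-crossing counts, zero crossings, rise time in samples), written in Python for readability; this feature set is an assumption, since the features the paper actually selected are not listed here.

```python
import math
from typing import Dict, Sequence


def extract_ae_features(signal: Sequence[float], threshold: float) -> Dict[str, float]:
    """Compute a few classic AE waveform features from one sampled hit."""
    if not signal:
        raise ValueError("empty signal")
    peak_index = max(range(len(signal)), key=lambda i: abs(signal[i]))
    energy = sum(s * s for s in signal)
    # Counts: positive-going crossings of the detection threshold.
    counts = sum(1 for a, b in zip(signal, signal[1:]) if a < threshold <= b)
    zero_crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    # Rise time: samples from the first threshold exceedance to the peak.
    first_hit = next((i for i, s in enumerate(signal) if abs(s) >= threshold), None)
    rise_time = peak_index - first_hit if first_hit is not None else 0
    return {
        "peak_amplitude": abs(signal[peak_index]),
        "energy": energy,
        "rms": math.sqrt(energy / len(signal)),
        "counts": counts,
        "zero_crossings": zero_crossings,
        "rise_time_samples": rise_time,
    }


if __name__ == "__main__":
    print(extract_ae_features([0.0, 1.0, -2.0, 0.5], threshold=0.8))
```

Every feature is a full pass over the waveform, so the cost grows with the number of features and samples; the raw-signal models skip this step entirely, which matches the speed and energy result reported above. Dividing `rise_time_samples` by the sampling rate gives seconds.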


Continuous Design and Reprogramming of Totimorphic Structures for Space Applications

arXiv.org Artificial Intelligence

Throughout nature, the intricate and disordered lattice structures that are observed in bones, plant stems, dragonfly wings, coral, radiolarians [1], amongst many other examples, demonstrate how powerful geometry is for designing structures with extreme mechanical properties from a very limited selection of base materials [2]. Metamaterials [3] are a recent example of human-engineered lattice structures that utilise the geometric design space of unit cells to change the properties of the lattice obtained by tiling this motif, often producing structures with different properties than those of the underlying lattice material, for instance, a soft and compressible lattice made of a very brittle material such as ceramic [4]. In addition to metamaterials that follow a periodic design philosophy, there is growing interest in (inversely) designing disordered lattice materials and structures [5-12], allowing us to fully tap into the functional design space explored by nature. Since lattices can be constructed using additive manufacturing, they combine ease of manufacturing with a highly expressive design space that requires only a small amount of building material. It is not surprising that lattices have found applications on a variety of scales, ranging from nano- and mesoscale materials to large-scale structures such as space habitats [13-16]. The static nature of lattices also means that once they have been constructed, their properties are fixed, unless physically stimulating the lattice changes the properties of its base materials or allows switching between different shapes (e.g., magnetically [17-19]), thereby enabling a certain degree of reprogrammability of the lattice's properties; such lattices are also known as active metamaterials [20, 21].


MolReFlect: Towards In-Context Fine-grained Alignments between Molecules and Texts

arXiv.org Artificial Intelligence

Molecule discovery is a pivotal research field, impacting everything from the medicines we take to the materials we use. Recently, Large Language Models (LLMs) have been widely adopted in molecule understanding and generation, yet the alignments between molecules and their corresponding captions remain a significant challenge. Previous endeavours often treat the molecule as a general SMILES string or molecular graph, neglecting the fine-grained alignments between molecular sub-structures and descriptive textual phrases, which are crucial for accurate and explainable predictions. To address this, we introduce MolReFlect, a novel teacher-student framework designed to contextually perform molecule-caption alignment in a fine-grained way. Our approach first leverages a larger teacher LLM to label the detailed alignments by directly extracting critical phrases from molecule captions or SMILES strings and associating them with the corresponding sub-structures or characteristics. To refine these alignments, we propose In-Context Selective Reflection, which retrieves previous extraction results as context examples for the teacher LLM to reflect on, and lets a smaller student LLM select between the in-context reflection and the previous extraction results. Finally, we enhance the learning process of the student LLM through Chain-of-Thought In-Context Molecule Tuning, integrating the fine-grained alignments and the reasoning processes within the Chain-of-Thought format. Our experimental results demonstrate that MolReFlect enables LLMs like Mistral-7B to significantly outperform previous baselines, achieving SOTA performance on the ChEBI-20 dataset. This advancement not only enhances the generative capabilities of LLMs in the molecule-caption translation task, but also contributes to a more explainable framework.
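In-Context Selective Reflection depends on retrieving earlier extraction results as context examples. As a minimal sketch, assuming retrieval by Jaccard similarity over SMILES character trigrams (MolReFlect's actual retrieval criterion is not described here), the hypothetical helpers below select the most similar stored examples and format them into a prompt; the prompt template and the alignment strings are likewise illustrative.

```python
from typing import List, Set, Tuple

# Hypothetical memory entry: (SMILES string, previously extracted alignment text).
Example = Tuple[str, str]


def char_ngrams(smiles: str, n: int = 3) -> Set[str]:
    """Character n-grams of a SMILES string; short strings fall back to the whole string."""
    grams = {smiles[i:i + n] for i in range(len(smiles) - n + 1)}
    return grams or {smiles}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection over size of the union (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0


def retrieve_examples(query: str, memory: List[Example], k: int = 2) -> List[Example]:
    """Return the k stored examples whose SMILES are most similar to the query."""
    q = char_ngrams(query)
    ranked = sorted(memory, key=lambda ex: jaccard(q, char_ngrams(ex[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, examples: List[Example]) -> str:
    """Format retrieved examples as in-context demonstrations (hypothetical template)."""
    shots = "\n\n".join(f"SMILES: {s}\nAlignments: {a}" for s, a in examples)
    return f"{shots}\n\nSMILES: {query}\nAlignments:"


if __name__ == "__main__":
    memory = [
        ("CCO", "hydroxyl group -> alcohol"),
        ("CCCO", "propyl chain; hydroxyl group -> primary alcohol"),
        ("c1ccccc1", "six-membered aromatic ring -> benzene"),
    ]
    print(build_prompt("CCCCO", retrieve_examples("CCCCO", memory)))
```

String similarity on SMILES is only a cheap proxy; a substructure fingerprint would usually be a better similarity measure, at the cost of a cheminformatics dependency.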


Assessing data-driven predictions of band gap and electrical conductivity for transparent conducting materials

arXiv.org Artificial Intelligence

Machine Learning (ML) has offered innovative perspectives for accelerating the discovery of new functional materials, leveraging the increasing availability of material databases. Despite the promising advances, data-driven methods face constraints imposed by the quantity and quality of available data. Moreover, ML is often employed in tandem with simulated datasets originating from density functional theory (DFT), and assessed through in-sample evaluation schemes. This scenario raises questions about the practical utility of ML in uncovering new and significant material classes for industrial applications. Here, we propose a data-driven framework aimed at accelerating the discovery of new transparent conducting materials (TCMs), an important category of semiconductors with a wide range of applications. To mitigate the shortage of available data, we create and validate unique experimental databases comprising several examples of existing TCMs. We assess state-of-the-art (SOTA) ML models for property prediction from the stoichiometry alone. We propose a bespoke evaluation scheme to provide empirical evidence of the ability of ML to uncover new, previously unseen materials of interest. We test our approach on a list of 55 compositions containing typical elements of known TCMs. Although our study indicates that ML tends to identify new TCMs compositionally similar to those in the training data, we empirically demonstrate that it can highlight material candidates that may have been previously overlooked, offering a systematic approach to identifying materials that are likely to display TCM characteristics.
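Predicting properties from the stoichiometry alone starts with turning a formula into a composition vector, and testing whether ML can find previously unseen materials calls for splits that hold out whole chemistries. The sketch below shows both under stated assumptions: a simple formula parser (no parentheses or hydrates) and a leave-one-chemical-system-out split, which is a generic out-of-distribution scheme and not the paper's bespoke evaluation. All function names are hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Element symbol followed by an optional (possibly fractional) amount.
TOKEN = re.compile(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?")


def parse_formula(formula: str) -> Dict[str, float]:
    """Map a simple formula such as 'In2O3' to normalized element fractions."""
    counts: Dict[str, float] = defaultdict(float)
    for element, amount in TOKEN.findall(formula):
        counts[element] += float(amount) if amount else 1.0
    total = sum(counts.values())
    return {el: n / total for el, n in counts.items()}


def chemical_system(formula: str) -> str:
    """Alphabetically sorted element set, e.g. 'Zn2SnO4' -> 'O-Sn-Zn'."""
    return "-".join(sorted(parse_formula(formula)))


def leave_one_system_out(formulas: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """One fold per chemical system: train on all other systems, test on this one."""
    systems = sorted({chemical_system(f) for f in formulas})
    folds = []
    for held_out in systems:
        test = [f for f in formulas if chemical_system(f) == held_out]
        train = [f for f in formulas if chemical_system(f) != held_out]
        folds.append((held_out, train, test))
    return folds


if __name__ == "__main__":
    print(parse_formula("In2O3"))  # {'In': 0.4, 'O': 0.6}
    for system, train, test in leave_one_system_out(
            ["In2O3", "SnO2", "ZnO", "Zn2SnO4", "ZnSnO3"]):
        print(system, "test:", test)
```

Holding out a system such as O-Sn-Zn while training on O-Sn and O-Zn probes exactly the tendency noted above: whether the model extrapolates to new chemistries or mainly recovers compositions similar to its training data.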