Materials
Identification of Empirical Constitutive Models for Age-Hardenable Aluminium Alloy and High-Chromium Martensitic Steel Using Symbolic Regression
Kabliman, Evgeniya, Kronberger, Gabriel
Process-structure-property relationships are fundamental in materials science and engineering and are key to the development of new and improved materials. Symbolic regression serves as a powerful tool for uncovering mathematical models that describe these relationships. It can automatically generate equations to predict material behaviour under specific manufacturing conditions and optimize performance characteristics such as strength and elasticity. The present work illustrates how symbolic regression can derive constitutive models that describe the behaviour of various metallic alloys during plastic deformation. Constitutive modelling is a mathematical framework for understanding the relationship between stress and strain in materials under different loading conditions. In this study, two materials (age-hardenable aluminium alloy and high-chromium martensitic steel) and two different testing methods (compression and tension) are considered to obtain the required stress-strain data. The results highlight the benefits of using symbolic regression while also discussing potential challenges.
Revealing the Hidden Third Dimension of Point Defects in Two-Dimensional MXenes
Guinan, Grace, Smeaton, Michelle A., Wyatt, Brian C., Goldy, Steven, Egan, Hilary, Glaws, Andrew, Tucker, Garritt J., Anasori, Babak, Spurgeon, Steven R.
Point defects govern many important functional properties of two - dimensional ( 2D) materials. However, resolving the three - dimensional (3D) arrangement of these defects in multi - layer 2D materials remains a fundamental challenge, hindering rational defect engineering . Our approach reconstructs the 3D coordinates of vacancies across hundreds of thousands of lattice sites, generating robust statistical insight into their dist ribution that can be correlated with specinullic synthesis pathways. This large - scale data enables us to classify a hierarchy of defect structures -- from isolated vacancies to nanopores -- revealing their preferred formation and interaction mechanisms, as corroborated by molecular dynamics simulations . This work provides a generalizable framework for understanding and ultimately controlling point defects across large volumes, paving the way for the rational design of defect - engineered functional 2D materials. Keywords: 2D materials, point defects, autonomous materials science, electron microscopy, machine learning 2 Two - dimensional (2D) materials have become a major nullield of modern research in materials science after the discovery of graphene in 2004 . The challenge of characterizing point defects is signinullicantly amplinullied in few - layered 2D materials. For instance, MXenes -- a class of 2D transition metal carbides, carbonitrides, and nitrides -- consist of nanosheets containing two to nullive layers of metal ato ms, which complicates defect analysis compared to single - layer materials .
Bi-Objective Evolutionary Optimization for Large-Scale Open Pit Mine Scheduling Problem under Uncertainty with Chance Constraints
Pathiranage, Ishara Hewa, Neumann, Aneta
The open-pit mine scheduling problem (OPMSP) is a complex, computationally expensive process in long-term mine planning, constrained by operational and geological dependencies. Traditional deterministic approaches often ignore geological uncertainty, leading to suboptimal and potentially infeasible production schedules. Chance constraints allow modeling of stochastic components by ensuring probabilistic constraints are satisfied with high probability. This paper presents a bi-objective formulation of the OPMSP that simultaneously maximizes expected net present value and minimizes scheduling risk, independent of the confidence level required for the constraint. Solutions are represented using integer encoding, inherently satisfying reserve constraints. We introduce a domain-specific greedy randomized initialization and a precedence-aware period-swap mutation operator. We integrate these operators into three multi-objective evolutionary algorithms: the global simple evolutionary multi-objective optimizer (GSEMO), a mutation-only variant of multi-objective evolutionary algorithm based on decomposition (MOEA/D), and non-dominated sorting genetic algorithm II (NSGA-II). We compare our bi-objective formulation against the single-objective approach, which depends on a specific confidence level, by analyzing mine deposits consisting of up to 112 687 blocks. Results demonstrate that the proposed bi-objective formulation yields more robust and balanced trade-offs between economic value and risk compared to single-objective, confidence-dependent approach.
Hardware-Aware YOLO Compression for Low-Power Edge AI on STM32U5 for Weeds Detection in Digital Agriculture
Kouzinopoulos, Charalampos S., Manna, Yuri
Abstract--Weeds significantly reduce crop yields worldwide and pose major challenges to sustainable agriculture. Traditional weed management methods, primarily relying on chemical herbicides, risk environmental contamination and lead to the emergence of herbicide-resistant species. Precision weeding, leveraging computer vision and machine learning methods, offers a promising eco-friendly alternative but is often limited by reliance on high-power computational platforms. This work presents an optimized, low-power edge AI system for weeds detection based on the YOLOv8n object detector deployed on the STM32U575ZI microcontroller . Several compression techniques are applied to the detection model, including structured pruning, integer quantization and input image resolution scaling in order to meet strict hardware constraints. The model is trained and evaluated on the CropAndWeed dataset with 74 plant species, achieving a balanced trade-off between detection accuracy and efficiency. Our system supports real-time, in-situ weeds detection with a minimal energy consumption of 51.8mJ per inference, enabling scalable deployment in power-constrained agricultural environments. EEDS are widespread and persistent plants, known for their rapid reproduction and effective seed dispersal strategies. They are among the primary contributors to crop yield loss globally, posing a significant challenge for farmers and agricultural stakeholders [1].
Coupling Agent-based Modeling and Life Cycle Assessment to Analyze Trade-offs in Resilient Energy Transitions
Zhang, Beichen, Zaki, Mohammed T., Breunig, Hanna, Ajami, Newsha K.
Transitioning to sustainable and resilient energy systems requires navigating complex and interdependent trade-offs across environmental, social, and resource dimensions. Neglecting these trade-offs can lead to unintended consequences across sectors. However, existing assessments often evaluate emerging energy pathways and their impacts in silos, overlooking critical interactions such as regional resource competition and cumulative impacts. We present an integrated modeling framework that couples agent-based modeling and Life Cycle Assessment (LCA) to simulate how energy transition pathways interact with regional resource competition, ecological constraints, and community-level burdens. We apply the model to a case study in Southern California. The results demonstrate how integrated and multiscale decision making can shape energy pathway deployment and reveal spatially explicit trade-offs under scenario-driven constraints. This modeling framework can further support more adaptive and resilient energy transition planning on spatial and institutional scales.
MCP4IFC: IFC-Based Building Design Using Large Language Models
Nithyanantham, Bharathi Kannan, Sesterhenn, Tobias, Nedungadi, Ashwin, Garijo, Sergio Peral, Zenkner, Janis, Bartelt, Christian, Lรผdtke, Stefan
Bringing generative AI into the architecture, engineering and construction (AEC) field requires systems that can translate natural language instructions into actions on standardized data models. We present MCP4IFC, a comprehensive open-source framework that enables Large Language Models (LLMs) to directly manipulate Industry Foundation Classes (IFC) data through the Model Context Protocol (MCP). The framework provides a set of BIM tools, including scene querying tools for information retrieval, predefined functions for creating and modifying common building elements, and a dynamic code-generation system that combines in-context learning with retrieval-augmented generation (RAG) to handle tasks beyond the predefined toolset. Experiments demonstrate that an LLM using our framework can successfully perform complex tasks, from building a simple house to querying and editing existing IFC data. Our framework is released as open-source to encourage research in LLM-driven BIM design and provide a foundation for AI-assisted modeling workflows. Our code is available at https://show2instruct.github.io/mcp4ifc/.
ChemBOMAS: Accelerated BO in Chemistry with LLM-Enhanced Multi-Agent System
Han, Dong, Ai, Zhehong, Cai, Pengxiang, Lu, Shanya, Chen, Jianpeng, Ye, Zihao, Sun, Shuzhou, Gao, Ben, Ge, Lingli, Wang, Weida, Zhou, Xiangxin, Liu, Xihui, Su, Mao, Ouyang, Wanli, Bai, Lei, Zhou, Dongzhan, Xu, Tao, Li, Yuqiang, Zhang, Shufei
Bayesian optimization (BO) is a powerful tool for scientific discovery in chemistry, yet its efficiency is often hampered by the sparse experimental data and vast search space. Here, we introduce ChemBOMAS: a large language model (LLM)-enhanced multi-agent system that accelerates BO through synergistic data- and knowledge-driven strategies. Firstly, the data-driven strategy involves an 8B-scale LLM regressor fine-tuned on a mere 1% labeled samples for pseudo-data generation, robustly initializing the optimization process. Secondly, the knowledge-driven strategy employs a hybrid Retrieval-Augmented Generation approach to guide LLM in dividing the search space while mitigating LLM hallucinations. An Upper Confidence Bound algorithm then identifies high-potential subspaces within this established partition. Across the LLM-refined subspaces and supported by LLM-generated data, BO achieves the improvement of effectiveness and efficiency. Comprehensive evaluations across multiple scientific benchmarks demonstrate that ChemBOMAS set a new state-of-the-art, accelerating optimization efficiency by up to 5-fold compared to baseline methods.
Rethinking Tokenization for Rich Morphology: The Dominance of Unigram over BPE and Morphological Alignment
Vemula, Saketh Reddy, Dandapat, Sandipan, Sharma, Dipti Misra, Krishnamurthy, Parameswari
The relationship between tokenizer algorithm (e.g., Byte-Pair Encoding (BPE), Unigram), morphological alignment, tokenization quality (e.g., compression efficiency), and downstream performance remains largely unclear, particularly for languages with complex morphology. In this paper, we conduct a comprehensive evaluation of tokenizers using small-sized BERT models -- from pre-training through fine-tuning -- for Telugu (agglutinative), along with preliminary evaluation in Hindi (primarily fusional with some agglutination) and English (fusional). To evaluate morphological alignment of tokenizers in Telugu, we create a dataset containing gold morpheme segmentations of 600 derivational and 7000 inflectional word forms. Our experiments reveal two key findings for Telugu. First, the choice of tokenizer algorithm is the most significant factor influencing performance, with Unigram-based tokenizers consistently outperforming BPE across most settings. Second, while better morphological alignment shows a moderate, positive correlation with performance on text classification and structure prediction tasks, its impact is secondary to the tokenizer algorithm. Notably, hybrid approaches that use morphological information for pre-segmentation significantly boost the performance of BPE, though not Unigram. Our results further showcase the need for comprehensive intrinsic evaluation metrics for tokenizers that could explain downstream performance trends consistently.
Compressing Chemistry Reveals Functional Groups
We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good explanation of data should also compress the data. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction datasets to discover dataset-specific functional groups. Fingerprints constructed from dataset-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprint, when training ridge regression models on bioactivity regression tasks.
Temporal Sparse Autoencoders: Leveraging the Sequential Nature of Language for Interpretability
Bhalla, Usha, Oesterling, Alex, Verdun, Claudio Mayrink, Lakkaraju, Himabindu, Calmon, Flavio P.
Translating the internal representations and computations of models into concepts that humans can understand is a key goal of interpretability. While recent dictionary learning methods such as Sparse Autoencoders (SAEs) provide a promising route to discover human-interpretable features, they suffer from a variety of problems, including a systematic failure to capture the rich conceptual information that drives linguistic understanding. Instead, they exhibit a bias towards shallow, token-specific, or noisy features, such as "the phrase 'The' at the start of sentences". In this work, we propose that this is due to a fundamental issue with how dictionary learning methods for LLMs are trained. Language itself has a rich, well-studied structure spanning syntax, semantics, and pragmatics; however, current unsupervised methods largely ignore this linguistic knowledge, leading to poor feature discovery that favors superficial patterns over meaningful concepts. We focus on a simple but important aspect of language: semantic content has long-range dependencies and tends to be smooth over a sequence, whereas syntactic information is much more local. Building on this insight, we introduce Temporal Sparse Autoencoders (T-SAEs), which incorporate a novel contrastive loss encouraging consistent activations of high-level features over adjacent tokens. This simple yet powerful modification enables SAEs to disentangle semantic from syntactic features in a self-supervised manner. Across multiple datasets and models, T-SAEs recover smoother, more coherent semantic concepts without sacrificing reconstruction quality. Strikingly, they exhibit clear semantic structure despite being trained without explicit semantic signal, offering a new pathway for unsupervised interpretability in language models.