
What Were You Thinking? An LLM-Driven Large-Scale Study of Refactoring Motivations in Open-Source Projects

arXiv.org Artificial Intelligence

Context. Code refactoring improves software quality without changing external behavior. Despite its advantages, its benefits are hindered by the considerable cost of time, resources, and continuous effort it demands. Aim. Understanding why developers refactor, and which metrics capture these motivations, may support wider and more effective use of refactoring in practice. Method. We performed a large-scale empirical study to analyze developers' refactoring activity, leveraging Large Language Models (LLMs) to identify underlying motivations from version control data and comparing our findings with motivations previously reported in the literature. Results. LLMs matched human judgment in 80% of cases, but aligned with literature-based motivations in only 47%. They enriched 22% of motivations with more detailed rationale, often highlighting readability, clarity, and structural improvements. Most motivations were pragmatic, focused on simplification and maintainability. While metrics related to developer experience and code readability ranked highest, their correlation with motivation categories was weak. Conclusions. We conclude that LLMs effectively capture surface-level motivations but struggle with architectural reasoning. Their value lies in providing localized explanations, which, when combined with software metrics, can form hybrid approaches. Such integration offers a promising path toward prioritizing refactoring more systematically and balancing short-term improvements with long-term architectural goals.


Anglo American, Teck Resources to merge in second-largest mining deal ever

Al Jazeera

London-listed miner Anglo American and Canada's Teck Resources plan to merge, marking the sector's second-biggest mergers and acquisitions deal ever and forging a new global copper-focused heavyweight. Under the proposed deal, which will require regulatory approvals and was announced on Tuesday, Anglo American shareholders will own 62.4 percent of the new company, Anglo Teck, while shareholders in Teck would hold 37.6 percent. The deal to form the world's fifth-largest copper company is also a big bet on copper by Anglo. Glencore's $90bn merger with Xstrata in 2013 remains the largest mining deal in history. Copper, used in the power and construction sectors, is set to benefit from burgeoning demand spurred by electric vehicles and artificial intelligence.


Tracking daily paths in home contexts with RSSI fingerprinting based on UWB through deep learning models

arXiv.org Artificial Intelligence

The field of human activity recognition has evolved significantly, driven largely by advancements in Internet of Things (IoT) device technology, particularly in personal devices. This study investigates the use of ultra-wideband (UWB) technology for tracking inhabitant paths in home environments using deep learning models. UWB technology estimates user locations via time-of-flight and time-difference-of-arrival methods, which are significantly affected by the presence of walls and obstacles in real environments, reducing their precision. To address these challenges, we propose a fingerprinting-based approach utilizing received signal strength indicator (RSSI) data collected from inhabitants in two flats (60 m² and 100 m²) while performing daily activities. We compare the performance of convolutional neural network (CNN), long short-term memory (LSTM), and hybrid CNN+LSTM models, as well as the use of Bluetooth technology. Additionally, we evaluate the impact of the type and duration of the temporal window (future, past, or a combination of both). Our results demonstrate a mean absolute error close to 50 cm, highlighting the superiority of the hybrid model in providing accurate location estimates, thus facilitating its application in daily human activity recognition in residential settings.
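
The temporal-window comparison mentioned above (past, future, or both) can be pictured as slicing a sequence of RSSI samples around the time step being located. The following is a minimal sketch under stated assumptions: the function name `make_window`, the list-of-vectors input layout, and the edge-padding behaviour are all hypothetical choices for illustration, not the authors' pipeline.

```python
from typing import List, Sequence


def make_window(
    samples: Sequence[Sequence[float]],
    t: int,
    size: int,
    mode: str = "past",
) -> List[List[float]]:
    """Return the RSSI vectors in a temporal window around index t.

    mode="past":   indices t-size .. t
    mode="future": indices t .. t+size
    mode="both":   indices t-size .. t+size
    Indices outside the sequence are clamped to the nearest valid sample
    (edge padding); this is an assumption made for this sketch.
    """
    if mode not in {"past", "future", "both"}:
        raise ValueError(f"unknown mode: {mode}")
    if not samples:
        raise ValueError("samples must not be empty")

    start = t - size if mode in {"past", "both"} else t
    stop = t + size if mode in {"future", "both"} else t
    last = len(samples) - 1

    # Clamp every index into [0, last] so windows near the edges keep their length.
    return [list(samples[min(max(i, 0), last)]) for i in range(start, stop + 1)]


# Hypothetical RSSI readings (dBm) from a single anchor at four time steps.
rssi = [[-60.0], [-62.0], [-65.0], [-70.0]]
past_window = make_window(rssi, t=1, size=1, mode="past")
both_window = make_window(rssi, t=0, size=1, mode="both")
```

Each window would then be stacked into a fixed-shape input for a sequence model; a "future" or "both" window implies the estimate for step `t` is only available after later samples arrive.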


Machine learning magnetism from simple global descriptors

arXiv.org Artificial Intelligence

The reliable identification of magnetic ground states remains a major challenge in high-throughput materials databases, where density functional theory (DFT) workflows often converge to ferromagnetic (FM) solutions. Here, we partially address this challenge by developing machine learning classifiers trained on experimentally validated MAGNDATA magnetic materials leveraging a limited number of simple compositional, structural, and electronic descriptors sourced from the Materials Project database. Our propagation vector classifiers achieve accuracies above 92%, outperforming recent studies in reliably distinguishing zero from nonzero propagation vector structures, and exposing a systematic ferromagnetic bias inherent to the Materials Project database for more than 7,843 materials. In parallel, LightGBM and XGBoost models trained directly on the Materials Project labels achieve accuracies of 84-86% (with macro F1 average scores of 63-66%), which proves useful for large-scale screening for magnetic classes, if refined by MAGNDATA-trained classifiers. These results underscore the role of machine learning techniques as corrective and exploratory tools, enabling more trustworthy databases and accelerating progress toward the identification of materials with various properties.


SasAgent: Multi-Agent AI System for Small-Angle Scattering Data Analysis

arXiv.org Artificial Intelligence

We introduce SasAgent, a multi-agent AI system powered by large language models (LLMs) that automates small-angle scattering (SAS) data analysis by leveraging tools from the SasView software and enables user interaction via text input. SasAgent features a coordinator agent that interprets user prompts and delegates tasks to three specialized agents for scattering length density (SLD) calculation, synthetic data generation, and experimental data fitting. These agents utilize LLM-friendly tools to execute tasks efficiently. These tools, including the model data tool, Retrieval-Augmented Generation (RAG) documentation tool, bump fitting tool, and SLD calculator tool, are derived from the SasView Python library. A user-friendly Gradio-based interface enhances user accessibility. Through diverse examples, we demonstrate SasAgent's ability to interpret complex prompts, calculate SLDs, generate accurate scattering data, and fit experimental datasets with high precision. This work showcases the potential of LLM-driven AI systems to streamline scientific workflows and enhance automation in SAS research.


Real-Time Analysis of Unstructured Data with Machine Learning on Heterogeneous Architectures

arXiv.org Artificial Intelligence

As the particle physics community needs higher and higher precisions in order to test our current model of the subatomic world, larger and larger datasets are necessary. With upgrades scheduled for the detectors of colliding-beam experiments around the world, and specifically at the Large Hadron Collider at CERN, more collisions and more complex interactions are expected. This directly implies an increase in data produced and consequently in the computational resources needed to process them. At CERN, the amount of data produced is gargantuan. This is why the data have to be heavily filtered and selected in real time before being permanently stored. This data can then be used to perform physics analyses, in order to expand our current understanding of the universe and improve the Standard Model of physics. This real-time filtering, known as triggering, involves complex processing happening often at frequencies as high as 40 MHz. This thesis contributes to understanding how machine learning models can be efficiently deployed in such environments, in order to maximize throughput and minimize energy consumption. Inevitably, modern hardware designed for such tasks and contemporary algorithms are needed in order to meet the challenges posed by the stringent, high-frequency data rates. In this work, I present our graph neural network-based pipeline, developed for charged particle track reconstruction at the LHCb experiment at CERN. The pipeline was implemented end-to-end inside LHCb's first-level trigger, entirely on GPUs. Its performance was compared against the classical tracking algorithms currently in production at LHCb. The pipeline was also accelerated on the FPGA architecture, and its performance in terms of power consumption and processing speed was compared against the GPU implementation.


Constraint Programming Models For Serial Batch Scheduling With Minimum Batch Size

arXiv.org Artificial Intelligence

In serial batch (s-batch) scheduling, jobs are grouped in batches and processed sequentially within their batch. This paper considers multiple parallel machines, nonidentical job weights and release times, and sequence-dependent setup times between batches of different families. Although s-batch has been widely studied in the literature, very few papers have taken into account a minimum batch size, typical in practical settings such as semiconductor manufacturing and the metal industry. The problem with this minimum batch size requirement has been mostly tackled with dynamic programming and meta-heuristics, and no article has ever used constraint programming (CP) to do so. This paper fills this gap by proposing three CP models for s-batching with minimum batch size: (i) an Interval Assignment model that computes and bounds the size of the batches using the presence literals of interval variables of the jobs. The computational experiments on standard cases compare the three CP models with two existing mixed-integer programming (MIP) models from the literature. The results demonstrate the versatility of the proposed CP models in handling multiple variations of s-batching and their ability to produce, on large instances, better solutions than the MIP models in less time.

Introduction

In the current and highly competitive landscape of the manufacturing industry, companies are under growing pressure to minimize production costs and reduce cycle times. One effective strategy to improve efficiency is to process similar tasks, called jobs, together in groups known as batches [1]. There are two main ways to process these batches. In parallel batching (p-batch), all jobs in a batch are processed simultaneously [2]. In contrast, in serial batching (s-batch), jobs in a batch are processed sequentially one after another [3]. The benefits of p-batching are obvious since it saves time by processing multiple jobs at once. Similarly, s-batching is especially useful when grouping similar jobs can prevent repetitive machine setups, which are time-consuming and costly [4]. Serial batching appears in many industries, including metal processing [5], additive manufacturing (3D printing) [5, 6], paint [7] and pharmaceutical production [8], chemical manufacturing [9], and semiconductor manufacturing [10, 11].
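
To make the s-batch mechanics concrete, the sketch below evaluates one fixed batch sequence on a single machine: jobs run one after another inside a batch, a setup time is paid only when the family changes between consecutive batches, and a minimum batch size is enforced. It is an illustration under stated assumptions, not one of the paper's CP models: the names `Job` and `evaluate_sequence`, the single-machine restriction, and the choice that each job completes as soon as it finishes processing are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Job:
    name: str
    family: str
    duration: int
    release: int = 0  # earliest time the job may start


def evaluate_sequence(
    batches: List[List[Job]],
    setup: Dict[Tuple[str, str], int],
    min_batch_size: int,
) -> Dict[str, int]:
    """Return the completion time of every job for a given batch sequence.

    Assumes min_batch_size >= 1, so every batch is non-empty.
    """
    clock = 0
    prev_family: Optional[str] = None
    completion: Dict[str, int] = {}

    for batch in batches:
        if len(batch) < min_batch_size:
            raise ValueError("batch violates the minimum batch size")
        if len({job.family for job in batch}) != 1:
            raise ValueError("all jobs in a batch must share one family")

        family = batch[0].family
        # Sequence-dependent setup, paid only when the family changes.
        if prev_family is not None and prev_family != family:
            clock += setup[(prev_family, family)]

        # Serial processing: jobs in the batch run one after another.
        for job in batch:
            clock = max(clock, job.release) + job.duration
            completion[job.name] = clock

        prev_family = family

    return completion


a1, a2 = Job("a1", "A", 3), Job("a2", "A", 2)
b1, b2 = Job("b1", "B", 4, release=10), Job("b2", "B", 1)
times = evaluate_sequence([[a1, a2], [b1, b2]], {("A", "B"): 5}, min_batch_size=2)
```

Grouping the two family-A jobs means the A-to-B setup is paid once; splitting them into separate batches around a B batch would add setups and, with `min_batch_size=2`, would not even be feasible.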


Meet the Ethiopian entrepreneur who is reinventing ammonia production

MIT Technology Review

After growing up without reliable power at home, Iwnetim Abate is working to develop a steady supply of sustainable energy. "I'm the only one who wears glasses and has eye problems in the family," Iwnetim Abate says with a smile as sun streams in through the windows of his MIT office. "I think it's because of the candles." In the small town in Ethiopia where he grew up, Abate's family had electricity, but it was unreliable. So, for several days each week when they were without power, Abate would finish his homework by candlelight. Today, Abate, 32, is an assistant professor at MIT in the department of materials science and engineering.


Odometry Calibration and Pose Estimation of a 4WIS4WID Mobile Wall Climbing Robot

arXiv.org Artificial Intelligence

This paper presents the design of a pose estimator for a four wheel independent steer four wheel independent drive (4WIS4WID) wall climbing mobile robot, based on the fusion of multimodal measurements, including wheel odometry, visual odometry, and inertial measurement unit (IMU) data, using an Extended Kalman Filter (EKF) and an Unscented Kalman Filter (UKF). The pose estimator is a critical component of wall climbing mobile robots, as their operational environment involves carrying precise measurement equipment and maintenance tools in construction, requiring information about pose on the building at the time of measurement. Due to the complex geometry and material properties of building façades, the use of traditional localization sensors such as laser, ultrasonic, or radar is often infeasible for wall-climbing robots. Moreover, GPS-based localization is generally unreliable in these environments because of signal degradation caused by reinforced concrete and electromagnetic interference. Consequently, robot odometry remains the primary source of velocity and position information, despite being susceptible to drift caused by both systematic and non-systematic errors. The calibrations of the robot's systematic parameters were conducted using nonlinear optimization and Levenberg-Marquardt methods as Newton-Gauss and gradient-based model fitting methods, while Genetic algorithm and Particle swarm were used as stochastic based methods for kinematic parameter calibration. Performance and results of the calibration methods and pose estimators were validated in detail with experiments on the experimental mobile wall climbing robot.


Revealing the empirical flexibility of gas units through deep clustering

arXiv.org Artificial Intelligence

The flexibility of a power generation unit determines how quickly and often it can ramp up or down. In energy models, it depends on assumptions on the technical characteristics of the unit, such as its installed capacity or turbine technology. In this paper, we learn the empirical flexibility of gas units from their electricity generation, revealing how real-world limitations can lead to substantial differences between units with similar technical characteristics. Using a novel deep clustering approach, we transform 5 years (2019-2023) of unit-level hourly generation data for 49 German units from 100 MWp of installed capacity into low-dimensional embeddings. Our unsupervised approach identifies two clusters of peaker units (high flexibility) and two clusters of non-peaker units (low flexibility). The estimated ramp rates of non-peakers, which constitute half of the sample, display a low empirical flexibility, comparable to coal units. Non-peakers, predominantly owned by industry and municipal utilities, show limited response to low residual load and negative prices, generating on average 1.3 GWh during those hours. As the transition to renewables increases market variability, regulatory changes will be needed to unlock this flexibility potential.
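
As a rough illustration of what an empirical ramp rate means for an hourly generation series, the sketch below computes hour-to-hour changes in output as a fraction of installed capacity. This is an assumption-laden simplification and not the paper's deep clustering method; the function names `hourly_ramps` and `max_ramp_up` and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def hourly_ramps(generation_mw: Sequence[float], capacity_mw: float) -> List[float]:
    """Hour-to-hour change in output, as a fraction of installed capacity."""
    if capacity_mw <= 0:
        raise ValueError("capacity_mw must be positive")
    return [
        (after - before) / capacity_mw
        for before, after in zip(generation_mw, generation_mw[1:])
    ]


def max_ramp_up(generation_mw: Sequence[float], capacity_mw: float) -> float:
    """Largest observed upward ramp per hour (0.0 if the series is too short)."""
    return max(hourly_ramps(generation_mw, capacity_mw), default=0.0)


# Hypothetical four-hour series for a 200 MW unit.
series = [0.0, 50.0, 200.0, 150.0]
ramps = hourly_ramps(series, capacity_mw=200.0)
peak = max_ramp_up(series, capacity_mw=200.0)
```

Normalizing by capacity lets units of different sizes be compared on the same scale, which is the kind of comparison between peakers and non-peakers the abstract describes.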