Materials
The Download: testing new AI agent Manus, and Waabi's virtual robotruck ambitions
For many years, researchers have been working to build devices that mimic photosynthesis, the process by which plants use sunlight, water, and carbon dioxide to make their own fuel. These artificial leaves use sunlight to split water into oxygen and hydrogen, which can then be used to power cars or generate electricity. Now a research team from the University of Cambridge has taken aim at creating more energy-dense fuels. The group's device produces ethylene and ethane, proving that artificial leaves can create hydrocarbons. The development could offer a cheaper, cleaner way to make fuels, chemicals, and plastics, with the ultimate goal of producing fuels that don't leave a harmful carbon footprint when burned.
A journey through the hyper-political world of microchips
A small town in the Netherlands hosts the only factory that produces the only chip-making machines that generate a type of light found nowhere naturally on Earth: extreme ultraviolet, a light emitted by young stars in outer space. This light, known as EUV, is the only way to make one of the world's most valuable and important technologies at scale: cutting-edge semiconductor chips. The factory is forbidden from selling its EUV machines to China. Below we explain how the chips are made, why they have become the focus of the US-China trade wars, how Taiwan was drawn into the maelstrom, and what could come next. The answers take us from deep underground to outer space, from the dirtiest places in the world to the cleanest, from the hottest temperatures to the coldest, from man-made structures smaller than a virus to equipment so large it takes three planes to move, and finally, to a state in physics that is two opposites at the same time.
The 5 best mechanical keyboards for 2025
Your keyboard is one of the few pieces of technology you'll use for hours at a time, so why not make it something that brings you joy? Sure, the people who gush over mechanical keyboards can be a bit much, but the enhanced comfort, durability and customizability that comes with the best of them is real. If you're interested in making the switch (ahem), we've tested dozens of mechanical keyboards over the past year and rounded up our favorites below. We've also broken down what to look for as you shop.

The first thing to decide with any keyboard is what size and layout you want. Full-size layouts have all the keys you'd ever need -- a number pad, a full function row, arrow keys, etc. -- but they also have the largest physical footprint. A 96-percent or "1800" keyboard is similar, but crunches the navigation cluster (Page Up, Home, etc.), numpad and arrow keys closer together to save space. Tenkeyless (TKL) or 80-percent keyboards omit the number pad entirely; they're often considered the best blend of size and functionality.

It gets more and more minimal from there. The smallest popular layout is the 60 percent keyboard, which removes the arrow keys, function row, numpad and navigation cluster. This kind of design can be particularly useful for gaming, as it opens up a ton of desk space to swing your mouse around. It typically relies on shortcuts to make up for its missing keys, but it comes with a learning curve as a result. Even more compact options exist beyond that. These can be adorable, but they usually involve removing the number row, which is a step too far for most people.
Artificial Intelligence as Catalyst for Biodiversity Understanding
Artificial intelligence (AI) is not a panacea for effortlessly solving the planet's environmental problems. AI still sparks passionate and dystopian predictions within some parts of the academic community, especially in the natural sciences. For some, the existence of AI tools poses an existential threat to human creativity [10]. Concerns about the increasing environmental costs of the carbon emissions [1] and water use demanded by information and communication technologies are also mounting. These viewpoints, however, overlook the advantages of employing AI in biodiversity research.
Electron flow matching for generative reaction mechanism prediction obeying conservation laws
Joung, Joonyoung F., Fong, Mun Hong, Casetti, Nicholas, Liles, Jordan P., Dassanayake, Ne S., Coley, Connor W.
Mass conservation is a fundamental principle in chemistry, serving as a critical constraint for accurately modeling chemical reactions. Postulated by Antoine Lavoisier in the eighteenth century, it asserts that the total mass of reactants equals the total mass of products, forming the basis for stoichiometry and chemical equation balancing. Despite its simplicity and fundamental importance, many machine learning models trained on chemical reaction data do not inherently enforce mass conservation. In this work, we introduce a new modeling formulation for reaction outcome prediction that achieves exact conservation by modeling chemical reactivity as a generative and probabilistic process of electron redistribution. The task of reaction outcome prediction has become a popular target for supervised machine learning [1, 2]. While chemists typically conceptualize, visualize, and communicate understanding of chemical reactions through mechanistic arrow-pushing diagrams, most data-driven models bypass this formalism and focus solely on predicting the major product in an end-to-end manner.
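The conservation constraint that end-to-end product-prediction models can violate is easy to state concretely: every element must appear the same number of times on both sides of a reaction. A minimal sketch of such a check, using a simplified formula parser (this is an illustration of the constraint itself, not the paper's electron-redistribution model):

```python
from collections import Counter
import re

def atom_counts(formula):
    """Count atoms in a simple molecular formula like 'C2H4' or 'H2O'.
    Handles element symbols with optional counts; no parentheses or charges."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts

def is_balanced(reactants, products):
    """True if every element occurs equally often on both sides."""
    left = sum((atom_counts(f) for f in reactants), Counter())
    right = sum((atom_counts(f) for f in products), Counter())
    return left == right

# Combustion of methane: CH4 + 2 O2 -> CO2 + 2 H2O
print(is_balanced(["CH4", "O2", "O2"], ["CO2", "H2O", "H2O"]))  # True
print(is_balanced(["CH4", "O2"], ["CO2", "H2O"]))               # False
```

A model that predicts products as free-form text can fail this check; a formulation that only redistributes electrons among a fixed set of atoms satisfies it by construction.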
Perovskite-LLM: Knowledge-Enhanced Large Language Models for Perovskite Solar Cell Research
Liu, Xiang, Sun, Penglei, Chen, Shuyan, Zhang, Longhan, Dong, Peijie, You, Huajie, Zhang, Yongqi, Yan, Chang, Chu, Xiaowen, Zhang, Tong-yi
The rapid advancement of perovskite solar cells (PSCs) has led to an exponential growth in research publications, creating an urgent need for efficient knowledge management and reasoning systems in this domain. We present a comprehensive knowledge-enhanced system for PSCs that integrates three key components. First, we develop Perovskite-KG, a domain-specific knowledge graph constructed from 1,517 research papers, containing 23,789 entities and 22,272 relationships. Second, we create two complementary datasets: Perovskite-Chat, comprising 55,101 high-quality question-answer pairs generated through a novel multi-agent framework, and Perovskite-Reasoning, containing 2,217 carefully curated materials science problems. Third, we introduce two specialized large language models: Perovskite-Chat-LLM for domain-specific knowledge assistance and Perovskite-Reasoning-LLM for scientific reasoning tasks. Experimental results demonstrate that our system significantly outperforms existing models in both domain-specific knowledge retrieval and scientific reasoning tasks, providing researchers with effective tools for literature review, experimental design, and complex problem-solving in PSC research.
Agentic Deep Graph Reasoning Yields Self-Organizing Knowledge Networks
We present an agentic, autonomous graph expansion framework that iteratively structures and refines knowledge in situ. Unlike conventional knowledge graph construction methods relying on static extraction or single-pass learning, our approach couples a reasoning-native large language model with a continually updated graph representation. At each step, the system actively generates new concepts and relationships, merges them into a global graph, and formulates subsequent prompts based on its evolving structure. Through this feedback-driven loop, the model organizes information into a scale-free network characterized by hub formation, stable modularity, and bridging nodes that link disparate knowledge clusters. Over hundreds of iterations, new nodes and edges continue to appear without saturating, while centrality measures and shortest path distributions evolve to yield increasingly distributed connectivity. Our analysis reveals emergent patterns, such as the rise of highly connected 'hub' concepts and the shifting influence of 'bridge' nodes, indicating that agentic, self-reinforcing graph construction can yield open-ended, coherent knowledge structures. Applied to materials design problems, we present compositional reasoning experiments by extracting node-specific and synergy-level principles to foster genuinely novel knowledge synthesis, yielding cross-domain ideas that transcend rote summarization and strengthen the framework's potential for open-ended scientific discovery. We discuss other applications in scientific discovery and outline future directions for enhancing scalability and interpretability.
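The expand-merge-reprompt loop described above can be sketched in a few lines. The stand-in below replaces the reasoning LLM with a deterministic stub (`propose_edges`), and all names are illustrative; the point is the loop structure, in which each new prompt is formulated from the evolving graph:

```python
# Minimal sketch of the iterative graph-expansion loop. `propose_edges` is a
# deterministic stand-in for an LLM call; the real system generates concepts
# and relations with a reasoning model.

def propose_edges(prompt, step):
    """Stub LLM call: returns (source, relation, target) triples."""
    hub = "materials-design"  # the stub reuses a hub, mimicking hub formation
    edges = [(hub, "relates-to", f"concept-{step}")]
    if step > 0:
        edges.append((f"concept-{step}", "refines", f"concept-{step - 1}"))
    return edges

def expand_graph(steps):
    graph = {}  # node -> set of (relation, neighbor) pairs
    prompt = "seed: materials design"
    for step in range(steps):
        # Generate new concepts/relations and merge them into the global graph.
        for src, rel, dst in propose_edges(prompt, step):
            graph.setdefault(src, set()).add((rel, dst))
            graph.setdefault(dst, set())
        # Formulate the next prompt from the evolving structure:
        # here, simply the current highest-out-degree node.
        hub = max(graph, key=lambda n: len(graph[n]))
        prompt = f"expand on: {hub}"
    return graph

g = expand_graph(5)
print(len(g))                            # node count grows without saturating
print(max(len(v) for v in g.values()))   # hub degree
```

In the actual framework the merge step also deduplicates concepts and the prompt construction draws on centrality and community structure, but the feedback-driven loop has this shape.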
Unsupervised Anomaly Detection through Mass Repulsing Optimal Transport
Montesuma, Eduardo Fernandes, Habazi, Adel El, Mboula, Fred Ngole
An anomaly, or an outlier, is a data point that is significantly different from the remaining data [Aggarwal, 2017], to such an extent that it was likely generated by a different mechanism [Hawkins, 1980]. From the perspective of machine learning, Anomaly Detection (AD) aims to determine, from a set of examples, which ones are likely anomalies, typically through a score. This problem finds applications in many different fields, such as medicine [Salem et al., 2013], cyber-security [Siddiqui et al., 2019], and system monitoring [Isermann, 2006], to name a few. As reviewed in Han et al. [2022], existing techniques for AD are usually divided into unsupervised, semi-supervised and supervised approaches, with an increasing need for labeled data. In this paper, we focus on unsupervised AD, which does not require further labeling effort when constituting datasets. As discussed in Livernoche et al. [2024], the growing number of applications involving high-dimensional and complex data calls for non-parametric algorithms.
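The score-based framing above has a classic non-parametric instance: rank each point by its distance to its k-th nearest neighbor. The sketch below illustrates that framing only; it is not the paper's mass-repulsing optimal transport method:

```python
# Generic non-parametric anomaly score: distance to the k-th nearest neighbor.
# Points far from all others receive large scores.

def knn_score(points, k=2):
    """Return one anomaly score per point (k-th nearest-neighbor distance)."""
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    scores = []
    for i, p in enumerate(points):
        ds = sorted(dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(ds[k - 1])
    return scores

data = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]  # last point is the outlier
scores = knn_score(data, k=2)
print(max(range(len(scores)), key=scores.__getitem__))  # prints 4, the outlier
```

Scores like this require no labels, which is what makes the unsupervised setting attractive when labeling is expensive.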
LaM-SLidE: Latent Space Modeling of Spatial Dynamical Systems via Linked Entities
Sestak, Florian, Toshev, Artur, Fürst, Andreas, Klambauer, Günter, Mayr, Andreas, Brandstetter, Johannes
Generative models are spearheading recent progress in deep learning, showing strong promise for trajectory sampling in dynamical systems as well. However, while latent space modeling paradigms have transformed image and video generation, similar approaches are more difficult for most dynamical systems. Such systems -- from chemical molecule structures to collective human behavior -- are described by interactions of entities, making them inherently linked to connectivity patterns and the traceability of entities over time. Our approach, LaM-SLidE (Latent Space Modeling of Spatial Dynamical Systems via Linked Entities), combines the advantages of graph neural networks, i.e., the traceability of entities across time-steps, with the efficiency and scalability of recent advances in image and video generation, where pre-trained encoder and decoder are frozen to enable generative modeling in the latent space. The core idea of LaM-SLidE is to introduce identifier representations (IDs) that allow for the retrieval of entity properties, e.g., entity coordinates, from latent system representations, thus enabling traceability. Experimentally, across different domains, we show that LaM-SLidE performs favorably in terms of speed, accuracy, and generalizability. (Code is available at https://github.com/ml-jku/LaM-SLidE)
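One way to picture the ID mechanism is as a query into the latent representation: each entity's ID vector attends over latent tokens and reads out that entity's properties. The sketch below is a toy illustration of that idea under our own simplifying assumptions (dot-product attention, hand-set vectors), not the authors' implementation:

```python
# Toy illustration: an entity ID vector retrieves that entity's coordinates
# from a latent system representation via dot-product attention.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def retrieve(id_vec, latent_keys, latent_values):
    """Attend over latent tokens using an entity ID as the query."""
    logits = [sum(q * k for q, k in zip(id_vec, key)) for key in latent_keys]
    weights = softmax(logits)
    dim = len(latent_values[0])
    return [sum(w * v[d] for w, v in zip(weights, latent_values)) for d in range(dim)]

# Two latent tokens; each ID vector is aligned with one token (toy setup).
keys = [[5.0, 0.0], [0.0, 5.0]]
values = [[1.0, 2.0], [3.0, 4.0]]       # per-token coordinate payloads
entity_a = retrieve([1.0, 0.0], keys, values)  # attends mostly to token 0
print([round(x, 2) for x in entity_a])  # close to [1.0, 2.0]
```

Because retrieval is keyed by the ID rather than by position, the same entity can be traced across time-steps even though the latent representation itself is unordered.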
RIDE: Enhancing Large Language Model Alignment through Restyled In-Context Learning Demonstration Exemplars
Hua, Yuncheng, Qu, Lizhen, Li, Zhuang, Xue, Hao, Salim, Flora D., Haffari, Gholamreza
Alignment tuning is crucial for ensuring large language models (LLMs) behave ethically and helpfully. Current alignment approaches require high-quality annotations and significant training resources. This paper proposes a low-cost, tuning-free method using in-context learning (ICL) to enhance LLM alignment. Through an analysis of high-quality ICL demos, we identified style as a key factor influencing LLM alignment capabilities and explicitly restyled ICL exemplars based on this stylistic finding. Additionally, we combined the restyled demos to achieve a balance between the two conflicting aspects of LLM alignment: factuality and safety. We packaged the restyled examples as prompts to trigger few-shot learning, improving LLM alignment. Compared to the best baseline approach, on benchmarks scored out of a maximum of 5.00, our method achieves a 0.10 increase on the Alpaca task (from 4.50 to 4.60), a 0.22 improvement on the Just-eval benchmark (from 4.34 to 4.56), and a 0.32 improvement (from 3.53 to 3.85) on the MT-Bench dataset. We release the code and data at https://github.com/AnonymousCode-ComputerScience/RIDE.