chemistry
Bridging the Gap Between Cross-Domain Theory and Practical Application: A Case Study on Molecular Dissolution
Artificial intelligence (AI) has played a transformative role in chemical research, greatly facilitating the prediction of small-molecule properties, the simulation of catalytic processes, and material design. These advances are driven by increases in computing power, open-source machine learning frameworks, and extensive chemical datasets. However, high-quality real-world data remain scarce, and models built on large amounts of theoretical data are often costly and difficult to deploy, which hinders the applicability of AI models in practical scenarios. In this study, we enhance the prediction of solute-solvent properties by proposing a novel sample selection method: Core Subset Iterative Extraction (CSIE). CSIE iteratively updates the core sample subset based on information gain, removing redundant samples from theoretical data and optimizing model performance on real chemical datasets. Furthermore, we introduce an asymmetric molecular interaction graph neural network (AMGNN) that combines positional information with bidirectional edge connections to simulate real-world chemical reaction scenarios and better capture solute-solvent interactions. Experimental results show that our method can accurately extract the core subset and improve prediction accuracy. Code is available at: https://CISE-AMGNN.
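The abstract does not spell out how CSIE computes information gain, so the following is only a rough sketch of the general idea of iteratively growing a core subset while discarding redundant samples. It is not the authors' method: information gain is swapped for a simple distance-based novelty score, and the names `novelty`, `extract_core_subset`, and `threshold` are hypothetical.

```python
import math


def novelty(candidate, core):
    """Proxy for information gain (an assumption, not the paper's definition):
    the distance from a candidate to its nearest sample already in the core."""
    return min(math.dist(candidate, c) for c in core)


def extract_core_subset(samples, threshold):
    """Greedily grow a core subset from feature vectors.

    Each round adds the remaining sample with the highest novelty and stops
    once the best remaining sample falls below `threshold`, i.e. everything
    left is treated as redundant under this proxy.
    """
    core = [samples[0]]            # seed the core with the first sample
    remaining = list(samples[1:])
    while remaining:
        best = max(remaining, key=lambda s: novelty(s, core))
        if novelty(best, core) < threshold:
            break
        core.append(best)
        remaining.remove(best)
    return core


# Toy feature vectors: two near-duplicate pairs and one isolated point.
samples = [(0.0, 0.0), (0.01, 0.0), (1.0, 0.0), (1.0, 0.01), (0.0, 1.0)]
core = extract_core_subset(samples, threshold=0.5)
```

In this toy run the near-duplicates are dropped and one representative per cluster is kept. A real implementation would score candidates by their effect on the model rather than by raw distance.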
ChemX: A Collection of Chemistry Datasets for Benchmarking Automated Information Extraction
Despite recent advances in machine learning, many scientific discoveries in chemistry still rely on manually curated datasets extracted from the scientific literature. Automation of information extraction in specialized chemistry domains has the potential to scale up machine learning applications and improve the quality of predictions, enabling data-driven scientific discoveries at a faster pace. In this paper, we present ChemX, a collection of 10 benchmarking datasets across several domains of chemistry, providing a reliable basis for evaluating and fine-tuning automated information extraction methods. The datasets, which encompass various properties of small molecules and nanomaterials, have been manually extracted from peer-reviewed publications and systematically validated by domain experts through a cross-verification procedure that allows errors to be identified and corrected at their source. To demonstrate the utility of the resulting datasets, we evaluate the extraction performance of state-of-the-art large language models (LLMs). Moreover, we design our own agentic approach to take full control of document preprocessing before LLM-based information extraction.
The Catechol Benchmark: Time-series Solvent Selection Data for Few-shot Machine Learning
Machine learning has promised to change the landscape of laboratory chemistry, with impressive results in molecular property prediction and reaction retro-synthesis. However, chemical datasets are often inaccessible to the machine learning community: they tend to require cleaning or a thorough understanding of the chemistry, or are simply not available. In this paper, we introduce a novel dataset for yield prediction, providing the first-ever transient flow dataset for machine learning benchmarking, covering over 1200 process conditions. While previous datasets focus on discrete parameters, our experimental set-up allows us to sample a large number of continuous process conditions, generating new challenges for machine learning models. We focus on solvent selection, a task that is particularly difficult to model theoretically and therefore ripe for machine learning applications.
The race to solve the biggest problem in quantum computing
The errors that quantum computers make are holding the technology back. Quantum computers won't be truly useful until they can correct their mistakes. Quantum computers are already here, but they make far too many errors. This is arguably the biggest obstacle to the technology really becoming useful, but recent breakthroughs suggest a solution may be on the horizon. Errors creep into traditional computers too, but there are well-established techniques for correcting them. They rely on redundancy, where extra bits are used to detect when 0s incorrectly swap to 1s or vice versa.
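To make the redundancy idea concrete, here is a minimal sketch of one classical scheme, the three-bit repetition code, in which each bit is stored three times and a majority vote recovers the original value. The article does not name a specific code, so this choice and the function names `encode` and `decode` are illustrative assumptions.

```python
def encode(bits):
    """Store every bit three times (the extra copies are the redundancy)."""
    return [b for b in bits for _ in range(3)]


def decode(coded):
    """Recover each bit by majority vote over its group of three copies."""
    return [
        1 if sum(coded[i:i + 3]) >= 2 else 0
        for i in range(0, len(coded), 3)
    ]


message = [1, 0, 1]
sent = encode(message)

# Simulate a single error: one copy of the middle bit flips from 0 to 1.
received = sent.copy()
received[4] ^= 1

recovered = decode(received)
```

The vote corrects any single flipped copy within a group. Two flips in the same group would outvote the correct value, which is why practical schemes add more redundancy or use stronger codes.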
The science of soulmates: Is there someone out there exactly right for you?
On Valentine's Day, there's the temptation to believe that somewhere out there is The One: a soulmate, a perfect match, the person you were meant to be with. Across history, humans have always been drawn to the idea that love isn't random. In ancient Greece, Plato imagined that we were once whole beings with four arms, four legs and two faces, so radiant that Zeus split us in two; ever since, each half has roamed the earth searching for its missing other, a myth that gives the modern soulmate its poetic pedigree and the promise that somewhere, someone will finally make us feel complete. In the Middle Ages, troubadours and Arthurian tales recast that longing as courtly love, a fierce, often forbidden devotion like Lancelot's for Guinevere, in which a knight proved his worth through self-sacrifice for a beloved he might never openly declare.
Nobel prizewinner Omar Yaghi says his invention will change the world
Chemist Omar Yaghi invented materials called MOFs, a few grams of which have the surface area of a football field. In school, we learn about the Stone Age, the Bronze Age - and we are currently in a silicon age characterised by computers and phones. What might define the next age? Omar Yaghi at the University of California, Berkeley, thinks a family of materials he helped pioneer in the 1990s has a good shot. They are metal-organic frameworks (MOFs), and working out how to make them earned him a share of the 2025 Nobel prize in chemistry.
Could 2026 be the year we start using quantum computers for chemistry?
Whether quantum computers can actually solve practical problems is one of the biggest unanswered questions of this growing industry - and one that might be answered by researchers in industrial and medical chemistry in 2026. Calculating the structure, reactivity and other chemical properties of a molecule is an intrinsically quantum problem because it involves its electrons, which are quantum particles. But the more complex a molecule is, the harder these calculations become, in some cases posing a real challenge even for traditional supercomputers. On the other hand, because quantum computers are also intrinsically quantum, they should have an advantage when it comes to tackling these chemical calculations.