Materials
Modeling the transplacental transfer of small molecules using machine learning: a case study on per- and polyfluorinated substances (PFAS) - Journal of Exposure Science & Environmental Epidemiology
Despite their large numbers and widespread use, very little is known about the extent to which per- and polyfluoroalkyl substances (PFAS) can cross the placenta and expose the developing fetus. The aim of our study is to develop a computational approach that can be used to evaluate the extent to which small molecules, and in particular PFAS, can cross the placenta and partition to cord blood. We collected experimental values of the concentration ratio between cord and maternal blood (RCM) for 260 chemical compounds and calculated their physicochemical descriptors using the cheminformatics package Mordred. We used the compiled database to train and test an artificial neural network (ANN), and then applied the best-performing model to predict RCM for a large dataset of PFAS chemicals (n = 7982). Finally, we examined the calculated physicochemical descriptors of the chemicals to identify which properties correlated significantly with RCM. We determined that 7855 compounds were within the applicability domain of our model and 127 were outside it. Our predictions of RCM for PFAS suggested that 3623 compounds had a log RCM > 0, indicating preferential partitioning to cord blood. Some examples of these compounds were bisphenol AF, 2,2-bis(4-aminophenyl)hexafluoropropane, and nonafluoro-tert-butyl 3-methylbutyrate. These observations have important public health implications, as many PFAS have been shown to interfere with fetal development. In addition, as these compounds are highly persistent and many of them can readily cross the placenta, they are expected to remain in the population for a long time as they are passed from parent to offspring. Understanding the behavior of chemicals in the human body during pregnancy is critical in preventing harmful exposures during critical periods of development.
Many chemicals can cross the placenta and expose the fetus; however, the mechanism by which this transport occurs is not well understood. In our study, we developed a machine learning model that describes the transplacental transfer of chemicals as a function of their physicochemical properties. The model was then used to make predictions for a set of 7982 per- and polyfluorinated alkyl substances that are listed on EPA's CompTox Chemicals Dashboard. The model can be applied to make predictions for other chemical categories of interest, such as plasticizers and pesticides. Accurate predictions of RCM can help scientists and regulators to prioritize chemicals that have the potential to cause harm by exposing the fetus.
SynKB: Semantic Search for Synthetic Procedures
Bai, Fan, Ritter, Alan, Madrid, Peter, Freitag, Dayne, Niekrasz, John
In this paper we present SynKB, an open-source, automatically extracted knowledge base of chemical synthesis protocols. Similar to proprietary chemistry databases such as Reaxys, SynKB allows chemists to retrieve structured knowledge about synthetic procedures. By taking advantage of recent advances in natural language processing for procedural texts, SynKB supports more flexible queries about reaction conditions, and thus has the potential to help chemists search the literature for conditions used in relevant reactions as they design new synthetic routes. Using customized Transformer models to automatically extract information from 6 million synthesis procedures described in U.S. and EU patents, we show that for many queries, SynKB has higher recall than Reaxys, while maintaining high precision. We plan to make SynKB available as an open-source tool; in contrast, proprietary chemistry databases require costly subscriptions.
A General Recipe for Likelihood-free Bayesian Optimization
Song, Jiaming, Yu, Lantao, Neiswanger, Willie, Ermon, Stefano
The acquisition function, a critical component in Bayesian optimization (BO), can often be written as the expectation of a utility function under a surrogate model. However, to ensure that acquisition functions are tractable to optimize, restrictions must be placed on the surrogate model and utility function. To extend BO to a broader class of models and utilities, we propose likelihood-free BO (LFBO), an approach based on likelihood-free inference. LFBO directly models the acquisition function without having to separately perform inference with a probabilistic surrogate model. We show that computing the acquisition function in LFBO can be reduced to optimizing a weighted classification problem, where the weights correspond to the utility being chosen. By choosing the utility function for expected improvement (EI), LFBO outperforms various state-of-the-art black-box optimization methods on several real-world optimization problems. LFBO can also effectively leverage composite structures of the objective function, which further improves its regret by several orders of magnitude.
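The reduction to weighted classification can be illustrated with a small sketch. It is not the authors' implementation: the construction of the examples (every observation is a negative example with weight 1, and every observation above a threshold `tau` is also a positive example weighted by its improvement), the 1-D logistic classifier, and all names and toy data are assumptions made for illustration. Under this weighted log loss, the odds p / (1 - p) of the optimal classifier equal the expected utility at x, so the odds are used here as the acquisition value.

```python
import math


def ei_weighted_examples(xs, ys, tau):
    """Turn observations into weighted binary examples for an EI-style utility.

    Assumption: each observation is a negative example with weight 1, and each
    observation above the threshold tau is also a positive example weighted by
    its improvement max(y - tau, 0).
    """
    examples = []  # (x, label, weight)
    for x, y in zip(xs, ys):
        examples.append((x, 0, 1.0))
        improvement = max(y - tau, 0.0)
        if improvement > 0.0:
            examples.append((x, 1, improvement))
    return examples


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_weighted_logistic(examples, lr=0.5, steps=5000):
    """Fit p(x) = sigmoid(w * x + b) by gradient descent on weighted log loss."""
    w, b = 0.0, 0.0
    total = sum(weight for _, _, weight in examples)
    for _ in range(steps):
        grad_w = grad_b = 0.0
        for x, label, weight in examples:
            err = weight * (sigmoid(w * x + b) - label)
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / total
        b -= lr * grad_b / total
    return w, b


def acquisition(x, w, b):
    """Classifier odds p / (1 - p), used here as the acquisition value."""
    p = sigmoid(w * x + b)
    return p / (1.0 - p)


# Hypothetical 1-D maximization problem: larger x gives larger observed y.
xs = [0.0, 0.25, 0.5, 0.75, 1.0]
ys = [0.0, 0.1, 0.3, 0.8, 1.5]
tau = 0.3  # improvement threshold (an assumption, e.g. a quantile of ys)

w, b = fit_weighted_logistic(ei_weighted_examples(xs, ys, tau))
candidates = [0.0, 0.5, 1.0]
best = max(candidates, key=lambda x: acquisition(x, w, b))
```

No probabilistic surrogate is fitted at any point in this sketch; the classifier alone ranks the candidate points.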
ChemAlgebra: Algebraic Reasoning on Chemical Reactions
Valenti, Andrea, Bacciu, Davide, Vergari, Antonio
While showing impressive performance on various kinds of learning tasks, it is yet unclear whether deep learning models have the ability to robustly tackle reasoning tasks. Measuring the robustness of reasoning in machine learning models is challenging as one needs to provide a task that cannot be easily shortcut by exploiting spurious statistical correlations in the data, while operating on complex objects and constraints. To address this issue, we propose ChemAlgebra, a benchmark for measuring the reasoning capabilities of deep learning models through the prediction of stoichiometrically balanced chemical reactions. ChemAlgebra requires manipulating sets of complex discrete objects -- molecules represented as formulas or graphs -- under algebraic constraints such as the mass preservation principle. We believe that ChemAlgebra can serve as a useful test bed for the next generation of machine reasoning models and as a promoter of their development.
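As a rough illustration of the algebraic constraint involved, the sketch below checks whether a reaction is stoichiometrically balanced by comparing atom counts on both sides. It is not part of the benchmark: the function names are hypothetical, and the parser assumes simple formulas without parentheses, charges, or hydrates.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'H2O' or 'C6H12O6'.

    Assumption: no parentheses, charges, or hydrates.
    """
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(side):
    """Sum atom counts over (coefficient, formula) pairs on one reaction side."""
    total = Counter()
    for coefficient, formula in side:
        for element, number in atom_counts(formula).items():
            total[element] += coefficient * number
    return total


def is_balanced(reactants, products):
    """A reaction is balanced when every element appears equally on both sides."""
    return side_counts(reactants) == side_counts(products)


# Methane combustion: CH4 + 2 O2 -> CO2 + 2 H2O
balanced = is_balanced([(1, "CH4"), (2, "O2")], [(1, "CO2"), (2, "H2O")])
# Same reaction with a wrong oxygen coefficient
unbalanced = is_balanced([(1, "CH4"), (1, "O2")], [(1, "CO2"), (2, "H2O")])
```

Conserving the number of atoms of each element implies that mass is conserved, so this check is one concrete form of the constraint a model has to respect.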
Machine learning in bioprocess development: From promise to practice
Helleckes, Laura Marie, Hemmerich, Johannes, Wiechert, Wolfgang, von Lieres, Eric, Grünberger, Alexander
Fostered by novel analytical techniques, digitalization and automation, modern bioprocess development generates large amounts of heterogeneous experimental data containing valuable process information. In this context, data-driven methods such as machine learning (ML) have a high potential to rationally explore large design spaces while using experimental facilities as efficiently as possible. The aim of this review is to demonstrate how ML methods have been applied so far in bioprocess development, especially in strain engineering and selection, bioprocess optimization, scale-up, and monitoring and control of bioprocesses. For each topic, we highlight successful application cases and current challenges, and point out domains that can potentially benefit from technology transfer and further progress in the field of ML.
Neural network for determining an asteroid mineral composition from reflectance spectra
Korda, David, Penttilä, Antti, Klami, Arto, Kohout, Tomáš
Chemical and mineral compositions of asteroids reflect the formation and history of our Solar System. This knowledge is also important for planetary defence and in-space resource utilisation. We aim to develop a fast and robust neural-network-based method for deriving the mineral modal and chemical compositions of silicate materials from their visible and near-infrared spectra. The method should be able to process raw spectra without significant pre-processing. We designed a convolutional neural network with two hidden layers for the analysis of the spectra, and trained it using labelled reflectance spectra. For the training, we used a dataset that consisted of reflectance spectra of real silicate samples stored in the RELAB and C-Tape databases, namely olivine, orthopyroxene, clinopyroxene, their mixtures, and olivine-pyroxene-rich meteorites. We used the model on two datasets. First, we evaluated the model reliability on a test dataset where we compared the model classification with known compositional reference values. The individual classification results are mostly within 10 percentage-point intervals around the correct values. Second, we classified the reflectance spectra of S-complex (Q-type and V-type, also including A-type) asteroids with known Bus-DeMeo taxonomy classes. The predicted mineral chemical composition of S-type and Q-type asteroids agrees with the chemical composition of ordinary chondrites. The modal abundances of V-type and A-type asteroids show a dominant contribution of orthopyroxene and olivine, respectively. Additionally, our predictions of the mineral modal composition of S-type and Q-type asteroids show an apparent depletion of olivine related to the attenuation of its diagnostic absorptions with space weathering. This trend is consistent with previous results showing a slower response of pyroxene to space weathering relative to olivine.
Sequential Brick Assembly with Efficient Constraint Satisfaction
Ahn, Seokjun, Kim, Jungtaek, Cho, Minsu, Park, Jaesik
We address the problem of generating LEGO brick assembly sequences that yield high-fidelity structures while satisfying physical constraints between bricks. The assembly problem is challenging since the number of possible structures increases exponentially with the number of available bricks, complicating the physical constraints to satisfy across bricks. To tackle this problem, our method performs a brick structure assessment to predict the next brick position and its confidence by employing a U-shaped sparse 3D convolutional network. The convolution filter efficiently validates physical constraints in a parallelizable and scalable manner, allowing it to process different brick types. To generate a novel structure, we devise a sampling strategy to determine the next brick position by considering attachable positions under physical constraints. Instead of using handcrafted brick assembly datasets, our model is trained with a large number of 3D objects, which allows it to create new high-fidelity structures. We demonstrate that our method successfully generates diverse brick structures while handling two different brick types and outperforms existing methods based on Bayesian optimization, graph generative models, and reinforcement learning, all of which are limited to a single brick type.
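The idea of validating placement constraints with a convolution filter can be sketched in a much simplified form. The example below is not the authors' network: it uses dense 2-D occupancy grids instead of sparse 3D tensors, an all-ones kernel shaped like the brick footprint, and two assumed constraints (no overlap in the current layer, and at least one occupied cell underneath to attach to). All names and the toy grids are hypothetical.

```python
def correlate_valid(grid, kernel_h, kernel_w):
    """Slide an all-ones kernel over a 2-D 0/1 grid ('valid' positions only)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows - kernel_h + 1):
        out_row = []
        for c in range(cols - kernel_w + 1):
            out_row.append(sum(grid[r + dr][c + dc]
                               for dr in range(kernel_h)
                               for dc in range(kernel_w)))
        out.append(out_row)
    return out


def valid_positions(layer, layer_below, brick_h, brick_w):
    """Top-left corners where a brick fits without overlap and has support."""
    overlap = correlate_valid(layer, brick_h, brick_w)        # must be 0
    support = correlate_valid(layer_below, brick_h, brick_w)  # must be > 0
    return [(r, c)
            for r, row in enumerate(overlap)
            for c, value in enumerate(row)
            if value == 0 and support[r][c] > 0]


# Lower layer: one 2x4 brick. Current layer: one 2x2 brick in the corner.
layer_below = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0],
]
layer = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]

positions = valid_positions(layer, layer_below, 2, 2)
```

Because every output cell is computed independently, all candidate positions are checked in one pass, and a different brick type only changes the kernel size.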
Process Modeling, Hidden Markov Models, and Non-negative Tensor Factorization with Model Selection
Skau, Erik, Hollis, Andrew, Eidenbenz, Stephan, Rasmussen, Kim, Alexandrov, Boian
Monitoring of industrial processes is a critical capability in industry and in government to ensure reliability of production cycles, quick emergency response, and national security. Process monitoring allows users to gauge the involvement of an organization in an industrial process or predict the degradation or aging of machine parts in processes taking place at a remote location. As in many data science applications, we usually have access only to limited raw data, such as satellite imagery, short video clips, some event logs, and signatures captured by a small set of sensors. To combat data scarcity, we leverage the knowledge of subject matter experts (SMEs) who are familiar with the process. Various process mining techniques have been developed for this type of analysis; typically, such approaches combine theoretical process models built on domain expert insights with ad hoc integration of available pieces of raw data. Here, we introduce a novel, mathematically sound method that integrates theoretical process models (as proposed by SMEs) with interrelated minimal Hidden Markov Models (HMM), built via non-negative tensor factorization and discrete model simulations. Our method consolidates: (a) theoretical process model development, (b) discrete model simulations, (c) HMM, (d) joint Non-negative Matrix Factorization (NMF) and Non-negative Tensor Factorization (NTF), and (e) custom model selection. To demonstrate our methodology and its abilities, we apply it to simple synthetic and real-world process models.
Boosting Heterogeneous Catalyst Discovery by Structurally Constrained Deep Learning Models
Korovin, Alexey N., Humonen, Innokentiy S., Samtsevich, Artem I., Eremin, Roman A., Vasilyev, Artem I., Lazarev, Vladimir D., Budennyy, Semen A.
The discovery of new catalysts is one of the significant topics of computational chemistry, as it has the potential to accelerate the adoption of renewable energy sources. Recently developed deep learning approaches such as graph neural networks (GNNs) open new opportunities to significantly extend the scope for modelling novel high-performance catalysts. Nevertheless, building a graph representation of a particular crystal structure is not a straightforward task, owing to ambiguous connectivity schemes and the numerous possible embeddings of nodes and edges. Here we present an embedding improvement for a GNN that has been modified by Voronoi tessellation and is able to predict the energy of catalytic systems within the Open Catalyst Project dataset. The graph was enriched via Voronoi tessellation: the corresponding contact solid angles and types (direct or indirect) were used as edge features, and Voronoi volumes were used as node characteristics. As an auxiliary approach, the node representation was enriched with intrinsic atomic properties (electronegativity, period and group position). The proposed modifications allowed us to improve the mean absolute error of the original model; the final error equals 651 meV per atom on the Open Catalyst Project dataset and 6 meV per atom on the intermetallics dataset. Also, by considering an additional dataset, we show that a sensible choice of data can decrease the error to values above the physically based threshold of 20 meV per atom.
The Download: text-to-video AI, and China's big methanol bet
What's happened: Meta has unveiled an AI system that generates short videos based on text prompts. Make-A-Video lets you type in a string of words, like "A dog wearing a superhero outfit with a red cape flying through the sky," and then generates a five-second clip that, while pretty accurate, has the aesthetics of a trippy old home video. How it works: Meta combined data from three open-source image and video data sets to train its model. Standard text-image data sets of labeled still images helped the AI learn what objects are called and what they look like. And a database of videos helped it learn how those objects are supposed to move in the world. Why it matters: Although the effect is rather crude, the system offers an early glimpse of what's coming next for generative artificial intelligence, and it is the next obvious step from the text-to-image AI systems that have caused huge excitement this year.