 Materials


Machine learning uncovers 'genes of importance' in agriculture

#artificialintelligence

Machine learning can pinpoint "genes of importance" that help crops to grow with less fertilizer, according to a new study published in Nature Communications. It can also predict additional traits in plants and disease outcomes in animals, illustrating its applications beyond agriculture. Using genomic data to predict outcomes in agriculture and medicine is both a promise and a challenge for systems biology. Researchers have been working to determine how best to use the vast amount of genomic data available to predict how organisms respond to changes in nutrition, toxins and pathogen exposure, which in turn would inform crop improvement, disease prognosis, epidemiology and public health. However, accurately predicting such complex outcomes in agriculture and medicine from genome-scale information remains a significant challenge.


Predictive Geological Mapping with Convolution Neural Network Using Statistical Data Augmentation on a 3D Model

arXiv.org Artificial Intelligence

Airborne magnetic data are commonly used to produce preliminary geological maps. Machine learning has the potential to partly fulfill this task rapidly and objectively, as geological mapping is comparable to a semantic segmentation problem. Because this method requires a high-quality dataset, we developed a data augmentation workflow that uses a 3D geological and magnetic susceptibility model as input. The workflow uses soft-constrained Multi-Point Statistics to create many synthetic 3D geological models and Sequential Gaussian Simulation algorithms to populate the models with the appropriate magnetic distribution. Then, forward modeling is used to compute the airborne magnetic responses of the synthetic models, which are associated with their counterpart surficial lithologies. A Gated Shape Convolutional Neural Network algorithm was trained on the generated synthetic dataset to perform geological mapping of airborne magnetic data and detect lithological contacts. The algorithm also provides attention maps highlighting the structures at different scales, and clustering was applied to its high-level features to perform a semi-supervised segmentation of the area. The validation conducted on a portion of the synthetic dataset and on data from adjacent areas shows that the methodology is suitable for segmenting the surficial geology using airborne magnetic data. In particular, the clustering segments the magnetic anomalies well into a pertinent geological map. Moreover, the first attention map isolates the structures at low scales and gives a pertinent representation of the original data. Thus, our method can be used to produce preliminary geological maps of good quality and new representations of any area where a geological and petrophysical 3D model exists, or in areas sharing the same geological context, using airborne magnetic data only.


Failure-averse Active Learning for Physics-constrained Systems

arXiv.org Machine Learning

Active learning is a subfield of machine learning devised for the design and modeling of systems with very high sampling costs. Industrial and engineering systems are generally subject to physics constraints that may induce fatal failures when they are violated, yet such constraints are frequently underestimated in active learning. In this paper, we develop a novel active learning method that avoids failures by considering the implicit physics constraints that govern the system. The proposed approach is driven by two tasks: safe variance reduction explores the safe region to reduce the variance of the target model, and safe region expansion aims to extend the explorable region by exploiting the probabilistic model of the constraints. The global acquisition function is devised to judiciously optimize the acquisition functions of the two tasks, and its theoretical properties are provided. The proposed method is applied to the composite fuselage assembly process with consideration of material failure using the Tsai-Wu criterion, and it is able to achieve zero failures without knowledge of the explicit failure regions.
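The "safe variance reduction" idea can be illustrated with a minimal sketch. This is not the paper's acquisition function: it assumes each candidate already comes with a predictive variance for the target model and a mean and standard deviation for a failure index, and it treats a candidate as safe when a pessimistic bound on that index stays below a limit. The field names, `limit`, and `beta` are hypothetical.

```python
def select_safe_point(candidates, limit=1.0, beta=2.0):
    """Pick the most uncertain candidate among those predicted to be safe.

    Each candidate is a dict with hypothetical keys:
      target_var : predictive variance of the target model at this point
      g_mean     : predicted mean of the failure index (failure if >= limit)
      g_std      : predictive standard deviation of the failure index
    """
    # Pessimistic safety check: the upper confidence bound must stay below the limit.
    safe = [c for c in candidates if c["g_mean"] + beta * c["g_std"] < limit]
    if not safe:
        return None  # no candidate is confidently safe
    # Among safe candidates, sample where the target model is least certain.
    return max(safe, key=lambda c: c["target_var"])


candidates = [
    {"name": "a", "target_var": 0.9, "g_mean": 0.95, "g_std": 0.10},  # uncertain but risky
    {"name": "b", "target_var": 0.5, "g_mean": 0.40, "g_std": 0.10},  # safe
    {"name": "c", "target_var": 0.2, "g_mean": 0.10, "g_std": 0.05},  # safe, less informative
]
chosen = select_safe_point(candidates)
```

Here candidate "a" has the largest variance but is rejected because its pessimistic failure index exceeds the limit, so "b" is selected.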


Partitioned Active Learning for Heterogeneous Systems

arXiv.org Artificial Intelligence

Active learning is a subfield of machine learning that focuses on improving the data collection efficiency of expensive-to-evaluate systems. In particular, surrogate modeling integrated with active learning has shown remarkable performance in computationally demanding engineering systems. However, heterogeneity in the underlying systems may adversely affect the performance of active learning. To improve the learning efficiency under this regime, we propose partitioned active learning, which seeks the most informative design points for partitioned Gaussian process (GP) modeling of heterogeneous systems. The proposed active learning consists of two successive steps: the global searching scheme accelerates the exploration of active learning by investigating the most uncertain design space, and the local searching exploits the circumscribed information induced by the local GP. We also propose Cholesky-update-driven numerical remedies for our active learning to address the computational complexity challenge. The proposed method is applied to numerical simulations and two real-world case studies: (i) cost-efficient automatic fuselage shape control in aerospace manufacturing; and (ii) the optimal design of tribocorrosion-resistant alloys in materials science. The results show that our approach outperforms benchmark methods with respect to prediction accuracy and computational efficiency.
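The abstract does not spell out its Cholesky-based remedies, so the following is only a generic sketch of why a Cholesky update helps in GP-based active learning: when one design point is added, the factor of the enlarged kernel matrix can be extended from the existing factor instead of being recomputed from scratch. The RBF kernel, the jitter value, and the function names are assumptions made for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1D arrays of inputs."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def cholesky_append(L, k_new, kappa):
    """Extend the lower Cholesky factor L of K to the factor of [[K, k], [k^T, kappa]]."""
    l = np.linalg.solve(L, k_new)      # new off-diagonal row satisfies L @ l = k_new
    d = np.sqrt(kappa - l @ l)         # new diagonal entry
    n = L.shape[0]
    L_new = np.zeros((n + 1, n + 1))
    L_new[:n, :n] = L
    L_new[n, :n] = l
    L_new[n, n] = d
    return L_new


x = np.array([0.0, 1.0, 2.5, 4.0])
K = rbf_kernel(x, x) + 1e-6 * np.eye(x.size)   # small jitter for numerical stability

L_old = np.linalg.cholesky(K[:3, :3])           # factor for the first three points
L_updated = cholesky_append(L_old, K[:3, 3], K[3, 3])
```

The updated factor matches a full refactorization of the four-point kernel matrix, while the incremental step needs only one linear solve against the existing factor (a dedicated triangular solver would make that step cheaper than the general solver used here).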


cgSpan: Pattern Mining in Conceptual Graphs

arXiv.org Artificial Intelligence

Conceptual Graphs (CGs) are a graph-based knowledge representation formalism. In this paper we propose cgSpan, a CG frequent pattern mining algorithm. It extends the DMGM-GSM algorithm, which takes taxonomy-based labeled graphs as input, by including three more kinds of knowledge from the CG formalism: (a) the fixed arity of relation nodes, handling graphs of neighborhoods centered on relations rather than graphs of nodes; (b) the signatures, avoiding patterns with concept types more general than the maximal types specified in signatures; and (c) the inference rules, applying them during the pattern mining process. The experimental study highlights that cgSpan is a functional CG frequent pattern mining algorithm and that including CG specificities results in a faster algorithm with more expressive results and less redundancy with the vocabulary.


Hit and Lead Discovery with Explorative RL and Fragment-based Molecule Generation

arXiv.org Artificial Intelligence

Recently, utilizing reinforcement learning (RL) to generate molecules with desired properties has been highlighted as a promising strategy for drug design. A molecular docking program - a physical simulation that estimates protein-small molecule binding affinity - can be an ideal reward scoring function for RL, as it is a straightforward proxy of the therapeutic potential. Still, two imminent challenges exist for this task. First, the models often fail to generate chemically realistic and pharmacochemically acceptable molecules. Second, the docking score optimization is a difficult exploration problem that involves many local optima and less smooth surfaces with respect to molecular structure. To tackle these challenges, we propose a novel RL framework that generates pharmacochemically acceptable molecules with large docking scores. Our method - Fragment-based generative RL with Explorative Experience replay for Drug design (FREED) - constrains the generated molecules to a realistic and qualified chemical space and effectively explores the space to find drugs by coupling our fragment-based generation method and a novel error-prioritized experience replay (PER). We also show that our model performs well on both de novo and scaffold-based schemes. Our model produces molecules of higher quality compared to existing methods while achieving state-of-the-art performance on two of three targets in terms of the docking scores of the generated molecules. We further show with ablation studies that our method, predictive error-PER (FREED(PE)), significantly improves the model performance.
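The error-prioritized replay idea can be sketched as follows. This is not the FREED implementation: it simply assumes that each stored experience carries a predicted and an actual reward, and that experiences are replayed with probability proportional to the size of the prediction error. The class name and the `alpha` and `eps` parameters are hypothetical.

```python
import random


class ErrorPrioritizedBuffer:
    """Replay buffer that samples experiences in proportion to prediction error."""

    def __init__(self, alpha=1.0, eps=1e-3, seed=0):
        self.items = []
        self.priorities = []
        self.alpha = alpha            # how strongly the error shapes the sampling
        self.eps = eps                # keeps zero-error items sampleable
        self.rng = random.Random(seed)

    def add(self, experience, predicted_reward, actual_reward):
        error = abs(predicted_reward - actual_reward)
        self.items.append(experience)
        self.priorities.append((error + self.eps) ** self.alpha)

    def sample(self, batch_size):
        # Sampling with replacement, weighted by priority.
        return self.rng.choices(self.items, weights=self.priorities, k=batch_size)


buffer = ErrorPrioritizedBuffer()
buffer.add("well_predicted_molecule", predicted_reward=7.9, actual_reward=8.0)
buffer.add("surprising_molecule", predicted_reward=3.0, actual_reward=8.0)

batch = buffer.sample(1000)
surprising_count = batch.count("surprising_molecule")
```

Because the second experience was predicted poorly, it dominates the sampled batch, which is the behavior that steers learning toward poorly understood regions of the search space.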


Apple selects Chinese giant for critical iPhone role - California News Times

#artificialintelligence

This article is an on-site version of the #techAsia newsletter. Sign up here to have the newsletter sent directly to your inbox every Wednesday. Hello, this is Kenji in Tokyo, currently undergoing home quarantine for Covid-19. For our big story this week, there is another scoop about Apple from Nikkei Asia. A Chinese state-owned enterprise has become a supplier of displays for the latest flagship iPhone. This shows how far China's technology, including artificial intelligence, has advanced, as warned by a former Pentagon chief software officer (Mercedes Top 10). Meanwhile, China is building and diversifying its sources of strategic mineral resources, including lithium, a key component for the world's leading electric vehicle industry (our views, smart data and spotlights).


How to Improve Deep Learning Forecasts for Time Series

#artificialintelligence

Clustering time series data before fitting can improve accuracy by 33%. In 2021, researchers at UCLA developed a method that can improve model fit on many different time series. By aggregating similarly structured data and fitting a model to each group, our models can specialize. While the approach is fairly straightforward to implement, as with any other complex deep learning method, we are often computationally limited by large data sets. However, all of the methods listed have support in both R and Python, so development on smaller datasets should be pretty "simple."
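The cluster-then-fit idea can be sketched in a few lines. This is not the UCLA method and it uses no deep learning: it assumes synthetic data, a plain k-means on z-normalized series, and a pooled AR(1) model per cluster as a stand-in for whatever forecaster would actually be used. All function names and data are illustrative.

```python
import numpy as np


def kmeans(X, k, n_iter=20):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def fit_ar1(series_group):
    """Pooled least-squares fit of x[t] = a * x[t-1] + b over all series in a group."""
    prev = np.concatenate([s[:-1] for s in series_group])
    nxt = np.concatenate([s[1:] for s in series_group])
    A = np.column_stack([prev, np.ones_like(prev)])
    coef = np.linalg.lstsq(A, nxt, rcond=None)[0]
    return float(coef[0]), float(coef[1])


rng = np.random.default_rng(0)
t = np.arange(30)
# Two synthetic families of series: linear growth and exponential decay.
growth = [0.5 * t + rng.normal(0, 0.05, t.size) for _ in range(5)]
decay = [10 * 0.8 ** t + rng.normal(0, 0.05, t.size) for _ in range(5)]
series = growth + decay

# Cluster on shape (z-normalized values), then fit one model per cluster.
features = np.array([(s - s.mean()) / s.std() for s in series])
labels = kmeans(features, k=2)
models = {
    int(j): fit_ar1([s for s, lab in zip(series, labels) if lab == j])
    for j in np.unique(labels)
}
```

Each cluster ends up with its own coefficients (a slope near 1 for the growth group, near 0.8 for the decay group), whereas a single model fitted to all ten series would have to compromise between the two behaviors.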


How Moveworks' AI platform broke through the multilingual NLP barrier

#artificialintelligence

Chatbots have a checkered past of often not delivering the performance their providers have promised. This is especially true in the IT service management (ITSM) and multilingual NLP spaces, where service desks found support teams deluged with complaints (yes, about the support chatbots). Just getting English language nuance right, along with how enterprises communicate, often requires chatbots to be custom programmed with constraint and logic workflows supported by natural language processing (NLP) and machine learning. If that sounds like a science project, it is, and IT users are the test subjects. Because of their complexity, chatbots were contributing to already overflowing trouble-ticket queues.


Applications and Techniques for Fast Machine Learning in Science

arXiv.org Artificial Intelligence

In this community review report, we discuss applications and techniques for fast machine learning (ML) in science: the concept of integrating powerful ML methods into the real-time experimental data processing loop to accelerate scientific discovery. The material for the report builds on two workshops held by the Fast ML for Science community and covers three main areas: applications for fast ML across a number of scientific domains; techniques for training and implementing performant and resource-efficient ML algorithms; and computing architectures, platforms, and technologies for deploying these algorithms. We also present overlapping challenges across the multiple scientific domains where common solutions can be found. This community report is intended to give plenty of examples and inspiration for scientific discovery through integrated and accelerated ML solutions. This is followed by a high-level overview and organization of technical advances, including an abundance of pointers to source material, which can enable these breakthroughs.