Materials
SAFE: Scalable Automatic Feature Engineering Framework for Industrial Tasks
Shi, Qitao, Zhang, Ya-Lin, Li, Longfei, Yang, Xinxing, Li, Meng, Zhou, Jun
Machine learning techniques have been widely applied in Internet companies for various tasks, acting as an essential driving force, and feature engineering has been generally recognized as a crucial step when constructing machine learning systems. Recently, a growing effort has been devoted to the development of automatic feature engineering methods, so that practitioners can be freed from substantial and tedious manual effort. However, for industrial tasks, the efficiency and scalability of these methods are still far from satisfactory. In this paper, we propose a staged method named SAFE (Scalable Automatic Feature Engineering), which provides excellent efficiency and scalability, along with the requisite interpretability and promising performance. Extensive experiments are conducted, and the results show that the proposed method provides prominent efficiency and competitive effectiveness compared with other methods. Moreover, the scalability of the proposed method allows it to be deployed in large-scale industrial tasks.
A physics-informed feature engineering approach to use machine learning with limited amounts of data for alloy design: shape memory alloy demonstration
Liu, Sen, Kappes, Branden B., Amin-ahmadi, Behnam, Benafan, Othmane, Stebner, Aaron P., Zhang, Xiaoli
Decades of global research and development initiatives such as Integrated Computational Materials Engineering (ICME) [2][3] and the Materials Genome Initiative (MGI) [4] have demonstrated the ability of both physics-based and data-driven computations to accelerate the discovery and deployment of new alloys. It is established that machine learning (ML) can model process-structure-property relationships of alloys [5][6]. Of equal or greater impact, ML can greatly reduce the number of physics-based experiments and calculations needed to discover and design new materials with optimal properties [7][8][9]. However, the robust prediction of a new alloy and its processing, designed to meet a desired yet not previously achieved performance, remains an open challenge; one that is met in this work. In other areas of materials science and engineering where new materials have been successfully predicted, the formulation of effective data descriptors, or "feature engineering," has emerged as a critical data pre-processing step to enable better performance from ML. Most such studies have focused on using high-throughput physics-based calculations together with chemical element descriptors to assist ML prediction [7][9].
Quantum Computing Assisted Deep Learning for Fault Detection and Diagnosis in Industrial Process Systems
Quantum computing (QC) and deep learning techniques have attracted widespread attention in recent years. This paper proposes QC-based deep learning methods for fault diagnosis that exploit their unique capabilities to overcome the computational challenges faced by conventional data-driven approaches performed on classical computers. Deep belief networks are integrated into the proposed fault diagnosis model and are used to extract features at different levels for normal and faulty process operations. The QC-based fault diagnosis model uses a quantum computing assisted generative training process followed by discriminative training to address the shortcomings of classical algorithms. To demonstrate its applicability and efficiency, the proposed fault diagnosis method is applied to process monitoring of a continuous stirred tank reactor (CSTR) and the Tennessee Eastman (TE) process. The proposed QC-based deep learning approach achieves superior fault detection and diagnosis performance, with average fault detection rates of 79.2% and 99.39% for the CSTR and TE processes, respectively.
CIFAR AI Catalyst Grants
One-day research workshops on the application of AI approaches to a dedicated area of research (e.g. Workshops may be held in any Canadian city, but must include participants from multiple research institutions (universities, research institutes, research hospitals). The goal of the workshop should be to identify opportunities for the application of AI to the specific domain of interest, identify emerging research opportunities and foster the development of new collaborations. Up to $20,000 of funding is available and applicants will be asked to provide a complete budget. CIFAR will provide some logistical support to workshop organizers (e.g.
The premier gathering for the Steel industry... Future Steel Forum
Now approaching its fourth successful year, the next edition of Future Steel Forum will take place in Prague, the capital of the Czech Republic, on 2 – 3 June 2020. Future Steel Forum is all about the application of Industry 4.0 to the steelmaking process. Delegates can expect to hear from the world's leading experts on high-tech steelmaking on a whole range of topics. For 2020, we have more steelmakers involved than ever before, including: ArcelorMittal, Tata Steel India, Emirates Steel, POSCO, Metinvest Digital, Big River Steel, Liberty Steel Group, Kobe Steel, Buderus Edelstahl, Badische Stahl Engineering and TMK. Delegates can expect lively conversation, animated discussion panels and plenty of networking opportunities. Take advantage of the early-bird rate and register for your pass today. "It's an exceptional opportunity to assess the evolution of the steel industry and to meet relevant people."
Analytical Equations based Prediction Approach for PM2.5 using Artificial Neural Network
Particulate matter pollution is one of the deadliest types of air pollution worldwide due to its significant impacts on the global environment and human health. Particulate Matter (PM2.5) is one of the important particulate pollutants used to measure the Air Quality Index (AQI). The conventional instruments used by air quality monitoring stations to monitor PM2.5 are costly, bulky, time-consuming, and power-hungry. Furthermore, due to limited data availability and non-scalability, these stations cannot provide high spatial and temporal resolution in real time. To overcome the disadvantages of the existing methodology, this article presents an analytical-equation-based prediction approach for PM2.5 using an Artificial Neural Network (ANN). Because the derived analytical equations for the prediction can be computed on a Wireless Sensor Node (WSN) or a low-cost processing tool, the approach is practical for low-cost deployment. Moreover, the correlation between PM2.5 and other pollutants is studied to select the appropriate predictors. A large authenticated data set from a Central Pollution Control Board (CPCB) online station in India is used for the proposed approach. The RMSE and coefficient of determination (R²) obtained for the proposed prediction approach using eight predictors are 1.7973 µg/m³ and 0.9986, respectively, while using three predictors the approach yields an RMSE of 7.5372 µg/m³ and an R² of 0.9708. Therefore, the results demonstrate that the proposed approach is a promising option for monitoring PM2.5 without power-hungry gas sensors and bulky analyzers.
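For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how RMSE and R² are conventionally computed. The function names and the sample values are illustrative assumptions and are not taken from the study or the CPCB data set.

```python
import math

def rmse(y_true, y_pred):
    """Root-mean-square error, in the same units as the inputs."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Made-up PM2.5 readings (ug/m3) for illustration only, not data from the study.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]

print(rmse(observed, predicted))
print(r_squared(observed, predicted))
```

A lower RMSE and an R² closer to 1 both indicate that the predictions track the observed concentrations more closely.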
FedCoin: A Peer-to-Peer Payment System for Federated Learning
Liu, Yuan, Sun, Shuai, Ai, Zhengpeng, Zhang, Shuangfeng, Liu, Zelei, Yu, Han
Federated learning (FL) is an emerging collaborative machine learning method to train models on distributed datasets with privacy concerns. To properly incentivize data owners to contribute their efforts, the Shapley Value (SV) is often adopted to fairly assess their contribution. However, the calculation of SV is time-consuming and computationally costly. In this paper, we propose FedCoin, a blockchain-based peer-to-peer payment system for FL to enable a feasible SV-based profit distribution. In FedCoin, blockchain consensus entities calculate SVs and a new block is created based on the proof of Shapley (PoSap) protocol. This is in contrast to the popular Bitcoin network, where consensus entities "mine" new blocks by solving meaningless puzzles. Based on the computed SVs, a scheme for dividing the incentive payoffs among FL clients with nonrepudiation and tamper-resistance properties is proposed. Experimental results based on real-world data show that FedCoin can promote high-quality data from FL clients through accurately computing SVs with an upper bound on the computational resources required for reaching consensus. It opens opportunities for non-data owners to play a role in FL.
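To illustrate why the abstract calls SV calculation computationally costly, here is a minimal sketch of the textbook exact Shapley value, which evaluates the utility of every coalition (exponential in the number of clients). This is not the PoSap protocol or FedCoin's implementation; the client names and the utility table are hypothetical.

```python
from itertools import combinations
from math import factorial

def shapley_values(players, utility):
    """Exact Shapley values by enumerating every coalition of the other players."""
    n = len(players)
    values = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for size in range(n):
            # Probability weight of a coalition of this size preceding p.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for coalition in combinations(others, size):
                s = frozenset(coalition)
                # Marginal contribution of p when joining coalition s.
                total += weight * (utility(s | {p}) - utility(s))
        values[p] = total
    return values

# Hypothetical utility (e.g. model accuracy gain) of each coalition of FL clients.
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset("A"): 0.4, frozenset("B"): 0.4, frozenset("C"): 0.2,
    frozenset("AB"): 0.7, frozenset("AC"): 0.5, frozenset("BC"): 0.5,
    frozenset("ABC"): 0.8,
}

sv = shapley_values(["A", "B", "C"], TOY_UTILITY.__getitem__)
print(sv)
```

The values sum to the utility of the full coalition, and clients A and B, which are interchangeable in this toy table, receive equal shares.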
ChemGrapher: Optical Graph Recognition of Chemical Compounds by Deep Learning
Oldenhof, Martijn, Arany, Adam, Moreau, Yves, Simm, Jaak
In drug discovery, knowledge of the graph structure of chemical compounds is essential. Many thousands of scientific articles in chemistry and pharmaceutical sciences have investigated chemical compounds, but in many cases the details of the structure of these chemical compounds are published only as images. A tool to analyze these images automatically and convert them into a chemical graph structure would be useful for many applications, such as drug discovery. A few such tools are available, and they are mostly derived from optical character recognition. However, our evaluation of the performance of those tools reveals that they often make mistakes in detecting the correct bond multiplicity and stereochemical information. In addition, errors sometimes even lead to missing atoms in the resulting graph. In our work, we address these issues by developing a compound recognition method based on machine learning. More specifically, we develop a deep neural network model for optical compound recognition. The deep learning solution presented here consists of a segmentation model, followed by three classification models that predict atom locations, bonds and charges. Furthermore, this model not only predicts the graph structure of the molecule but also produces all information necessary to relate each component of the resulting graph to the source image. This solution is scalable and could rapidly process thousands of images. Finally, we empirically compare the proposed method to a well-established tool and observe significant error reductions.
Autonomous Discovery of Unknown Reaction Pathways from Data by Chemical Reaction Neural Network
The inference of chemical reaction networks is an important task in understanding the chemical processes in the life sciences and the environment. Yet only a few reaction systems are well understood, because a large number of important reaction pathways are involved but still unknown. Revealing unknown reaction pathways is an important task for scientific discovery that takes decades and requires a great deal of expert knowledge. This work presents a neural network approach for discovering unknown reaction pathways from concentration time series data. The neural network, denoted the Chemical Reaction Neural Network (CRNN), is designed to be equivalent to chemical reaction networks by following the fundamental physical laws of the Law of Mass Action and the Arrhenius Law. The CRNN is physically interpretable, and its weights correspond to the reaction pathways and rate constants of the chemical reaction network. Inference of the reaction pathways and the rate constants is then accomplished by training the equivalent CRNN via stochastic gradient descent. The approach removes the need for expert knowledge in proposing candidate reactions, so that the inference is autonomous and applicable to new systems for which there is no existing empirical knowledge to propose reaction pathways. The physical interpretability also makes the CRNN capable not only of fitting the data for a given system but also of developing knowledge of unknown pathways that could be generalized to similar chemical systems. Finally, the approach is applied to several chemical systems in chemical engineering and biochemistry to demonstrate its robustness and generality.
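One way to see how a network can mirror the Law of Mass Action and the Arrhenius Law is that the rate of a single reaction, written in logarithmic form, is a weighted sum of log-concentrations plus a bias, passed through an exponential. The sketch below checks that identity numerically. It is an assumption-laden illustration, not the authors' architecture; the function names, the reaction orders, and the kinetic parameters are all hypothetical.

```python
import math

R_GAS = 8.314462618  # universal gas constant, J/(mol*K)

def arrhenius_rate_constant(pre_exp, activation_energy, temperature):
    """Arrhenius Law: k = A * exp(-Ea / (R * T))."""
    return pre_exp * math.exp(-activation_energy / (R_GAS * temperature))

def mass_action_rate(concentrations, orders, k):
    """Law of Mass Action: r = k * prod(c_i ** nu_i)."""
    rate = k
    for c, nu in zip(concentrations, orders):
        rate *= c ** nu
    return rate

def neuron_form_rate(concentrations, orders, pre_exp, activation_energy, temperature):
    """Same rate as exp(w . ln(c) + b): weights are reaction orders, bias is ln(k)."""
    bias = math.log(pre_exp) - activation_energy / (R_GAS * temperature)
    weighted = sum(nu * math.log(c) for c, nu in zip(concentrations, orders))
    return math.exp(weighted + bias)

# Hypothetical reaction A + 2B -> products, with made-up kinetic parameters.
conc = [0.5, 2.0]        # mol/L
orders = [1.0, 2.0]      # reaction orders of A and B
A, Ea, T = 1.0e6, 5.0e4, 300.0

k = arrhenius_rate_constant(A, Ea, T)
direct = mass_action_rate(conc, orders, k)
as_neuron = neuron_form_rate(conc, orders, A, Ea, T)
print(direct, as_neuron)
```

In this form, the weights on the log-concentrations play the role of reaction orders and the bias plays the role of the log rate constant, which is consistent with the abstract's statement that the network weights correspond to pathways and rate constants.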