A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model


A daunting challenge faced by environmental regulators in the U.S. and other countries is the requirement that they evaluate the potential toxicity of a large number of unique chemicals that are currently in common use (in the range of 10,000–30,000) but for which little toxicology information is available. The time and cost required for traditional toxicity testing approaches, coupled with the desire to reduce animal use, are driving the search for new toxicity prediction methods [1–3]. Several efforts are starting to address this information gap by using relatively inexpensive, high-throughput screening approaches to link chemical and biological space [1, 4–21]. The U.S. EPA is carrying out one such large screening and prioritization experiment, called ToxCast, whose goal is to develop predictive signatures or classifiers that can accurately predict whether a given chemical will or will not cause particular toxicities [4]. This program is investigating a variety of chemically induced toxicity endpoints, including developmental and reproductive toxicity, neurotoxicity and cancer. The initial training set comes from a collection of 300 pesticide active ingredients for which complete rodent toxicology profiles have been compiled. This set of chemicals will be tested in several hundred in vitro assays.

How Machine Learning Helps Identify Toxicity In Potential Drugs


The team believe that being able to determine the atomic structure of protein molecules will play a huge role in understanding how they work and how they may respond to drug therapies. Drugs typically work by binding to a protein molecule and changing its shape, thereby altering how it works.

Toxicity Prediction using Deep Learning

Machine Learning

Every day we are exposed to various chemicals via food additives, cleaning and cosmetic products, and medicines, and some of them might be toxic. However, testing the toxicity of all existing compounds by biological experiments is neither financially nor logistically feasible. Therefore, the government agencies NIH, EPA and FDA launched the Tox21 Data Challenge within the "Toxicology in the 21st Century" (Tox21) initiative. The goal of this challenge was to assess the performance of computational methods in predicting the toxicity of chemical compounds. State-of-the-art toxicity prediction methods build upon specifically designed chemical descriptors developed over decades. Though Deep Learning is new to the field and had never been applied to toxicity prediction before, it clearly outperformed all other participating methods. In this application paper we show that deep nets automatically learn features resembling well-established toxicophores. In total, our Deep Learning approach won both of the panel challenges (nuclear receptors and stress response) as well as the overall Grand Challenge, and thereby sets a new standard in tox prediction.

Applying the Subdue Substructure Discovery System to the Chemical Toxicity Domain

AAAI Conferences

The ever-increasing number of chemical compounds added every year has not been accompanied by a similar growth in our ability to analyze and classify these compounds. Preventing cancer caused by many of these chemicals is of great scientific and humanitarian value. The use of AI discovery tools for predicting chemical toxicity is being investigated. The basic idea behind the work is to obtain structure-activity relationships (SARs) [Srinivasan et al.], which relate molecular structures to cancerous activity. The data is obtained from the U.S. National Toxicology Program conducted by the National Institute of Environmental Health Sciences (NIEHS). This research outlines a general approach to automatically discovering repetitive substructures in the datasets. Relevant SARs are identified using the Subdue substructure discovery system, which discovers commonly occurring substructures in a given set of compounds. The best substructure given by Subdue is used as a pattern indicative of cancerous activity.
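The idea of finding a commonly occurring substructure across a set of compounds can be illustrated with a much simpler stand-in than Subdue itself. The sketch below is not Subdue's algorithm: it only counts single labeled bonds (atom, bond type, atom) and returns the one present in the most compounds. The graph encoding, the function names and the toy compounds are all assumptions made for illustration and do not come from the NTP data.

```python
from collections import Counter

def labeled_edges(atoms, bonds):
    """Return the set of labeled edges (atom, bond type, atom) in one compound.

    atoms: list of element symbols, indexed by atom number.
    bonds: list of (i, j, bond_type) tuples referring to indices in `atoms`.
    """
    edges = set()
    for i, j, bond_type in bonds:
        # Sort the two atom labels so that C-N and N-C count as the same edge.
        a, b = sorted((atoms[i], atoms[j]))
        edges.add((a, bond_type, b))
    return edges

def most_common_substructure(compounds):
    """Return (edge, count) for the edge found in the most compounds, or None."""
    counts = Counter()
    for atoms, bonds in compounds:
        # Each compound contributes at most once per distinct edge.
        counts.update(labeled_edges(atoms, bonds))
    if not counts:
        return None
    # Highest count wins; ties are broken by the edge label for determinism.
    return min(counts.items(), key=lambda item: (-item[1], item[0]))

# Toy compounds, invented for this example only.
toy_compounds = [
    (["C", "N", "O"], [(0, 1, "single"), (1, 2, "double")]),
    (["N", "O", "C", "C"], [(0, 1, "double"), (2, 3, "single")]),
    (["O", "N", "C"], [(0, 1, "double")]),
]
best = most_common_substructure(toy_compounds)
```

A real substructure discovery system searches over larger connected subgraphs and scores them by more than raw frequency; the single-bond counter above only shows the shape of the input and output.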

Use of Statistical and Neural Net Methods in Predicting Toxicity of Chemicals: A Hierarchical QSAR Approach

AAAI Conferences

In 1998 the number of chemicals registered with the Chemical Abstracts Service (CAS) rose to over 19 million (CAS 1999). This is an increase of over 3 million chemicals between 1996 and 1998. It would certainly be desirable to test each of these chemicals for its effects on the environment and human health (which we refer to as hazard assessment); however, completing the battery of tests necessary for the proper hazard assessment of even a single compound is a costly and time-consuming process. There is therefore simply not enough time or money to complete these test batteries for even a tiny portion of the compounds registered today (Menzel 1995). An alternative to these traditional test batteries is to develop computational models for hazard assessment.

Overview of Different Artificial Intelligence Approaches Combined with a Deductive Logic-based Expert System for Predicting Chemical Toxicity

AAAI Conferences

The paper focuses on the different artificial intelligence approaches that have been applied by the system during its 12 years of experience, notably:

a) the deductive logic of HazardExpert for predicting toxicity
b) reasoning by analogy for improving the context dependency of the metabolism engine of HazardExpert
c) using a neural network in combination with HazardExpert

The presentation compares the performance of the different released versions used at approximately 100 industrial, academic and governmental institutions in 15 countries.

HazardExpert Overview

Using the knowledge base collected by the US Environmental Protection Agency, an expert system family (HazardExpert) was developed in 1987. HazardExpert predicts the toxicity of a compound in seven toxicity classes (oncogenicity, mutagenicity, teratogenicity, irritation, sensitivity, immunotoxicity and neurotoxicity) by identifying toxic fragments in the molecule and assigning expected toxicity based on the detected fragments. To predict the toxic effect of the metabolites, the software generates their structures, then searches them for toxic fragments and summarizes the results. The MetabolExpert engine is used for this prediction.
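The fragment-based prediction described above (detect toxic fragments, assign toxicity classes, repeat for metabolites, summarize) can be sketched as a simple rule lookup. This is a minimal illustration under stated assumptions, not HazardExpert's implementation: the rule table, the fragment labels and the function names are hypothetical, molecules are reduced to sets of pre-detected fragment labels, and metabolite structures are assumed to be supplied by some other step rather than generated here.

```python
from collections import defaultdict

# Hypothetical rule table: fragment label -> toxicity classes it flags.
# Placeholder entries only; these are not taken from any real knowledge base.
FRAGMENT_RULES = {
    "fragment_A": {"mutagenicity", "oncogenicity"},
    "fragment_B": {"irritation"},
    "fragment_C": {"neurotoxicity"},
}

def flag_toxicity(fragments):
    """Map each flagged toxicity class to the fragments that triggered it."""
    hits = defaultdict(list)
    for fragment in sorted(fragments):
        # Fragments with no rule contribute nothing.
        for tox_class in FRAGMENT_RULES.get(fragment, ()):
            hits[tox_class].append(fragment)
    return dict(hits)

def summarize(parent, metabolites):
    """Collect flags for a parent compound and each supplied metabolite."""
    summary = {"parent": flag_toxicity(parent)}
    for index, metabolite in enumerate(metabolites, start=1):
        summary[f"metabolite_{index}"] = flag_toxicity(metabolite)
    return summary

# Usage: a parent with one known and one unknown fragment, plus two metabolites.
report = summarize({"fragment_A", "fragment_X"}, [{"fragment_B"}, set()])
```

Keeping the rules in a plain table separates the knowledge base from the matching logic, so the table can grow without touching the code. A real system would also have to detect fragments in actual molecular structures and weigh the evidence, both of which this sketch leaves out.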