A comparison of machine learning algorithms for chemical toxicity classification using a simulated multi-scale data model


A daunting challenge faced by environmental regulators in the U.S. and other countries is the requirement that they evaluate the potential toxicity of a large number of unique chemicals that are in common use (in the range of 10,000–30,000) but for which little toxicology information is available. The time and cost of traditional toxicity testing, coupled with the desire to reduce animal use, are driving the search for new toxicity prediction methods [1–3]. Several efforts have begun to address this information gap by using relatively inexpensive, high-throughput screening approaches to link chemical and biological space [1, 4–21]. The U.S. EPA is carrying out one such large screening and prioritization experiment, called ToxCast, whose goal is to develop predictive signatures or classifiers that can accurately predict whether a given chemical will or will not cause particular toxicities [4]. This program is investigating a variety of chemically induced toxicity endpoints, including developmental and reproductive toxicity, neurotoxicity, and cancer. The initial training set comes from a collection of 300 pesticide active ingredients for which complete rodent toxicology profiles have been compiled, and this set of chemicals will be tested in several hundred in vitro assays.
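The classification task described above can be pictured as a matrix with one row per chemical and one column per in vitro assay, paired with a binary label saying whether each chemical causes a given toxicity. The sketch below is a minimal illustration under loudly stated assumptions: the data are randomly simulated (they are not ToxCast measurements), 200 assays stand in for "several hundred", only five hypothetical assays carry any signal, and a nearest-centroid rule is used purely as a simple stand-in classifier, not as any specific algorithm evaluated in this work. All names (`simulate`, `fit_nearest_centroid`, and so on) are hypothetical.

```python
import random

# Hypothetical, simulated stand-in for a chemicals x assays data set (NOT real ToxCast data).
N_CHEMICALS = 300   # size of the training set described in the text
N_ASSAYS = 200      # assumption: 200 columns stand in for "several hundred" assays
N_INFORMATIVE = 5   # assumption: only the first 5 assays respond to toxic chemicals
SIGNAL = 1.0        # assumption: mean shift of informative assays for toxic chemicals


def simulate(seed=0):
    """Return (X, y): X is a list of assay-value rows, y holds 0/1 toxicity labels."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(N_CHEMICALS):
        toxic = rng.random() < 0.5
        row = []
        for j in range(N_ASSAYS):
            # Informative assays are shifted upward for toxic chemicals; all others are pure noise.
            shift = SIGNAL if (toxic and j < N_INFORMATIVE) else 0.0
            row.append(rng.gauss(shift, 1.0))
        X.append(row)
        y.append(int(toxic))
    return X, y


def fit_nearest_centroid(X, y):
    """Compute the mean assay profile (centroid) of each class."""
    centroids = {}
    for cls in (0, 1):
        rows = [x for x, label in zip(X, y) if label == cls]
        centroids[cls] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def predict(centroids, x):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def dist2(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda cls: dist2(centroids[cls]))


def accuracy(centroids, X, y):
    """Fraction of chemicals whose predicted label matches the true label."""
    correct = sum(predict(centroids, x) == label for x, label in zip(X, y))
    return correct / len(y)


if __name__ == "__main__":
    X, y = simulate()
    split = 200  # first 200 chemicals for training, remaining 100 held out
    model = fit_nearest_centroid(X[:split], y[:split])
    print(f"held-out accuracy: {accuracy(model, X[split:], y[split:]):.2f}")
```

Nearest-centroid is used here only because it needs nothing beyond the standard library. Because all 200 assays enter the distance equally while only five carry signal, the uninformative columns add noise to every prediction, which illustrates why the choice of classifier and feature handling matters when few of many assays are relevant.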

How Machine Learning Helps Identify Toxicity in Potential Drugs


The team believes that determining the atomic structure of protein molecules will play a major role in understanding how they work and how they may respond to drug therapies. Drugs typically work by binding to a protein molecule and changing its shape, thereby altering how it functions.