Deep learning for computational biology


A supervised machine learning model aims to learn a function f(x) = y from a list of training pairs (x1,y1), (x2,y2), … for which data are recorded (Fig 1B). One typical application in biology is to predict the viability of a cancer cell line when exposed to a chosen drug (Menden et al, 2013; Eduati et al, 2015). The input features (x) would capture somatic sequence variants of the cell line, the chemical make-up of the drug and its concentration, which together with the measured viability (output label y) can be used to train a support vector machine, a random forest classifier or a related method (functional relationship f). Given a new cell line (unlabelled data sample x*) in the future, the learnt function predicts its survival (output label y*) by calculating f(x*), even if f resembles a black box whose inner workings, such as why particular mutation combinations influence cell growth, are not easily interpreted. Both regression (where y is a real number) and classification (where y is a categorical class label) can be viewed in this way.

As a counterpart, unsupervised machine learning approaches aim to discover patterns from the data samples x themselves, without the need for output labels y. Methods such as clustering, principal component analysis and outlier detection are typical examples of unsupervised models applied to biological data.

The inputs x, calculated from the raw data, represent what the model "sees about the world", and their choice is highly problem-specific (Fig 1C). Deriving the most informative features is essential for performance, but the process can be labour-intensive
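
The distinction between the two settings can be made concrete with a small sketch. This is an illustration under stated assumptions, not the method of the cited studies: the toy training pairs, the feature encoding (two mutation flags plus a drug concentration) and the helper names `fit_nearest_neighbours` and `find_outliers` are all hypothetical, and a simple nearest-neighbour vote stands in for the support vector machine or random forest mentioned above. The supervised part learns f from (x, y) pairs and evaluates f(x*); the unsupervised part looks for an outlier in unlabelled measurements using a z-score rule.

```python
from collections import Counter
from math import dist
from statistics import mean, stdev

# Hypothetical toy training pairs (x_i, y_i); all values are invented.
# x = (mutation A present, mutation B present, drug concentration, arbitrary units)
# y = 1 if the cell line stayed viable, 0 otherwise
training_pairs = [
    ((0, 0, 0.1), 1),
    ((0, 0, 0.9), 0),
    ((1, 0, 0.2), 1),
    ((1, 0, 0.8), 1),
    ((0, 1, 0.2), 0),
    ((0, 1, 0.7), 0),
    ((1, 1, 0.1), 1),
    ((1, 1, 0.9), 0),
]


def fit_nearest_neighbours(pairs, k=3):
    """Supervised: return a function f, learnt from (x, y) pairs, with f(x) ~ y."""
    def f(x):
        # Take the k training samples closest to x (Euclidean distance).
        nearest = sorted(pairs, key=lambda pair: dist(pair[0], x))[:k]
        # Predict the most common label among those neighbours.
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]
    return f


def find_outliers(values, threshold=2.0):
    """Unsupervised: flag values more than `threshold` standard deviations from the mean."""
    mu, sigma = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > threshold * sigma]


# Supervised use: predict the label y* for a new, unlabelled sample x*.
f = fit_nearest_neighbours(training_pairs, k=3)
x_star = (1, 0, 0.5)
y_star = f(x_star)
print(y_star)  # 1

# Unsupervised use: no labels y, only (hypothetical) measurements x.
expression_levels = [1.0, 1.1, 0.9, 1.05, 0.95, 1.0, 1.1, 0.9, 1.0, 5.0]
print(find_outliers(expression_levels))  # [5.0]
```

Note that the prediction depends entirely on how x is encoded: here the binary mutation flags dominate the distance, which is one small example of why the choice of input features is problem-specific.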