12 Supervised Learning Modern Statistics for Modern Biology


In a supervised learning setting, we have a yardstick or plumbline to judge how well we are doing: the response itself. A frequent question in biological and biomedical applications is whether a property of interest (say, disease type, cell type, the prognosis of a patient) can be "predicted", given one or more other properties, called the predictors. Often we are motivated by a situation in which the property to be predicted is unknown (it lies in the future, or is hard to measure), while the predictors are known. The crucial point is that we learn the prediction rule from a set of training data in which the property of interest is also known. Once we have the rule, we can either apply it to new data and make actual predictions of unknown outcomes, or we can dissect the rule with the aim of better understanding the underlying biology.

Compared to unsupervised learning and what we have seen in Chapters 5, 7 and 9, where we do not know what we are looking for or how to decide whether our result is "right", we are on much more solid ground with supervised learning: the objective is clearly stated, and there are straightforward criteria to measure how well we are doing. The central issues in supervised learning (sometimes the term statistical learning is used, more or less interchangeably) are overfitting and generalizability: did our rule merely memorize the training data, so that it performs well there but poorly on new data? Or did our rule indeed pick up some of the pertinent patterns in the system being studied, which will also apply to yet unseen new data?

An example of overfitting: two regression lines are fit to data in the \((x, y)\)-plane (black points). We can think of such a line as a rule that predicts the \(y\)-value, given an \(x\)-value. Both lines are smooth, but the fits differ in what is called their bandwidth, which intuitively can be interpreted as their stiffness. The blue line seems overly keen to follow minor wiggles in the data, while the orange line captures the general trend but is less detailed. The effective number of parameters needed to describe the blue line is much higher than for the orange line. Also, if we were to obtain additional data, it is likely that the blue line would do a worse job than the orange line in modeling the new data. We'll formalize these concepts, training error and test set error, later in this chapter. Although exemplified here with line fitting, the concept applies more generally to prediction models.

See exemplary applications that motivate the use of supervised learning methods.
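The contrast between a wiggly and a stiff fit can be reproduced with a small simulation. The sketch below is illustrative only and is not the chapter's own code: it assumes a Gaussian kernel smoother (Nadaraya-Watson) as the fitting method, a sine curve with added noise as the hypothetical data-generating process, and two arbitrarily chosen bandwidths, 0.005 and 0.07, to play the roles of the blue and orange lines. It then compares the mean squared error on the training data with the error on freshly simulated data.

```python
import numpy as np

def kernel_smooth(x_train, y_train, x_new, bandwidth):
    """Nadaraya-Watson estimate with a Gaussian kernel (assumed smoother)."""
    d = (x_new[:, None] - x_train[None, :]) / bandwidth
    log_w = -0.5 * d**2
    # Subtract the row maximum so that exp() cannot underflow to all zeros.
    log_w -= log_w.max(axis=1, keepdims=True)
    w = np.exp(log_w)
    return (w @ y_train) / w.sum(axis=1)

def mse(y, y_hat):
    """Mean squared error between observed and predicted values."""
    return float(np.mean((y - y_hat) ** 2))

def simulate(n, rng):
    """Hypothetical data: a sine trend plus Gaussian noise (sd = 0.3)."""
    x = rng.uniform(0.0, 1.0, size=n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, size=n)
    return x, y

rng = np.random.default_rng(0)
x_train, y_train = simulate(50, rng)    # data used to fit the rule
x_test, y_test = simulate(1000, rng)    # new data, never seen during fitting

errors = {}
for label, bw in [("wiggly", 0.005), ("stiff", 0.07)]:
    train_err = mse(y_train, kernel_smooth(x_train, y_train, x_train, bw))
    test_err = mse(y_test, kernel_smooth(x_train, y_train, x_test, bw))
    errors[label] = (train_err, test_err)
    print(f"{label:7s} bandwidth={bw:<6} train MSE={train_err:.3f} test MSE={test_err:.3f}")
```

Under these assumptions, the small-bandwidth fit should show the lower training error but the higher error on the new data, which is the signature of overfitting described above. The exact numbers depend on the random seed and on the chosen bandwidths.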

Hunting for New Drugs with AI


There are many reasons that promising drugs wash out during pharmaceutical development, and one of them is cytochrome P450. A set of enzymes mostly produced in the liver, CYP450, as it is commonly called, is involved in breaking down chemicals and preventing them from building up to dangerous levels in the bloodstream. Many experimental drugs, it turns out, inhibit the production of CYP450, a vexing side effect that can render such a drug toxic in humans. Drug companies have long relied on conventional tools to try to predict whether a drug candidate will inhibit CYP450 in patients, such as by conducting chemical analyses in test tubes, looking at CYP450 interactions with better-understood drugs that have chemical similarities, and running tests on mice. But their predictions are wrong about a third of the time.

Pioneer in oncologic imaging, Beets-Tan, addresses audience at ECCO summit on implications of AI in radiology


The European CanCer Organisation (ECCO) is a not-for-profit federation of member organisations working in the area of cancer. It convenes oncology professionals and patients to be the united voice of Europe's cancer community. ECCO's annual European Cancer Summit brings worldwide healthcare leaders together in a unique multidisciplinary forum. In September 2019, ESR 2nd Vice-President, Professor Beets-Tan, took the floor at the European CanCer Organisation's renowned annual summit to discuss the positive impact of artificial intelligence (AI) on the quality of care and its place in radiology. In a session titled "Artificial Intelligence: Breaking down borders in cancer care in ways not yet known?", and before an audience of global leaders representing cancer care, research, patient advocacy and public-private sectors, she delivered a talk on the way AI will transform healthcare roles and skills.

Artificial intelligence program aims to help doctors more accurately diagnose breast cancer


A team at Google has developed an artificial intelligence program aimed at helping doctors accurately detect cancer in mammograms. Thousands of women receive a false negative on their breast cancer tests each year, while one in 10 receive a false positive. Shravya Shetty, who heads the Google team developing the system, told CBS News' Jamie Yuccas that their AI model reduced false positives by almost 6% and false negatives by about 9%. Shetty also claimed that it caught suspicious tissues on mammograms missed by the human eye. Interventional radiologist Dr. Susan Drossman predicted that the AI program would be integrated into her and other doctors' work stations "probably within the next year."



With the help of one example, we show how a dramatic reduction in RNA sequencing depth has little to no impact on the performance of machine learning-based linear Cox models that predict disease outcome based on tumor gene expression. Since this analysis is performed in R, if you have not installed it yet, you can follow the instructions at https://cran.r-project.org/. If R is already installed, it needs to be version 3.6.1 or higher for this example to work. The following code can help determine if R needs to be updated. In this example, we will use adrenocortical carcinoma (ACC) to demonstrate how a drastic reduction in RNA-seq depth still gives enough information to predict the relative risk of adverse outcome of disease.

Project in Python - Breast Cancer Classification with Deep Learning - DataFlair


If you want to master the Python programming language, then you can't skip projects in Python. After publishing 4 advanced Python projects, DataFlair today presents another one: the Breast Cancer Classification project in Python. To crack your next Python interview, practice these projects thoroughly, and if you face any confusion, do comment; DataFlair is always ready to help you. An intensive approach to Machine Learning, Deep Learning is inspired by the workings of the human brain and its biological neural networks. Architectures such as deep neural networks, recurrent neural networks, convolutional neural networks, and deep belief networks are made of multiple layers for the data to pass through before finally producing the output.

Pathologist Versus Artificial Pathologist: What Do We Really Want (Need) From Machine Learning


One often reads that the complexities of anatomical pathology are now, or are soon to be, unraveled by the latest machine learning technologies. Such incredible claims are bolstered by the experience of seeing a system classify histology images (or better, training one's own). It truly is remarkable that this is even possible. Yet, as this becomes a more common experience for the pathology community, it is likely that our current expectations and ambitions will be tempered by the constraints of reality. I remember being awe-struck at how realistic computer graphics were in the late '80s and early '90s.

The VA Has Embraced Artificial Intelligence To Improve Veterans' Health Care


Researchers at the Tampa veterans' hospital are training computers to diagnose cancer. It's one example of how the Department of Veterans Affairs is expanding artificial intelligence development. Inside a laboratory at the James A. Haley Veterans' Hospital in Tampa, Fla., machines are rapidly processing tubes of patients' body fluids and tissue samples. Pathologists examine those samples under microscopes to spot signs of cancer and other diseases. But distinguishing certain features about a cancer cell can be difficult, so Drs.

Learning Patient-Specific Cancer Survival Distributions as a Sequence of Dependent Regressors

Neural Information Processing Systems

An accurate model of patient survival time can help in the treatment and care of cancer patients. The common practice of providing survival time estimates based only on population averages for the site and stage of cancer ignores many important individual differences among patients. In this paper, we propose a local regression method for learning patient-specific survival time distribution based on patient attributes such as blood tests and clinical assessments. When tested on a cohort of more than 2000 cancer patients, our method gives survival time predictions that are much more accurate than popular survival analysis models such as the Cox and Aalen regression models. Our results also show that using patient-specific attributes can reduce the prediction error on survival time by as much as 20% when compared to using cancer site and stage only.

Diagnosis of thyroid cancer using deep convolutional neural network models applied to sonographic images: a retrospective, multicohort, diagnostic study


The incidence of thyroid cancer is rising steadily because of overdiagnosis and overtreatment conferred by widespread use of sensitive imaging techniques for screening. This overall incidence growth is especially driven by increased diagnosis of indolent and well-differentiated papillary subtype and early-stage thyroid cancer, whereas the incidence of advanced-stage thyroid cancer has increased marginally. Thyroid ultrasound is frequently used to diagnose thyroid cancer. The aim of this study was to use deep convolutional neural network (DCNN) models to improve the diagnostic accuracy of thyroid cancer by analysing sonographic imaging data from clinical ultrasounds.