Deep Learning
Deep learning meets genome biology
Frey is a co-founder of Deep Genomics, a professor at the University of Toronto and a co-founder of its Machine Learning Group, a senior fellow of the Neural Computation program at the Canadian Institute for Advanced Research, and a fellow of the Royal Society of Canada. My team studied learning and inference in deep architectures, using algorithms based on variational methods, message passing, and Markov chain Monte Carlo (MCMC) simulation. My group's approach was inspired by Beer and Tavazoie's work, but differed in three ways: we examined mammalian cells, we used more advanced machine learning techniques, and we focused on splicing instead of transcription. We built a framework for extracting biological features from genomic sequences, pre-processing the noisy experimental data, and training machine learning models to predict splicing patterns from DNA.
AI is the desire to replicate intelligence in machines: Shivaram Kalyanakrishnan
Shivaram Kalyanakrishnan is an assistant professor in the department of computer science and engineering at the Indian Institute of Technology-Bombay. He specialises in artificial intelligence (AI) and is the only author from India on the 18-member study panel behind the Stanford University-hosted report titled Artificial Intelligence and Life. Kalyanakrishnan's expertise lies in reinforcement learning, an area of machine learning that studies what actions software agents should take to maximize a certain type of reward, learning from reward and punishment. In an interview, he urges people to be more optimistic about what AI can do rather than be obsessed with fear of AI machines.
How Deep Learning Increases Video Viewability
Video viewability is a top priority for video publishers, who are under pressure to verify that their audience is actually watching advertisers' content. In a previous post, How Deep Learning Video Sequence Drives Profits, we demonstrated why image sequences draw consumer attention. Advanced technologies such as deep learning are increasing video viewability by identifying and learning which images make people stick with content. This content intelligence is the foundation for advancing video machine learning and improving overall video performance. In this post, we will explore some challenges in viewability and how deep learning is boosting video watch rates.
Why India needs an AI policy
With China making rapid progress in artificial intelligence (AI)-based research, it is imperative that India view AI as a critical element of its national security strategy, recommends an August 2016 report titled India and the Artificial Intelligence Revolution. Thanks to the increasingly digital economy, fuelled by improving education and globalization, the Indian consumer is unknowingly the country's biggest beneficiary of recent advances in AI, notes the report. From utilizing various applications powered by AI to using a range of online services such as Amazon Marketplace and Netflix that learn from consumers' online behaviour to make intelligent product and service recommendations, consumers are readily engaged with the proliferation of AI in India, whether they appreciate it or not. Indian academics, public researchers, labs, and entrepreneurs face a different challenge than the corporations that dominate the space: the infrastructure necessary for an AI revolution in India has been neglected by policymakers. While lack of physical infrastructure is certainly a major impediment, India's AI development also suffers from the paucity of the necessary cultural infrastructure, which is key to moving recent AI advances from lab to marketplace.
Dynamic Mortality Risk Predictions in Pediatric Critical Care Using Recurrent Neural Networks
Aczon, M, Ledbetter, D, Ho, L, Gunny, A, Flynn, A, Williams, J, Wetzel, R
Viewing the trajectory of a patient as a dynamical system, a recurrent neural network was developed to learn the course of patient encounters in the Pediatric Intensive Care Unit (PICU) of a major tertiary care center. Data extracted from Electronic Medical Records (EMR) of about 12000 patients who were admitted to the PICU over a period of more than 10 years were leveraged. The RNN model ingests a sequence of measurements which include physiologic observations, laboratory results, administered drugs and interventions, and generates temporally dynamic predictions for in-ICU mortality at user-specified times. The RNN's ICU mortality predictions offer significant improvements over those from two clinically-used scores and static machine learning algorithms.
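The idea of emitting a fresh prediction at every time step can be illustrated with a minimal sketch. It assumes a plain Elman-style recurrence with hand-picked, untrained weights; the excerpt does not state the cell type, layer sizes, or inputs the authors used, so the function name `rnn_risk_trajectory`, the weights, and the two-feature measurements are all hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def rnn_risk_trajectory(measurements, w_in, w_rec, w_out):
    """Run a tiny recurrent network and return one risk score per time step.

    measurements: list of feature vectors, one per observation time.
    w_in, w_rec:  input-to-hidden and hidden-to-hidden weight matrices.
    w_out:        hidden-to-output weight vector.
    """
    hidden = [0.0] * len(w_rec)  # the hidden state summarizes the history so far
    risks = []
    for x in measurements:
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * h for w, h in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        # A prediction is produced at every step, not only at the end.
        risks.append(sigmoid(sum(w * h for w, h in zip(w_out, hidden))))
    return risks

# Hypothetical, untrained weights and made-up normalized measurements.
w_in = [[0.5, -0.2], [0.1, 0.8]]
w_rec = [[0.3, 0.0], [-0.1, 0.4]]
w_out = [1.2, -0.7]
measurements = [[0.2, 0.1], [0.5, 0.4], [0.9, 0.8]]
risks = rnn_risk_trajectory(measurements, w_in, w_rec, w_out)
```

Each entry of `risks` depends on all measurements seen up to that point, which is what makes the prediction temporally dynamic; a static model would instead map a single snapshot to a single score.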
dna2vec: Consistent vector representations of variable-length k-mers
One ubiquitous representation of a long DNA sequence divides it into shorter k-mer components. Unfortunately, the straightforward encoding of a k-mer as a one-hot vector is vulnerable to the curse of dimensionality. Worse yet, every pair of distinct one-hot vectors is equidistant. This is particularly problematic when applying the latest machine learning algorithms to solve problems in biological sequence analysis. In this paper, we propose a novel method to train distributed representations of variable-length k-mers. Our method is based on the popular word embedding model word2vec, which is trained on a shallow two-layer neural network. Our experiments provide evidence that the summing of dna2vec vectors is akin to nucleotide concatenation. We also demonstrate that there is a correlation between the Needleman-Wunsch similarity score and the cosine similarity of dna2vec vectors.
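The two limitations of one-hot encoding mentioned in the abstract can be checked directly. The sketch below is not the dna2vec code; it only assumes overlapping k-mers over the alphabet ACGT, and the helper names are illustrative.

```python
import math
from itertools import combinations, product

def kmers(sequence, k):
    """Return the overlapping k-mers of a DNA sequence, in order."""
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]

def one_hot(kmer, vocabulary):
    """Encode a k-mer as a vector with a single 1 at its vocabulary index."""
    return [1.0 if kmer == entry else 0.0 for entry in vocabulary]

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

# The vocabulary grows as 4**k, which is the dimensionality problem.
vocabulary = ["".join(p) for p in product("ACGT", repeat=3)]

fragments = kmers("ACGTAC", 3)  # ['ACG', 'CGT', 'GTA', 'TAC']

# Collect the distance between every pair of distinct one-hot vectors.
distances = {
    round(euclidean(one_hot(a, vocabulary), one_hot(b, vocabulary)), 12)
    for a, b in combinations(fragments, 2)
}
```

The set `distances` contains a single value, the square root of 2, so a one-hot encoding says nothing about which k-mers are similar. A learned dense embedding is meant to place related k-mers closer together.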
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Shazeer, Noam, Mirhoseini, Azalia, Maziarz, Krzysztof, Davis, Andy, Le, Quoc, Hinton, Geoffrey, Dean, Jeff
The capacity of a neural network to absorb information is limited by its number of parameters. Conditional computation, where parts of the network are active on a per-example basis, has been proposed in theory as a way of dramatically increasing model capacity without a proportional increase in computation. In practice, however, there are significant algorithmic and performance challenges. In this work, we address these challenges and finally realize the promise of conditional computation, achieving greater than 1000x improvements in model capacity with only minor losses in computational efficiency on modern GPU clusters. We introduce a Sparsely-Gated Mixture-of-Experts layer (MoE), consisting of up to thousands of feed-forward sub-networks. A trainable gating network determines a sparse combination of these experts to use for each example. We apply the MoE to the tasks of language modeling and machine translation, where model capacity is critical for absorbing the vast quantities of knowledge available in the training corpora. We present model architectures in which a MoE with up to 137 billion parameters is applied convolutionally between stacked LSTM layers. On large language modeling and machine translation benchmarks, these models achieve significantly better results than state-of-the-art at lower computational cost.
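A sparse combination of experts can be sketched as follows. This is a simplified illustration, not the authors' implementation: it keeps the k largest gate logits and applies a softmax over them, and it omits the noise term and load-balancing losses used in practice. The scaling experts and the gate weights are hypothetical stand-ins for trained feed-forward sub-networks and a trained gating network.

```python
import math

def softmax(values):
    peak = max(values)  # subtract the maximum for numerical stability
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_gates(logits, k):
    """Return {expert_index: weight} for the k experts with the largest logits."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    return dict(zip(top, softmax([logits[i] for i in top])))

def moe_layer(x, gate_weights, experts, k):
    """Combine the outputs of the selected experts, weighted by their gates.

    Assumes every expert returns a vector of the same length as x.
    """
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in gate_weights]
    gates = top_k_gates(logits, k)
    output = [0.0] * len(x)
    for index, gate in gates.items():
        # Only selected experts are evaluated; the rest cost no computation.
        expert_out = experts[index](x)
        output = [o + gate * e for o, e in zip(output, expert_out)]
    return output, gates

def make_scaling_expert(factor):
    """Hypothetical toy expert: multiplies its input by a constant."""
    return lambda x: [factor * xi for xi in x]

experts = [make_scaling_expert(f) for f in (1.0, 2.0, 3.0, 4.0)]
gate_weights = [[0.1, 0.0], [0.9, 0.2], [0.5, 0.5], [-0.3, 0.1]]
output, gates = moe_layer([1.0, 2.0], gate_weights, experts, k=2)
```

With these made-up numbers only experts 1 and 2 are evaluated. Adding more experts increases the parameter count while the per-example cost stays tied to k, which is the capacity-versus-computation trade the abstract describes.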
Learning what to look in chest X-rays with a recurrent visual attention model
Ypsilantis, Petros-Pavlos, Montana, Giovanni
X-rays are commonly performed imaging tests that use small amounts of radiation to produce pictures of the organs, tissues, and bones of the body. X-rays of the chest are used to detect abnormalities or diseases of the airways, blood vessels, bones, heart, and lungs. In this work we present a stochastic attention-based model that is capable of learning what regions within a chest X-ray scan should be visually explored in order to conclude that the scan contains a specific radiological abnormality. The proposed model is a recurrent neural network (RNN) that learns to sequentially sample the entire X-ray and focus only on informative areas that are likely to contain the relevant information. We report on experiments carried out with more than 100,000 X-rays containing enlarged hearts or medical devices. The model has been trained using reinforcement learning methods to learn task-specific policies.
Learning to reinforcement learn
Wang, Jane X, Kurth-Nelson, Zeb, Tirumala, Dhruva, Soyer, Hubert, Leibo, Joel Z, Munos, Remi, Blundell, Charles, Kumaran, Dharshan, Botvinick, Matt
In recent years deep reinforcement learning (RL) systems have attained superhuman performance in a number of challenging task domains. However, a major limitation of such applications is their demand for massive amounts of training data. A critical present objective is thus to develop deep RL methods that can adapt rapidly to new tasks. In the present work we introduce a novel approach to this challenge, which we refer to as deep meta-reinforcement learning. Previous work has shown that recurrent networks can support meta-learning in a fully supervised context. We extend this approach to the RL setting. What emerges is a system that is trained using one RL algorithm, but whose recurrent dynamics implement a second, quite separate RL procedure. This second, learned RL algorithm can differ from the original one in arbitrary ways. Importantly, because it is learned, it is configured to exploit structure in the training domain. We unpack these points in a series of seven proof-of-concept experiments, each of which examines a key aspect of deep meta-RL. We consider prospects for extending and scaling up the approach, and also point out some potentially important implications for neuroscience.
Cracking Open the Black Box of Neural Networks
There is a certain allure to the deep learning space in that the very inspiration is based on biomimicry. Deep learning is a subset of artificial intelligence (AI) with an architecture that roughly mirrors the human brain: information is processed through multiple layers to compute an outcome. Unlike other machine learning algorithms, which only have one or two layers, deep learning is "deep" because it has multiple layers – typically between 10 and 100 layers. Computations at each level build upon previous levels, allowing the network to learn more nuanced and abstract characteristics. Each layer is responsible for the detection of one characteristic, basing assumptions on earlier layers.