Deep Learning
Keras Cheat Sheet: Neural Networks in Python
Keras is an easy-to-use and powerful library for Theano and TensorFlow that provides a high-level neural networks API to develop and evaluate deep learning models. We recently launched one of the first online interactive deep learning courses using Keras 2.0, called "Deep Learning in Python". Now, DataCamp has created a Keras cheat sheet for those who have already taken the course and still want a handy one-page reference, or for those who need an extra push to get started. In short, this cheat sheet presents the six steps that you can go through to make neural networks in Python with the Keras library. In no time, this Keras cheat sheet will make you familiar with how you can load datasets from the library itself, preprocess the data, build up a model architecture, and compile, train, and evaluate it.
Artificial Intelligence is Completely Transforming Modern Healthcare
As medical imaging technology continues to take advantage of every new deep learning breakthrough, the challenge is that the computing technology on which it relies must evolve just as quickly. A company called Nvidia is leading that charge under the guidance of Kimberley Powell, who is confident that Nvidia's processors are not only meeting the deep learning standards of medical imaging, but also pushing the industry forward as a whole. Nvidia's hardware has established its silent but prominent role in deep learning's marriage with medicine. Powell believes specialized computers such as the DGX-1, a powerful deep-learning product, will become increasingly common in hospitals and medical research centers. Strong computing power, like what the DGX-1 can provide, stands to increase the reliability of the diagnostic process, something that, in turn, would significantly boost the standard of care in developing countries.
9 Computational Drug Discovery Startups Using AI - Nanalyze
We recently talked about how big data is the new frontier, with just 0.05% of all data available today having been analyzed. This means that all kinds of gold prospectors are lining up with their freshly crafted artificial intelligence (AI) algorithms looking to extract all the value they can from this wild west of data before someone else does. Perhaps nowhere is there more excitement at the moment than the applications to be had in the healthcare industry. Here's a look at just some of the startups that are applying artificial intelligence and big data to healthcare (courtesy of the bright minds over at CB Insights): The application that we've circled above is "drug discovery" using AI or what's also known as "computational drug discovery". The reason that this is now a thing is not just because of all the big data that's available now, but also because of how cheap cloud computing has become, not to mention the emergence of deep learning algorithms.
How to Talk to Your Data Scientist
Machine learning is poised to help marketers garner phenomenal new insights and results, and to change many processes and jobs along the way. We discussed this potential in "Machine Learning is About to Turn the Marketing World Upside Down." Machine learning can't provide better results alone, of course. Marketers need to collaborate with data scientists to identify important questions to explore, accelerate tests, improve the accuracy of answers, and make better decisions. And to effectively collaborate, they need a common language.
A Generalization of Convolutional Neural Networks to Graph-Structured Data
Hechtlinger, Yotam, Chakravarti, Purvasha, Qin, Jining
This paper introduces a generalization of Convolutional Neural Networks (CNNs) from low-dimensional grid data, such as images, to graph-structured data. We propose a novel spatial convolution utilizing a random walk to uncover the relations within the input, analogous to the way the standard convolution uses the spatial neighborhood of a pixel on the grid. The convolution has an intuitive interpretation, is efficient and scalable, and can also be used on data with varying graph structure. Furthermore, this generalization can be applied to many standard regression or classification problems by learning the underlying graph. We empirically demonstrate the performance of the proposed CNN on MNIST, and challenge the state-of-the-art on the Merck molecular activity data set.
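To make the random-walk idea concrete, here is a minimal sketch in plain Python. It is a simplified reading of the abstract, not the authors' implementation: the function names, the choice of summing the first few powers of the transition matrix, and the shared weight vector are all assumptions made for illustration. Each node picks the nodes a short random walk is expected to visit most often, and a single set of weights is applied to that ordered neighborhood, much as a filter is applied to the pixels around a grid position.

```python
def transition_matrix(adjacency):
    """Row-normalize an adjacency matrix into random-walk transition probabilities."""
    rows = []
    for row in adjacency:
        degree = sum(row)
        rows.append([a / degree if degree else 0.0 for a in row])
    return rows


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def expected_visits(adjacency, steps):
    """Sum P^0 + P^1 + ... + P^steps: expected visits to each node per start node."""
    n = len(adjacency)
    p = transition_matrix(adjacency)
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    total = [row[:] for row in power]
    for _ in range(steps):
        power = matmul(power, p)
        total = [[t + q for t, q in zip(t_row, p_row)]
                 for t_row, p_row in zip(total, power)]
    return total


def top_neighbors(visits, size):
    """For each node, the `size` most-visited nodes (ties broken by node index)."""
    return [sorted(range(len(row)), key=lambda j: (-row[j], j))[:size]
            for row in visits]


def graph_convolution(signal, neighbors, weights):
    """Apply one shared weight vector to every node's ordered neighborhood."""
    return [sum(w * signal[j] for w, j in zip(weights, nbrs))
            for nbrs in neighbors]


# Hypothetical example: a path graph 0 - 1 - 2 - 3 with one scalar feature per node.
adjacency = [[0, 1, 0, 0],
             [1, 0, 1, 0],
             [0, 1, 0, 1],
             [0, 0, 1, 0]]
signal = [1.0, 2.0, 3.0, 4.0]
neighbors = top_neighbors(expected_visits(adjacency, steps=2), size=2)
smoothed = graph_convolution(signal, neighbors, weights=[0.5, 0.5])
```

Because the walk starts at the node itself, each node ranks first in its own neighborhood here, so the first weight plays the role of the filter's center tap.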
Multimodal Word Distributions
Athiwaratkun, Ben, Wilson, Andrew Gordon
Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures, for multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information, and outperforms alternatives, such as word2vec skip-grams, and Gaussian embeddings, on benchmark datasets such as word similarity and entailment.
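As a rough illustration of what an energy-based max-margin objective over Gaussian mixtures can look like, the sketch below scores two words by the log of the overlap integral between their mixtures and applies a hinge loss to a positive and a negative pair. The abstract does not spell out the energy function, so the overlap (expected likelihood) energy, the restriction to spherical components, the margin value, and all names and toy vectors here are assumptions for illustration only.

```python
import math


def gaussian_overlap(mean_a, var_a, mean_b, var_b):
    """Integral of the product of two spherical Gaussians (a closed-form overlap)."""
    dim = len(mean_a)
    var = var_a + var_b
    sq_dist = sum((a - b) ** 2 for a, b in zip(mean_a, mean_b))
    return (2 * math.pi * var) ** (-dim / 2) * math.exp(-sq_dist / (2 * var))


def log_energy(word_a, word_b):
    """Log overlap between two mixtures given as lists of (weight, mean, variance)."""
    total = sum(pa * pb * gaussian_overlap(ma, va, mb, vb)
                for pa, ma, va in word_a
                for pb, mb, vb in word_b)
    return math.log(total)  # assumes the overlap does not underflow to zero


def max_margin_loss(word, context, negative, margin=1.0):
    """Hinge loss: the true context should out-score the negative by `margin`."""
    return max(0.0, margin - log_energy(word, context) + log_energy(word, negative))


# Hypothetical 2-D example: "bank" has one component per meaning.
bank = [(0.5, [1.0, 0.0], 0.1), (0.5, [0.0, 1.0], 0.1)]
money = [(1.0, [1.0, 0.1], 0.1)]
river = [(1.0, [0.1, 1.0], 0.1)]
unrelated = [(1.0, [-3.0, -3.0], 0.1)]

loss_good = max_margin_loss(bank, money, unrelated)  # no penalty: ordering is right
loss_bad = max_margin_loss(bank, unrelated, money)   # penalized: ordering is wrong
```

With two components, "bank" can sit close to both "money" and "river" at once, which a single point embedding cannot do without averaging the two senses.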
Limits of End-to-End Learning
End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine, but also all "peripheral" modules like representation learning and memory formation are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, like playing a whole array of Atari video games with a single architecture. While pushing for solutions to more challenging tasks, network architectures keep growing more and more complex. In this paper we ask the question whether and to what extent end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and we argue in particular that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.
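The definition above can be illustrated with a toy system of two differentiable modules trained jointly. This is only a sketch under simple assumptions (scalar modules, squared error, hand-written chain rule, arbitrary learning rate) and does not reproduce the paper's experiments; the names are made up for the example.

```python
def train_end_to_end(xs, ys, steps=200, learning_rate=0.01):
    """Jointly fit two chained modules, hidden = w1 * x and output = w2 * hidden."""
    w1, w2 = 1.0, 1.0
    n = len(xs)
    for _ in range(steps):
        grad_w1 = grad_w2 = 0.0
        for x, y in zip(xs, ys):
            hidden = w1 * x             # first module
            error = w2 * hidden - y     # second module output minus target
            # The loss gradient flows back through both modules via the chain rule.
            grad_w2 += 2 * error * hidden / n
            grad_w1 += 2 * error * w2 * x / n
        w1 -= learning_rate * grad_w1
        w2 -= learning_rate * grad_w2
    return w1, w2


def mean_squared_error(w1, w2, xs, ys):
    """Average squared error of the two-module system on the data."""
    return sum((w2 * w1 * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data generated by y = 6 * x.
xs = [1.0, 2.0, 3.0]
ys = [6.0 * x for x in xs]
w1, w2 = train_end_to_end(xs, ys)
```

Neither module receives its own training signal: only the product of the two weights is constrained by the data, which is the sense in which the system is trained as a whole.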
Exploiting random projections and sparsity with random forests and gradient boosting methods -- Application to multi-label and multi-output learning, random forest model compression and leveraging input sparsity
Within machine learning, the supervised learning field aims at modeling the input-output relationship of a system, from past observations of its behavior. Decision trees characterize the input-output relationship through a series of nested $if-then-else$ questions, the testing nodes, leading to a set of predictions, the leaf nodes. Several of such trees are often combined together for state-of-the-art performance: random forest ensembles average the predictions of randomized decision trees trained independently in parallel, while tree boosting ensembles train decision trees sequentially to refine the predictions made by the previous ones. The emergence of new applications requires scalable supervised learning algorithms in terms of computational power and memory space with respect to the number of inputs, outputs, and observations without sacrificing accuracy. In this thesis, we identify three main areas where decision tree methods could be improved for which we provide and evaluate original algorithmic solutions: (i) learning over high dimensional output spaces, (ii) learning with large sample datasets and stringent memory constraints at prediction time and (iii) learning over high dimensional sparse input spaces.
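The contrast between the two ensemble styles described above can be sketched with one-split regression trees on a single input feature. This is a minimal illustration, not the thesis's algorithms: the bootstrap resampling, the shrinkage factor, and all function names are assumptions chosen to keep the example short.

```python
import random


def fit_stump(xs, ys):
    """Fit a one-split regression tree: a single if-then-else test on x."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((y - left_mean) ** 2 for y in left)
                 + sum((y - right_mean) ** 2 for y in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    if best is None:  # all inputs identical: always predict the mean
        mean = sum(ys) / len(ys)
        return (float("inf"), mean, mean)
    return best[1:]


def predict_stump(stump, x):
    """Testing node (x <= threshold) leading to one of two leaf predictions."""
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_forest(xs, ys, n_trees, rng):
    """Random forest style: independent trees, each on a bootstrap resample."""
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(xs)) for _ in xs]
        forest.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return forest


def predict_forest(forest, x):
    """Average the predictions of the independently trained trees."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)


def fit_boosting(xs, ys, n_trees, shrinkage=0.5):
    """Boosting style: each tree is fit to the residuals left by the previous ones."""
    stumps, residuals = [], list(ys)
    for _ in range(n_trees):
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        residuals = [r - shrinkage * predict_stump(stump, x)
                     for x, r in zip(xs, residuals)]
    return stumps


def predict_boosting(stumps, x, shrinkage=0.5):
    """Sum the shrunken contributions of the sequentially trained trees."""
    return sum(shrinkage * predict_stump(s, x) for s in stumps)


def boosting_mse(stumps, xs, ys):
    """Training mean squared error of a boosted ensemble."""
    return sum((predict_boosting(stumps, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data: y = x^2 on ten points.
xs = [float(i) for i in range(10)]
ys = [x * x for x in xs]
forest = fit_forest(xs, ys, n_trees=25, rng=random.Random(0))
boosted = fit_boosting(xs, ys, n_trees=20)
```

The forest trees could be trained in parallel because none depends on another, whereas each boosting round needs the residuals from all earlier rounds.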
Multi-Task Learning of Keyphrase Boundary Classification
Augenstein, Isabelle, Søgaard, Anders
Keyphrase boundary classification (KBC) is the task of detecting keyphrases in scientific articles and labelling them with respect to predefined types. Although important in practice, this task is so far underexplored, partly due to the lack of labelled data. To overcome this, we explore several auxiliary tasks, including semantic super-sense tagging and identification of multi-word expressions, and cast the task as a multi-task learning problem with deep recurrent neural networks. Our multi-task models perform significantly better than previous state of the art approaches on two scientific KBC datasets, particularly for long keyphrases.
Failures of Gradient-Based Deep Learning
Shalev-Shwartz, Shai, Shamir, Ohad, Shammah, Shaked
The success stories of deep learning form an ever lengthening list of practical breakthroughs and state-of-the-art performances, ranging over the fields of computer vision [23, 14, 25, 33], audio and natural language processing and generation [5, 15, 11, 34], as well as robotics [24, 26], to name just a few. The list of success stories can be matched and surpassed by a list of practical "tips and tricks", from different optimization algorithms, parameter tuning methods [30, 22], initialization schemes [10], architecture designs [31], loss functions, data augmentation [23] and so on. The current theoretical understanding of deep learning is far from being sufficient for a rigorous analysis of the difficulties faced by practitioners. Progress must be made from both parties: from a practitioner's perspective, emphasizing the difficulties provides practical insights to the theoretician, which in turn, supplies theoretical insights and guarantees, further strengthening and sharpening practical intuitions and wisdom. In particular, understanding failures of existing algorithms is as important as understanding where they succeed. Our goal in this paper is to present and discuss families of simple problems for which commonly used methods do not show as exceptional a performance as one might expect. We use empirical results and insights as a ground on which to build a theoretical analysis, characterising the sources of failure. Those understandings are aligned with, and sometimes lead to, different approaches, either for an architecture, loss function, or an optimization scheme, and explain their superiority when applied to members of those families. Interestingly, the sources for failure in our experiment do not seem to relate to stationary point issues such as spurious local minima or a plethora of saddle points, a topic of much recent interest (e.g.