If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."
However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …
Thousands of engineers write the code to create our apps, which serve billions of people worldwide. This is no trivial task--our services have grown so diverse and complex that the codebase contains millions of lines of code that intersect with a wide variety of different systems, from messaging to image rendering. To simplify and speed the process of writing code that will make an impact on so many systems, engineers often want a way to find how someone else has handled a similar task. We created Aroma, a code-to-code search and recommendation tool that uses machine learning (ML) to make the process of gaining insights from big codebases much easier. Prior to Aroma, none of the existing tools fully addressed this problem.
I have written this post for the developers and assumes no background in statistics or mathematics. The focus is mainly on how the k-NN algorithm works and how to use it for predictive modeling problems. Classification of objects is an important area of research and application in a variety of fields. In the presence of full knowledge of the underlying probabilities, Bayes decision theory gives optimal error rates. In those cases where this information is not present, many algorithms make use of distance or similarity among samples as a means of classification.
As a human you instinctively know that a leopard is closer to a cat than a motorbike, but the way we train most AI makes them oblivious to these kinds of relations. Building the concept of similarity into our algorithms could make them far more capable, writes the author of a new paper in Science Robotics. Convolutional neural networks have revolutionized the field of computer vision to the point that machines are now outperforming humans on some of the most challenging visual tasks. But the way we train them to analyze images is very different from the way humans learn, says Atsuto Maki, an associate professor at KTH Royal Institute of Technology. "Imagine that you are two years old and being quizzed on what you see in a photo of a leopard," he writes.
There has been a lot of hype around AI to the point where some people are simply tuning it out. I think this is a mistake. While there are limits to what AI can do, there also are sophisticated attacks that we'd miss without it. The need for AI is driven by three fundamental yet significant changes in the enterprise computing environment. Taking all of these factors together leads me to believe that AI is not only a viable solution, but it may be the only solution.
Explanation methods aim to make neural networks more trustworthy and interpretable. In this paper, we demonstrate a property of explanation methods which is disconcerting for both of these purposes. Namely, we show that explanations can be manipulated arbitrarily by applying visually hardly perceptible perturbations to the input that keep the network's output approximately constant. We establish theoretically that this phenomenon can be related to certain geometrical properties of neural networks. This allows us to derive an upper bound on the susceptibility of explanations to manipulations. Based on this result, we propose effective mechanisms to enhance the robustness of explanations.
Humans and animals show remarkable flexibility in adjusting their behaviour when their goals, or rewards in the environment change. While such flexibility is a hallmark of intelligent behaviour, these multi-task scenarios remain an important challenge for machine learning algorithms and neurobiological models alike. Factored representations can enable flexible behaviour by abstracting away general aspects of a task from those prone to change, while nonparametric methods provide a principled way of using similarity to past experiences to guide current behaviour. Here we combine the successor representation (SR), that factors the value of actions into expected outcomes and corresponding rewards, with evaluating task similarity through nonparametric inference and clustering the space of rewards. The proposed algorithm improves SR's transfer capabilities by inverting a generative model over tasks, while also explaining important neurobiological signatures of place cell representation in the hippocampus. It dynamically samples from a flexible number of distinct SR maps while accumulating evidence about the current reward context, and outperforms competing algorithms in settings with both known and unsignalled rewards changes. It reproduces the "flickering" behaviour of hippocampal maps seen when rodents navigate to changing reward locations, and gives a quantitative account of trajectory-dependent hippocampal representations (so-called splitter cells) and their dynamics. We thus provide a novel algorithmic approach for multi-task learning, as well as a common normative framework that links together these different characteristics of the brain's spatial representation.
This work revisits the use of information criteria to characterize the generalization of deep learning models. In particular, we empirically demonstrate the effectiveness of the Takeuchi information criterion (TIC), an extension of the Akaike information criterion (AIC) for misspecified models, in estimating the generalization gap, shedding light on why quantities such as the number of parameters cannot quantify generalization. The TIC depends on both the Hessian of the loss H and the covariance of the gradients C. By exploring the similarities and differences between these two matrices as well as the Fisher information matrix F, we explore the interplay between noise and curvature in deep models. We also address the question of whether C is a reasonable approximation to F, as is commonly assumed.
The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question about which distance measures to be used for the KNN classifier among a large number of distance and similarity measures? This review attempts to answer the previous question through evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real world datasets, with and without adding different levels of noise. The experimental results show that the performance of KNN classifier depends significantly on the distance used, the results showed large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best when applied on most datasets comparing to the other tested distances. In addition, the performance of the KNN degraded only about $20\%$ while the noise level reaches $90\%$, this is true for all the distances used. This means that the KNN classifier using any of the top $10$ distances tolerate noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise comparing to other distances.
Network Embedding (NE) methods, which map network nodes to low-dimensional feature vectors, have wide applications in network analysis and bioinformatics. Many existing NE methods rely only on network structure, overlooking other information associated with the nodes, e.g., text describing the nodes. Recent attempts to combine the two sources of information only consider local network structure. We extend NODE2VEC, a well-known NE method that considers broader network structure, to also consider textual node descriptors using recurrent neural encoders. Our method is evaluated on link prediction in two networks derived from UMLS. Experimental results demonstrate the effectiveness of the proposed approach compared to previous work.
Time series are ubiquitous in real world problems and computing distance between two time series is often required in several learning tasks. Computing similarity between time series by ignoring variations in speed or warping is often encountered and dynamic time warping (DTW) is the state of the art. However DTW is not applicable in algorithms which require kernel or vectors. In this paper, we propose a mechanism named WaRTEm to generate vector embeddings of time series such that distance measures in the embedding space exhibit resilience to warping. Therefore, WaRTEm is more widely applicable than DTW. WaRTEm is based on a twin auto-encoder architecture and a training strategy involving warping operators for generating warping resilient embeddings for time series datasets. We evaluate the performance of WaRTEm and observed more than $20\%$ improvement over DTW in multiple real-world datasets.