Oceania
Learning Representations of Ultrahigh-dimensional Data for Random Distance-based Outlier Detection
Pang, Guansong, Cao, Longbing, Chen, Ling, Liu, Huan
Learning expressive low-dimensional representations of ultrahigh-dimensional data, e.g., data with thousands or millions of features, has been a major way to enable learning methods to address the curse of dimensionality. However, existing unsupervised representation learning methods mainly focus on preserving the data regularity information and learn the representations independently of subsequent outlier detection methods, which can result in suboptimal and unstable performance in detecting irregularities (i.e., outliers). This paper introduces a ranking model-based framework, called RAMODO, to address this issue. RAMODO unifies representation learning and outlier detection to learn low-dimensional representations that are tailored for a state-of-the-art outlier detection approach: the random distance-based approach. This customized learning yields better-suited and more stable representations for the targeted outlier detectors. Additionally, RAMODO can leverage a small amount of labeled data as prior knowledge to learn more expressive and application-relevant representations. We instantiate RAMODO as an efficient method called REPEN to demonstrate the performance of RAMODO. Extensive empirical results on eight real-world ultrahigh-dimensional data sets show that REPEN (i) enables a random distance-based detector to obtain significantly better AUC performance and a speedup of two orders of magnitude; (ii) performs substantially better and more stably than four state-of-the-art representation learning methods; and (iii) leverages less than 1% labeled data to achieve up to 32% AUC improvement.
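To make "random distance-based" concrete, here is a minimal sketch of a generic detector of that family: each point's outlier score is its distance to the nearest member of a small random subsample, averaged over a few random subsamples. It is not REPEN itself (REPEN learns the representation such a detector runs on), and the function name, subsample size and ensemble count are illustrative assumptions.

```python
import numpy as np

def random_subsample_nn_score(X, subsample_size=8, n_ensembles=5, seed=0):
    """Hypothetical random distance-based outlier scorer (larger score = more outlying).

    For each ensemble member, draw a random subsample of the data and score every
    point by its Euclidean distance to the nearest subsample member.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    scores = np.zeros(n)
    for _ in range(n_ensembles):
        idx = rng.choice(n, size=subsample_size, replace=False)
        # Pairwise distances between all points (rows) and subsample members (columns).
        d = np.linalg.norm(X[:, None, :] - X[idx][None, :, :], axis=2)
        # A subsampled point would otherwise match itself at distance 0.
        d[idx, np.arange(subsample_size)] = np.inf
        scores += d.min(axis=1)
    return scores / n_ensembles
```

Because only a handful of reference points are compared per ensemble member, the cost is linear in the number of points, which is why such detectors pair naturally with a learned low-dimensional representation.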
kjaisingh/high-school-guide-to-machine-learning
Being a high schooler myself, and having studied Machine Learning and Artificial Intelligence for a year now, I believe there is no established learning path in this field for high school students. This is my attempt to create one. Over the past few months, I've tried to spend a couple of hours every day understanding this field, whether by watching YouTube videos or undertaking projects. I've been guided by older peers with far more experience than me, and I now feel that I have enough experience to share my insights. All the information compiled in this guide is intended for high schoolers wishing to excel in this up-and-coming field.
MITx MicroMasters Program in Statistics and Data Science opens enrollment
The new MITx MicroMasters Program in Statistics and Data Science, which opened for enrollment today, will help online learners develop their skills in the booming field of data science. The program offers learners an MIT-quality, professional credential, while also providing an academic pathway to pursue a PhD at MIT or a master's degree elsewhere. "There are many online programs that provide a professional overview of data science, but they don't offer the level of detail learners gain from an actual, residential master's program," says Professor Devavrat Shah, faculty director of the program and MIT professor in the Department of Electrical Engineering and Computer Science (EECS). "This new MicroMasters program in Statistics and Data Science is bringing the quality, rigor, and structure of a master's-level, residential program in data science at MIT to a wider audience around the world, and at a very accessible price, so people can learn anywhere they are while keeping their day jobs." In all, seven universities will be accepting the new MicroMasters Statistics and Data Science (SDS) credential towards a master's degree, including the Rochester Institute of Technology (United States), Doane University (United States), Galileo University (Guatemala), Reykjavik University (Iceland), Curtin University (Australia), Deakin University (Australia), and RMIT University (Australia).
Google Prepares Google Home Multi-Language Support For Multiple Markets
Google is planning to bring support for more languages to its Google Assistant AI so that its Google Home speaker can appeal to more consumers. Digitimes Research reported late last week that Google plans to push sales of its Google Home speakers into more markets in 2018. With this in mind, the search engine giant aims to bring support for up to 30 languages to its voice assistant, drawing on its extensive research in multilingual processing and semantics. Amazon's Alexa currently supports only three languages, a gap Google wants to take advantage of.
'Artificial intelligence, machine learning can help improve crop yields'
He said the company had made big strides in the country in terms of enterprises adopting its technologies such as cloud services, security, artificial intelligence and machine learning. How are Indian enterprises adopting your technologies, especially cloud and artificial intelligence? How large is the opportunity? Globally... only about 5%-10% of all workloads in IT run on the cloud. I think the estimates are quite conservative.
Auto-Meta: Automated Gradient Based Meta Learner Search
Kim, Jaehong, Choi, Youngduck, Cha, Moonsu, Lee, Jung Kwon, Lee, Sangyeul, Kim, Sungwan, Choi, Yongseok, Kim, Jiwon
Fully automating the machine learning pipeline is one of the outstanding challenges of general artificial intelligence, as practical machine learning often requires costly human-driven processes such as hyper-parameter tuning, algorithm selection, and model selection. In this work, we consider the problem of executing an automated, yet scalable, search for optimal gradient-based meta-learners in practice. As a solution, we apply progressive neural architecture search to proto-architectures by appealing to the model-agnostic nature of general gradient-based meta-learners. Given the recent universality result of Finn et al. (arXiv:1710.11622), our search is motivated a priori: neural network architecture search dynamics, automated or not, may be quite different from those of the classical setting with the same target tasks, due to the presence of the gradient update operator. A posteriori, our search algorithm, given appropriately designed search spaces, finds gradient-based meta-learners with non-intuitive proto-architectures that are narrow and deep, unlike the inception-like structures previously observed in the architectures produced by traditional NAS algorithms. Along with these notable findings, the searched gradient-based meta-learner achieves state-of-the-art results on the few-shot classification problem on Mini-ImageNet with 76.29% accuracy, a 13.18% improvement over the results reported in the original MAML paper. To the best of our knowledge, this work is the first successful AutoML implementation in the context of meta-learning.
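The "gradient update operator" that distinguishes meta-learner search from classical NAS is the inner adaptation step inside the meta-objective. The sketch below shows it for MAML on a hypothetical toy family of 1-D regression tasks y = a·x with a scalar model w; it illustrates only the meta-learner being searched over, not Auto-Meta's progressive architecture search, and the task distribution, step sizes and function names are assumptions. Because the model is linear, the second-order term of the meta-gradient can be written in closed form.

```python
import numpy as np

def task_data(rng, a, k=10):
    """Sample k points from the toy task y = a * x (hypothetical task family)."""
    x = rng.uniform(-1.0, 1.0, size=k)
    return x, a * x

def loss_and_grad(w, x, y):
    """Mean squared error of the scalar model y_hat = w * x, and its derivative in w."""
    r = w * x - y
    return np.mean(r ** 2), 2.0 * np.mean(x * r)

def maml_step(w, tasks, alpha, rng):
    """Exact MAML meta-gradient and meta-loss for the scalar model, averaged over tasks."""
    meta_grad, meta_loss = 0.0, 0.0
    for a in tasks:
        xs, ys = task_data(rng, a)  # support set
        xq, yq = task_data(rng, a)  # query set
        _, g_s = loss_and_grad(w, xs, ys)
        w_adapted = w - alpha * g_s  # inner gradient update operator
        # Chain rule through the update: d w_adapted / d w = 1 - 2 * alpha * mean(xs^2).
        dwa_dw = 1.0 - 2.0 * alpha * np.mean(xs ** 2)
        l_q, g_q = loss_and_grad(w_adapted, xq, yq)
        meta_grad += g_q * dwa_dw
        meta_loss += l_q
    return meta_grad / len(tasks), meta_loss / len(tasks)

def meta_train(steps=300, meta_batch=8, alpha=0.1, beta=0.5, seed=0):
    """Outer loop: gradient descent on the post-adaptation (query) loss."""
    rng = np.random.default_rng(seed)
    w = 0.0
    for _ in range(steps):
        tasks = rng.uniform(1.0, 3.0, size=meta_batch)  # slopes a ~ U(1, 3)
        g, _ = maml_step(w, tasks, alpha, rng)
        w -= beta * g
    return w
```

For this task family the meta-learned initialization settles near the mean slope (about 2), the point from which one inner step adapts best on average.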
A Note about: Local Explanation Methods for Deep Neural Networks lack Sensitivity to Parameter Values
Sundararajan, Mukund, Taly, Ankur
Local explanation methods, also known as attribution methods, attribute a deep network's prediction to its input (cf. Baehrens et al. (2010)). We respond to the claim from Adebayo et al. (2018) that local explanation methods lack sensitivity, i.e., DNNs with randomly-initialized weights produce explanations that are both visually and quantitatively similar to those produced by DNNs with learned weights. Further investigation reveals that their findings are due to two choices in their analysis: (a) ignoring the signs of the attributions; and (b) for integrated gradients (IG), including pixels in their analysis that have zero attributions by choice of the baseline (an auxiliary input relative to which the attributions are computed). When both factors are accounted for, IG attributions for a random network and the actual network are uncorrelated. Our investigation also sheds light on how these issues affect visualizations, although we note that more work is needed to understand how viewers interpret the difference between the random and the actual attributions.
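Both factors are easy to see on a toy model. In the sketch below (a hypothetical f(x) = tanh(w·x), not one of the networks analysed in either paper), integrated gradients is approximated with a midpoint Riemann sum along the straight path from the baseline to the input. Any feature whose value equals the baseline receives exactly zero attribution because of the (x − baseline) factor, and attributions carry signs that are lost if only magnitudes are compared.

```python
import numpy as np

def integrated_gradients(grad_fn, x, baseline, steps=200):
    """Midpoint Riemann approximation of integrated gradients along baseline -> x."""
    alphas = (np.arange(steps) + 0.5) / steps
    total = np.zeros_like(x, dtype=float)
    for a in alphas:
        total += grad_fn(baseline + a * (x - baseline))
    # The (x - baseline) factor zeroes out any feature equal to its baseline value.
    return (x - baseline) * total / steps

# Hypothetical toy model f(x) = tanh(w . x) with its analytic gradient.
w = np.array([1.0, -2.0, 0.5])
f = lambda x: np.tanh(w @ x)
grad_f = lambda x: (1.0 - np.tanh(w @ x) ** 2) * w

x = np.array([1.0, 1.0, 0.0])
baseline = np.zeros(3)
attr = integrated_gradients(grad_f, x, baseline)
```

Here the third feature equals the baseline and gets zero attribution regardless of its weight, the second feature's attribution is negative, and the attributions sum to f(x) − f(baseline) (the completeness property), up to the Riemann-sum error.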
Diverse Online Feature Selection
Siu, Chapman, Da Xu, Richard Yi
Online feature selection has been an active research area in recent years. We propose a novel diverse online feature selection method based on Determinantal Point Processes (DPP). Our model aims to provide diverse features and can be incorporated into either a supervised or an unsupervised framework. The framework promotes diversity based on a kernel defined at the feature level, through at most three stages: feature sampling, local criteria, and global criteria for feature selection. In the feature sampling stage, we sample the incoming stream of features using a conditional DPP. The local criterion assesses and selects streamed features only when they arrive: we use unsupervised, scale-invariant methods to remove redundant features and, optionally, supervised methods to introduce label information for assessing feature relevance. Lastly, the global criterion uses regularization methods to select a globally optimal subset of features. This three-stage procedure continues until no more features arrive or a predefined stopping condition is met. Experiments show that this approach yields better compactness and is comparable to, and in some instances outperforms, other state-of-the-art online feature selection methods.
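The diversity-promoting role of a DPP over features can be illustrated with a small sketch. It swaps in greedy MAP inference (repeatedly adding the feature that most increases log det of the kernel submatrix) in place of the paper's conditional DPP sampling, and it assumes a cosine-similarity kernel between centred feature columns; both choices and the function names are hypothetical.

```python
import numpy as np

def feature_kernel(X, jitter=1e-6):
    """Assumed kernel: cosine similarity between centred feature columns of X (n_samples x n_features)."""
    Z = X - X.mean(axis=0)
    Z = Z / (np.linalg.norm(Z, axis=0) + 1e-12)
    return Z.T @ Z + jitter * np.eye(X.shape[1])

def greedy_dpp_select(L, k):
    """Greedy MAP approximation for a DPP: add the item that most increases log det(L_S)."""
    selected = []
    remaining = list(range(L.shape[0]))
    for _ in range(k):
        best, best_val = None, -np.inf
        for j in remaining:
            S = selected + [j]
            sign, logdet = np.linalg.slogdet(L[np.ix_(S, S)])
            val = logdet if sign > 0 else -np.inf
            if val > best_val:
                best, best_val = j, val
        selected.append(best)
        remaining.remove(best)
    return selected
```

Redundant (highly correlated) features make the kernel submatrix nearly singular, so its determinant, and hence the chance of both being chosen, collapses; this is the same redundancy-removal intuition as the local criterion above.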
Fast Approximate Natural Gradient Descent in a Kronecker-factored Eigenbasis
George, Thomas, Laurent, César, Bouthillier, Xavier, Ballas, Nicolas, Vincent, Pascal
For models with many parameters, the gradient covariance matrix on which natural-gradient methods are based becomes gigantic, making these methods inapplicable in their original form. This has motivated research into both simple diagonal approximations and more sophisticated factored approximations such as KFAC (Heskes, 2000; Martens & Grosse, 2015; Grosse & Martens, 2016). In the present work, we draw inspiration from both to propose a novel approximation that is provably better than KFAC and amenable to cheap partial updates. It consists of tracking a diagonal variance not in parameter coordinates but in a Kronecker-factored eigenbasis, in which the diagonal approximation is likely to be more effective. Experiments show improvements over KFAC in optimization speed for several deep network architectures.
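For a single fully connected layer, the idea can be sketched with dense NumPy linear algebra: eigendecompose KFAC's two factors (input second moments A and output-gradient second moments G), then, instead of using the products of their eigenvalues as KFAC does, re-estimate one variance per coordinate by projecting per-example weight gradients into the Kronecker-factored eigenbasis. This is an illustrative sketch under stated assumptions (one layer, no bias, exact eigendecompositions recomputed each call, no amortised or partial updates, a hypothetical damping value), not the authors' implementation.

```python
import numpy as np

def kfac_and_ekfac(acts, grads):
    """acts: (n, d_in) layer inputs; grads: (n, d_out) back-propagated output gradients.

    The per-example weight gradient is outer(g_i, a_i), of shape (d_out, d_in).
    Returns the two eigenbases plus KFAC and EKFAC diagonals in that basis.
    """
    n = acts.shape[0]
    A = acts.T @ acts / n   # KFAC input factor
    G = grads.T @ grads / n  # KFAC output-gradient factor
    s_A, U_A = np.linalg.eigh(A)
    s_G, U_G = np.linalg.eigh(G)
    # KFAC's implied diagonal in the eigenbasis: products of factor eigenvalues.
    kfac_diag = np.outer(s_G, s_A)
    # EKFAC: second moment of each per-example gradient coordinate in the eigenbasis,
    # i.e. mean over examples of (U_G^T g_i a_i^T U_A)^2.
    proj = np.einsum('ko,nk,nl,li->noi', U_G, grads, acts, U_A)
    ekfac_diag = np.mean(proj ** 2, axis=0)
    return U_G, U_A, kfac_diag, ekfac_diag

def precondition(grad_W, U_G, U_A, diag, damping=1e-3):
    """Apply (Q diag(D) Q^T + damping * I)^{-1}, with Q = kron(U_G, U_A), to a (d_out, d_in) gradient."""
    g_rot = U_G.T @ grad_W @ U_A  # rotate into the eigenbasis
    return U_G @ (g_rot / (diag + damping)) @ U_A.T  # rescale, rotate back
```

Because the EKFAC diagonal is exactly the diagonal of the empirical Fisher expressed in the fixed eigenbasis, it is the best diagonal approximation in that basis (in Frobenius norm), whereas KFAC's eigenvalue products are just one particular diagonal; this is the sense in which the approximation cannot do worse than KFAC.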