Oceania
Escaping Saddle Points with Adaptive Gradient Methods
Staib, Matthew, Reddi, Sashank J., Kale, Satyen, Kumar, Sanjiv, Sra, Suvrit
Adaptive methods such as Adam and RMSProp are widely used in deep learning but are not well understood. In this paper, we seek a crisp, clean and precise characterization of their behavior in nonconvex settings. To this end, we first provide a novel view of adaptive methods as preconditioned SGD, where the preconditioner is estimated in an online manner. By studying the preconditioner on its own, we elucidate its purpose: it rescales the stochastic gradient noise to be isotropic near stationary points, which helps escape saddle points. Furthermore, we show that adaptive methods can efficiently estimate the aforementioned preconditioner. By gluing together these two components, we provide the first (to our knowledge) second-order convergence result for any adaptive method. The key insight from our analysis is that, compared to SGD, adaptive methods escape saddle points faster, and can converge faster overall to second-order stationary points.
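To make the "preconditioned SGD" view in the abstract above concrete, here is a minimal sketch of the standard diagonal RMSProp update, written so that the preconditioner appears explicitly. This is an illustration under assumptions, not the algorithm or analysis from the paper: the function names `rmsprop_step` and `sgd_step` and the default hyperparameters are hypothetical choices made for this example.

```python
import math


def rmsprop_step(x, grad, v, lr=0.01, beta=0.9, eps=1e-8):
    """One standard RMSProp update, written as preconditioned SGD.

    v is an online (running) estimate of the per-coordinate second moment
    of the gradient. The diagonal preconditioner is 1 / (sqrt(v) + eps).
    Names and defaults are illustrative assumptions.
    """
    # Online estimate of the second moment (exponential moving average).
    new_v = [beta * vi + (1.0 - beta) * gi * gi for vi, gi in zip(v, grad)]
    # Diagonal preconditioner built from that estimate.
    precond = [1.0 / (math.sqrt(vi) + eps) for vi in new_v]
    # Plain SGD step applied to the preconditioned gradient.
    new_x = [xi - lr * pi * gi for xi, pi, gi in zip(x, precond, grad)]
    return new_x, new_v


def sgd_step(x, grad, lr=0.01):
    """Plain SGD step, for comparison: no rescaling of the gradient."""
    return [xi - lr * gi for xi, gi in zip(x, grad)]
```

With a gradient whose coordinates differ by several orders of magnitude, `sgd_step` moves almost entirely along the large coordinate, while `rmsprop_step` takes steps of nearly equal size in every coordinate. That is the rescaling effect the abstract attributes to the preconditioner.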
Weakly Supervised Active Learning with Cluster Annotation
Perez, Fábio, Lebret, Rémi, Aberer, Karl
In this work, we introduce a novel framework that employs cluster annotation to boost active learning by reducing the number of human interactions required to train deep neural networks. Instead of annotating single samples individually, humans can also label clusters, producing a higher number of annotated samples at the cost of a small label error. Our experiments show that the proposed framework requires 82% and 87% fewer human interactions for the CIFAR-10 and EuroSAT datasets, respectively, compared with fully supervised training, while maintaining similar performance on the test set.
A Kalman filtering induced heuristic optimization based partitional data clustering
Pakrashi, Arjun, Chaudhuri, Bidyut B.
Clustering algorithms have regained momentum with the recent popularity of data mining and knowledge discovery approaches. To obtain good clustering in a reasonable amount of time, various meta-heuristic approaches and their hybridizations, sometimes with the K-Means technique, have been employed. A Kalman filtering based heuristic approach called the Heuristic Kalman Algorithm (HKA) was proposed a few years ago, which may be used for optimizing an objective function in data/feature space. In this paper, HKA is first employed in partitional data clustering. Then an improved approach named HKA-K is proposed, which combines the benefits of the global exploration of HKA and the fast convergence of the K-Means method. Implemented and tested on several datasets from the UCI machine learning repository, the results obtained by HKA-K were compared with other hybrid meta-heuristic clustering approaches. It is shown that HKA-K is at least as good as, and often better than, the other compared algorithms. Keywords: Clustering, K-Means, Optimization, Metaheuristic Optimization, Heuristics 1. Introduction Clustering is the process of assigning a set of n data points into C classes based on the similarity between the data points in the feature space. It is useful when prototype data from known classes are not available for training a supervised classifier, or for an exploratory data analysis task. It is one of the earliest pattern classification approaches and has found renewed interest since the beginning of data mining and big data analytics.
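The K-Means component mentioned in the abstract above can be sketched as follows. This is only the standard Lloyd-style K-Means refinement, assuming a sum-of-squared-distances objective; it does not implement HKA or the HKA-K hybrid, whose details are not given in this excerpt. All function names are hypothetical.

```python
def squared_distance(p, c):
    """Squared Euclidean distance between a point and a center."""
    return sum((a - b) ** 2 for a, b in zip(p, c))


def assign_labels(points, centers):
    """Assign each point to the index of its nearest center."""
    return [
        min(range(len(centers)), key=lambda k: squared_distance(p, centers[k]))
        for p in points
    ]


def update_centers(points, labels, centers):
    """Move each center to the mean of its members (empty clusters stay put)."""
    new_centers = []
    for k, old in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if members:
            new_centers.append([sum(col) / len(members) for col in zip(*members)])
        else:
            new_centers.append(list(old))
    return new_centers


def sum_squared_error(points, labels, centers):
    """Assumed clustering objective: total squared distance to assigned centers."""
    return sum(squared_distance(p, centers[lab]) for p, lab in zip(points, labels))


def kmeans(points, centers, max_iters=100):
    """Alternate center updates and reassignments until the labels stop changing."""
    labels = assign_labels(points, centers)
    for _ in range(max_iters):
        centers = update_centers(points, labels, centers)
        new_labels = assign_labels(points, centers)
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centers
```

K-Means converges quickly but only to a local optimum that depends on the initial `centers`, which is why the abstract pairs it with a global search heuristic.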
Deep Learning on Small Datasets without Pre-Training using Cosine Loss
Two things seem to be indisputable in the contemporary deep learning discourse: 1. The categorical cross-entropy loss after softmax activation is the method of choice for classification. 2. Training a CNN classifier from scratch on small datasets does not work well. In contrast to this, we show that the cosine loss function provides significantly better performance than cross-entropy on datasets with only a handful of samples per class. For example, the accuracy achieved on the CUB-200-2011 dataset without pre-training is 30% higher than with the cross-entropy loss. Further experiments on four other popular datasets confirm our findings. Moreover, we show that the classification performance can be improved further by integrating prior knowledge in the form of class hierarchies, which is straightforward with the cosine loss.
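As a rough illustration of the cosine loss named in the abstract above, the sketch below assumes the loss is one minus the cosine similarity between the network's output vector and a one-hot encoding of the true class. The abstract does not spell out the formula, so this form, the function name `cosine_loss`, and the handling of a zero-norm output are assumptions.

```python
import math


def cosine_loss(output, target_class):
    """Assumed cosine loss: 1 - cos(output, one_hot(target_class)).

    Against a one-hot target, the cosine similarity reduces to the target
    component of the output divided by the output's L2 norm.
    """
    norm = math.sqrt(sum(o * o for o in output))
    if norm == 0.0:
        # Assumption: treat an all-zero output as orthogonal to every target.
        return 1.0
    return 1.0 - output[target_class] / norm
```

Unlike cross-entropy, this loss depends only on the direction of the output vector, not its magnitude, and it is bounded between 0 and 2. Replacing the one-hot target with another unit-length class embedding is one way the class-hierarchy prior mentioned in the abstract could enter, though the abstract does not say how this is done.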
Faster Boosting with Smaller Memory
Alafate, Julaiti, Freund, Yoav
The two state-of-the-art implementations of boosted trees, XGBoost and LightGBM, can process large training sets extremely fast. However, this performance requires that memory be large enough to hold two to three times the size of the training set. This paper presents an alternative approach to implementing boosted trees, which achieves a significant speedup over XGBoost and LightGBM, especially when memory size is small. This is achieved using a combination of two techniques: early stopping and stratified sampling, which are explained and analyzed in the paper. We describe our implementation and present experimental results to support our claims.
State-Regularized Recurrent Neural Networks
Recurrent neural networks are a widely used class of neural architectures. They have, however, two shortcomings. First, it is difficult to understand what exactly they learn. Second, they tend to work poorly on sequences requiring long-term memorization, despite having this capacity in principle. We aim to address both shortcomings with a class of recurrent networks that use a stochastic state transition mechanism between cell applications. This mechanism, which we term state-regularization, makes RNNs transition between a finite set of learnable states. We evaluate state-regularized RNNs on (1) regular languages for the purpose of automata extraction; (2) nonregular languages such as balanced parentheses, palindromes, and the copy task where external memory is required; and (3) real-world sequence learning tasks for sentiment analysis, visual object recognition, and language modeling. We show that state-regularization (a) simplifies the extraction of finite state automata modeling an RNN's state transition dynamics; (b) forces RNNs to operate more like automata with external memory and less like finite state machines; (c) makes RNNs more interpretable and explainable.
Quantitative Research, Machine Learning Data Scientist
Susquehanna International Group is a global quantitative trading firm founded with an entrepreneurial mindset and a rigorous analytical approach to decision making. We are currently seeking an extraordinary Machine Learning Data Scientist to join our Statistical Arbitrage team in Sydney. Contribute techniques from machine learning towards improved automated trading systems. You will work with traders, quants and software developers in a highly focused team, building, deploying and improving automated trading systems. This team researches, implements, tests, deploys and monitors proprietary systems trading across multiple markets and financial products.
How Pharmas Are Missing Out on AI for Sales
Artificial intelligence within CRM still has to make inroads with the pharmaceutical industry. That's a surprising conclusion given how much this sector is embracing AI for other functions. Yet some market analysts are calling this a "white space" for the industry. Pharma companies are already incorporating AI in their drug delivery, digital diagnostics and precision medicine programs -- areas where its usage has become highly sophisticated. The industry clearly recognizes that AI can be a massive value add.
Is AI the Next Frontier for National Competitive Advantage?
Artificial intelligence (AI) presents limitless opportunity, but not without potential pitfalls and risks. This paradox has become increasingly evident for government leaders. They want to give domestic companies an edge over the competition, but are also expected to protect their citizens and use AI for social good. They want to support innovation, while still maintaining some level of control over how new technologies impact society at large. With a huge payoff on the line -- by our own estimates, AI has the potential to increase worldwide GDP by 14 percent by 2030, an infusion of US$15.7 trillion into the global economy -- it should come as no surprise that governments are eager to claim their share.
To deploy AI tech, healthcare needs to first be data literate
Artificial intelligence (AI) has disrupted numerous industries in recent years, but for the technology to work effectively, it needs to be used correctly. For the healthcare sector, the end goals include providing better patient outcomes, minimising human errors and alleviating some of the physical and mental burnout felt by healthcare practitioners as a result of the volume of admin work required. A study in the US found that for every hour that physicians spend providing direct clinical facetime to patients, almost two additional hours are spent on desk work. By utilising AI and analytics, this can be reduced, and by extension, so too will the rates of mental illness. For this to happen, however, the industry must first get ready for the AI era by building up skills in reading, working with, analysing and arguing with data – also known as data literacy. Data is the lifeblood of AI, which is what makes AI and analytics the ideal combination.