Oceania
Generalisation in fully-connected neural networks for time series forecasting
Borovykh, Anastasia, Oosterlee, Cornelis W., Bohte, Sander M.
In this paper we study the generalisation capabilities of fully-connected neural networks trained in the context of time series forecasting. Time series do not satisfy the typical assumption in statistical learning theory of the data being i.i.d. samples from some data-generating distribution. We use the input and weight Hessians, that is the smoothness of the learned function with respect to the input and the width of the minimum in weight space, to quantify a network's ability to generalise to unseen data. While such generalisation metrics have been studied extensively in the i.i.d. setting of for example image recognition, here we empirically validate their use in the task of time series forecasting. Furthermore we discuss how one can control the generalisation capability of the network by means of the training process using the learning rate, batch size and the number of training iterations as controls. Using these hyperparameters one can efficiently control the complexity of the output function without imposing explicit constraints.
Do ImageNet Classifiers Generalize to ImageNet?
Recht, Benjamin, Roelofs, Rebecca, Schmidt, Ludwig, Shankar, Vaishaal
We build new test sets for the CIFAR-10 and ImageNet datasets. Both benchmarks have been the focus of intense research for almost a decade, raising the danger of overfitting to excessively re-used test sets. By closely following the original dataset creation processes, we test to what extent current classification models generalize to new data. We evaluate a broad range of models and find accuracy drops of 3% - 15% on CIFAR-10 and 11% - 14% on ImageNet. However, accuracy gains on the original test sets translate to larger gains on the new test sets. Our results suggest that the accuracy drops are not caused by adaptivity, but by the models' inability to generalize to slightly "harder" images than those found in the original test sets.
Off-Policy Actor-Critic in an Ensemble: Achieving Maximum General Entropy and Effective Environment Exploration in Deep Reinforcement Learning
We propose a new policy iteration theory as an important extension of soft policy iteration and Soft Actor-Critic (SAC), one of the most efficient model free algorithms for deep reinforcement learning. Supported by the new theory, arbitrary entropy measures that generalize Shannon entropy, such as Tsallis entropy and Renyi entropy, can be utilized to properly randomize action selection while fulfilling the goal of maximizing expected long-term rewards. Our theory gives birth to two new algorithms, i.e., Tsallis entropy Actor-Critic (TAC) and Renyi entropy Actor-Critic (RAC). Theoretical analysis shows that these algorithms can be more effective than SAC. Moreover, they pave the way for us to develop a new Ensemble Actor-Critic (EAC) algorithm in this paper that features the use of a bootstrap mechanism for deep environment exploration as well as a new value-function based mechanism for high-level action selection. Empirically we show that TAC, RAC and EAC can achieve state-of-the-art performance on a range of benchmark control tasks, outperforming SAC and several cutting-edge learning algorithms in terms of both sample efficiency and effectiveness.
On Many-to-Many Mapping Between Concordance Correlation Coefficient and Mean Square Error
Pandit, Vedhas, Schuller, Björn
The concordance correlation coefficient (CCC) is one of the most widely used reproducibility indices, introduced by Lin in 1989. In addition to its extensive use in assay validation, CCC serves various different purposes in other multivariate population-related tasks. For example, it is often used as a metric to quantify an inter-rater agreement. It is also often used as a performance metric for prediction problems. In terms of the cost function, however, there has been hardly any attempt to design one to train the predictive deep learning models. In this paper, we present a family of lightweight cost functions that aim to also maximise CCC, when minimising the prediction errors. To this end, we first reformulate CCC in terms of the errors in the prediction; and then as a logical next step, in terms of the sequence of the fixed set of errors. To elucidate our motivation and the results we obtain through these error rearrangements, the data we use is the set of gold standard annotations from a well-known database called `Automatic Sentiment Analysis in the Wild' (SEWA), popular thanks to its use in the latest Audio/Visual Emotion Challenges (\textsc{AVEC'17} and \textsc{AVEC'18}). We also present some new and interesting mathematical paradoxes we have discovered through this CCC reformulation endeavour.
Distributed Online Linear Regression
Yuan, Deming, Proutiere, Alexandre, Shi, Guodong
We study online linear regression problems in a distributed setting, where the data is spread over a network. In each round, each network node proposes a linear predictor, with the objective of fitting the \emph{network-wide} data. It then updates its predictor for the next round according to the received local feedback and information received from neighboring nodes. The predictions made at a given node are assessed through the notion of regret, defined as the difference between their cumulative network-wide square errors and those of the best off-line network-wide linear predictor. Various scenarios are investigated, depending on the nature of the local feedback (full information or bandit feedback), on the set of available predictors (the decision set), and the way data is generated (by an oblivious or adaptive adversary). We propose simple and natural distributed regression algorithms, involving, at each node and in each round, a local gradient descent step and a communication and averaging step where nodes aim at aligning their predictors to those of their neighbors. We establish regret upper bounds typically in ${\cal O}(T^{3/4})$ when the decision set is unbounded and in ${\cal O}(\sqrt{T})$ in case of bounded decision set.
Learning and Generalization for Matching Problems
Cohen, Alon, Hassidim, Avinatan, Kaplan, Haim, Mansour, Yishay, Moran, Shay
We study a classic algorithmic problem through the lens of statistical learning. That is, we consider a matching problem where the input graph is sampled from some distribution. This distribution is unknown to the algorithm; however, an additional graph which is sampled from the same distribution is given during a training phase (preprocessing). More specifically, the algorithmic problem is to match $k$ out of $n$ items that arrive online to $d$ categories ($d\ll k \ll n$). Our goal is to design a two-stage online algorithm that retains a small subset of items in the first stage which contains an offline matching of maximum weight. We then compute this optimal matching in a second stage. The added statistical component is that before the online matching process begins, our algorithms learn from a training set consisting of another matching instance drawn from the same unknown distribution. Using this training set, we learn a policy that we apply during the online matching process. We consider a class of online policies that we term \emph{thresholds policies}. For this class, we derive uniform convergence results both for the number of retained items and the value of the optimal matching. We show that the number of retained items and the value of the offline optimal matching deviate from their expectation by $O(\sqrt{k})$. This requires usage of less-standard concentration inequalities (standard ones give deviations of $O(\sqrt{n})$). Furthermore, we design an algorithm that outputs the optimal offline solution with high probability while retaining only $O(k\log \log n)$ items in expectation.
Sonar drone discovers long-lost WWII aircraft carrier USS Hornet
The late Paul Allen's research vessel, the Petrel, has found another historic warship at the bottom of the ocean. In the wake of an initial discovery in late January, the expedition crew has confirmed that it found the USS Hornet, an aircraft carrier that played a pivotal role in WWII through moments like the Doolittle Raid on Japan and the pivotal Battle of Midway. It was considered lost when it sank at the Battle of Santa Cruz in October 1943, but modern technology spotted it nearly 17,500 feet below the surface of the South Pacific Ocean, near the Solomon Islands. The team initially narrowed down its search area by using data from the era, such as action reports and deck logs from other ships involved in the Santa Cruz fight. From there, tech took over.
Stable multi-instance learning visa causal inference
Multi-instance learning (MIL) deals with tasks where each example is represented by a bag of instances. Unlike traditional supervised learning, only the bag labels are observed whereas the label for each instance in the bags is not available. Previous MIL studies typically assume that training and the test data follow the same distribution, which is often violated in real-world applications. Existing methods address distribution changes by reweighting the training bags with the density ratio between the test and the training data. However, models are frequently trained without prior knowledge of the testing distribution which renders existing methods ineffective. In this paper, we propose a novel multi-instance learning algorithm which links MIL with causal inference to achieve stable prediction without knowing the distribution of the test dataset. Experimental results show that the performance of our approach is stable to the distribution changes.
When machine learning, Twitter and te reo Maori merge - UoW
Researchers have whittled down a massive 8 million tweets, to a more manageable 1.2 million to look at how te reo MÄ ori is being used in the genre. The team from the University of Waikato have focused on 77 MÄ ori loanwords (te reo MÄ ori words used in an English context) and used them as training data for their machine-learning model. Machine learning allows data scientists to provide a computer with a large data set, and teach it to make predictions based on that data. Computing and Mathematical Sciences student David Trye spent the summer working on the project, with supervisorsDr Andreea Calude and Dr Felipe Bravo Márquez. The initial 8-million tweets contained a fair bit of distracting data'noise'.
A Machine Learning based Robust Prediction Model for Real-life Mobile Phone Data
Real-life mobile phone data may contain noisy instances, which is a fundamental issue for building a prediction model with many potential negative consequences. The complexity of the inferred model may increase, may arise overfitting problem, and thereby the overall prediction accuracy of the model may decrease. In this paper, we address these issues and present a robust prediction model for real-life mobile phone data of individual users, in order to improve the prediction accuracy of the model. In our robust model, we first effectively identify and eliminate the noisy instances from the training dataset by determining a dynamic noise threshold using naive Bayes classifier and laplace estimator, which may differ from user-to-user according to their unique behavioral patterns. After that, we employ the most popular rule-based machine learning classification technique, i.e., decision tree, on the noise-free quality dataset to build the prediction model. Experimental results on the real-life mobile phone datasets (e.g., phone call log) of individual mobile phone users, show the effectiveness of our robust model in terms of precision, recall and f-measure.