Oceania
Loss Surface Modality of Feed-Forward Neural Network Architectures
Bosman, Anna Sergeevna, Engelbrecht, Andries, Helbig, Mardé
It has been argued in the past that high-dimensional neural networks do not exhibit local minima capable of trapping an optimisation algorithm. However, the relationship between loss surface modality and the neural architecture parameters, such as the number of hidden neurons per layer and the number of hidden layers, remains poorly understood. This study employs fitness landscape analysis to investigate the modality of neural network loss surfaces under various feed-forward architecture settings. An increase in the problem dimensionality is shown to yield a more searchable and more exploitable loss surface. An increase in the hidden layer width is shown to effectively reduce the number of local minima, and to simplify the shape of the global attractor. An increase in the architecture depth is shown to sharpen the global attractor, thus making it more exploitable.
Subspace Detours: Building Transport Plans that are Optimal on Subspace Projections
Muzellec, Boris, Cuturi, Marco
Sliced Wasserstein (SW) metrics between probability measures solve the optimal transport (OT) problem on univariate projections, and average such maps across projections. The recent interest in the SW distance shows that much can be gained by looking at optimal maps between measures in smaller subspaces, as opposed to paying the curse-of-dimensionality price incurred in higher dimensions. However, any transport estimated in a subspace remains an object that can only be used in that subspace. In this work we propose two methods to extrapolate, from a transport map that is optimal on a subspace, one that is nearly optimal in the entire space. We prove that the best optimal transport plan that takes such "subspace detours" is a generalization of the Knothe-Rosenblatt transport. We show that these plans can be explicitly formulated when comparing Gaussian measures (between which the Wasserstein distance is usually referred to as the Bures or Fréchet distance). Building from there, we provide an algorithm to select optimal subspaces given pairs of Gaussian measures, and study scenarios in which that mediating subspace can be selected using prior information. We consider applications to NLP and to the evaluation of image quality (FID scores).
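The Gaussian case mentioned above has a well-known closed form: for N(m1, A) and N(m2, B), W2^2 = ||m1 - m2||^2 + tr(A + B - 2 (A^{1/2} B A^{1/2})^{1/2}), which is the Bures/Fréchet distance that FID is built on. A minimal sketch follows, assuming orthonormal basis vectors for the subspace; the function names are hypothetical. It only computes the distance between the projected Gaussians (the starting point the abstract describes), not the subspace-detour or Knothe-Rosenblatt construction itself.

```python
import numpy as np

def psd_sqrt(m):
    """Symmetric square root of a symmetric positive semi-definite matrix."""
    w, v = np.linalg.eigh(m)
    # V diag(sqrt(w)) V^T, clipping tiny negative eigenvalues from round-off
    return (v * np.sqrt(np.clip(w, 0.0, None))) @ v.T

def gaussian_w2(m1, cov1, m2, cov2):
    """2-Wasserstein (Bures/Fréchet) distance between N(m1, cov1) and N(m2, cov2)."""
    s1 = psd_sqrt(cov1)
    cross = s1 @ cov2 @ s1
    cross = 0.5 * (cross + cross.T)  # symmetrize against round-off
    bures_sq = np.trace(cov1 + cov2 - 2.0 * psd_sqrt(cross))
    mean_sq = np.sum((np.asarray(m1) - np.asarray(m2)) ** 2)
    return float(np.sqrt(max(mean_sq + bures_sq, 0.0)))

def projected_w2(m1, cov1, m2, cov2, basis):
    """W2 between the two Gaussians after projecting onto span(basis).

    `basis` is assumed to have orthonormal columns (e.g. from np.linalg.qr).
    """
    return gaussian_w2(basis.T @ m1, basis.T @ cov1 @ basis,
                       basis.T @ m2, basis.T @ cov2 @ basis)
```

Because an orthogonal projection is 1-Lipschitz, the projected distance never exceeds the full-space distance; this is why a subspace map alone cannot be optimal in the whole space and needs to be extended.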
Historic Global AI Agreement Achieved by OECD
Today global history was made, as the OECD adopted the first intergovernmental standard on artificial intelligence (AI), a geopolitical milestone. There is a worldwide investment rush underway in AI technology. Both public and private investment funding are pouring into AI, as nations and corporations seek to gain economic benefits and competitive advantages through automation. IDC estimates that global spending on cognitive and AI systems will reach $57.6 billion by 2021. Last year the UK government announced plans to invest £300 million in AI.
Butterfly: A Panacea for All Difficulties in Wildly Unsupervised Domain Adaptation
Liu, Feng, Lu, Jie, Han, Bo, Niu, Gang, Zhang, Guangquan, Sugiyama, Masashi
In unsupervised domain adaptation (UDA), classifiers for the target domain (TD) are trained with clean labeled data from the source domain (SD) and unlabeled data from TD. However, in the wild, it is hard to acquire a large amount of perfectly clean labeled data in SD given a limited budget. Hence, we consider a new, more realistic and more challenging problem setting, where classifiers have to be trained with noisy labeled data from SD and unlabeled data from TD; we name it wildly UDA (WUDA). We show that WUDA provably ruins all UDA methods if label noise in SD is not taken into account, and to this end we propose a Butterfly framework, a panacea for all difficulties in WUDA. Butterfly maintains four models (e.g., deep networks) simultaneously: two take care of all adaptations (i.e., noisy-to-clean, labeled-to-unlabeled, and SD-to-TD-distributional), so that the other two can focus on classification in TD. As a consequence, Butterfly possesses all the components necessary to address every challenge in WUDA. Experiments demonstrate that under WUDA, Butterfly significantly outperforms existing baseline methods.
Multi-hop Reading Comprehension via Deep Reinforcement Learning based Document Traversal
Long, Alex, Mason, Joel, Blair, Alan, Wang, Wei
Reading Comprehension has received significant attention in recent years as high-quality Question Answering (QA) datasets have become available. Despite state-of-the-art methods achieving strong overall accuracy, Multi-Hop (MH) reasoning remains particularly challenging. To address MH-QA specifically, we propose a Deep Reinforcement Learning based method capable of learning sequential reasoning across large collections of documents so as to pass a query-aware, fixed-size context subset to existing models for answer extraction. Our method consists of two stages: a linker, which decomposes the provided support documents into a graph of sentences, and an extractor, which learns where to look based on the current question and already-visited sentences. The result of the linker is a novel graph structure at the sentence level that preserves logical flow while still allowing rapid movement between documents. Importantly, we demonstrate that the sparsity of the resultant graph is invariant to context size. This translates to fewer decisions required from the Deep-RL trained extractor, allowing the system to scale effectively to large collections of documents. The importance of sequential decision making in the document traversal step is demonstrated by comparison to standard IE methods, and we additionally introduce a BM25-based IR baseline that retrieves documents relevant to the query only. We examine the integration of our method with existing models on the recently proposed QAngaroo benchmark and achieve consistent increases in accuracy across the board, as well as a 2-3x reduction in training time.
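The BM25-based IR baseline mentioned above ranks documents by their relevance to the query alone, with no sequential traversal. A minimal sketch of standard Okapi BM25 scoring follows, assuming pre-tokenized documents, the common defaults k1 = 1.5 and b = 0.75, and the non-negative IDF variant used by Lucene's BM25Similarity. The function names are hypothetical, and this is not the authors' baseline configuration.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each tokenized document for a tokenized query."""
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs
    # document frequency: number of documents containing each term
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for q in query:
            if tf[q] == 0:
                continue  # absent terms contribute nothing
            idf = math.log((n_docs - df[q] + 0.5) / (df[q] + 0.5) + 1.0)
            # term-frequency saturation (k1) and length normalization (b)
            norm = tf[q] + k1 * (1.0 - b + b * len(d) / avgdl)
            score += idf * tf[q] * (k1 + 1.0) / norm
        scores.append(score)
    return scores

def rank_documents(query, docs, top_k=2):
    """Indices of the top_k documents by BM25 score, best first."""
    scores = bm25_scores(query, docs)
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order[:top_k]
```

For example, with the query `["capital", "france"]` over three short documents, one mentioning both terms, one mentioning only "capital", and one mentioning neither, the ranking puts the first document ahead of the second.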
KNG: The K-Norm Gradient Mechanism
Reimherr, Matthew, Awan, Jordan
This paper presents a new mechanism, called the K-Norm Gradient Mechanism (KNG), for producing sanitized statistical summaries that achieve differential privacy. This new approach maintains the strong flexibility of the exponential mechanism while achieving the strong utility of objective perturbation. KNG starts with an inherent objective function (often an empirical risk), and promotes summaries that are close to minimizing the objective by weighting them according to how far the gradient of the objective function is from zero. Working with the gradient instead of the original objective function allows for additional flexibility, as one can penalize using different norms. We show that, unlike the exponential mechanism, the noise added by KNG is asymptotically negligible compared to the statistical error for many problems. In addition to theoretical guarantees on privacy and utility, we confirm the utility of KNG empirically in the settings of linear and quantile regression through simulations.
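To make "weighting according to how far the gradient is from zero" concrete, here is a minimal sketch for a one-dimensional mean. It assumes that the KNG density has the exponential-mechanism form exp(-eps * ||grad l(theta)|| / (2 * Delta)), where Delta is the sensitivity of the gradient. It also assumes the hypothetical objective l(theta) = 0.5 * sum_i (theta - x_i)^2 with data in [0, 1], for which Delta = 1. Sampling on a finite grid over [0, 1] is an illustrative simplification whose privacy accounting is not checked here, and the function names are hypothetical.

```python
import numpy as np

def kng_mean_weights(x, eps, grid):
    """Normalized KNG weights over a grid of candidate means.

    Assumed objective: l(theta) = 0.5 * sum_i (theta - x_i)^2 with x_i in [0, 1].
    Its gradient is n*theta - sum(x); replacing one x_i changes it by at most 1,
    so the gradient sensitivity Delta is taken to be 1.
    """
    n = len(x)
    grad = n * grid - np.sum(x)                  # gradient at each candidate theta
    delta = 1.0                                  # assumed gradient sensitivity
    log_w = -eps * np.abs(grad) / (2.0 * delta)  # penalize the gradient's norm
    w = np.exp(log_w - log_w.max())              # subtract max for numerical stability
    return w / w.sum()

def kng_mean_sample(x, eps, rng, num_grid=10_001):
    """Draw one private mean estimate from the discretized KNG density on [0, 1]."""
    grid = np.linspace(0.0, 1.0, num_grid)
    return rng.choice(grid, p=kng_mean_weights(x, eps, grid))
```

In one dimension every norm is a multiple of the absolute value, so this density is a Laplace shape centred at the sample mean with scale 2/(n*eps). The choice of norm only starts to matter for vector-valued parameters such as regression coefficients.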
Cooperative Automated Vehicles: a Review of Opportunities and Challenges in Socially Intelligent Vehicles Beyond Networking
The connected automated vehicle has often been touted as a technology that will become pervasive in society in the near future. One can view an automated vehicle as having Artificial Intelligence (AI) capabilities: being able to self-drive, sense its surroundings, recognise objects in its vicinity, and perform reasoning and decision-making. Rather than treating automated vehicles as stand-alone entities, we examine the need for them to cooperate and interact within their socio-cyber-physical environments, including both the problems cooperation will solve and the issues and challenges it raises. We review current work in cooperation for automated vehicles, based on selected examples from the literature. We conclude by noting the need for the ability to behave cooperatively as a form of social-AI capability for automated vehicles, beyond sensing the immediate environment and beyond the underlying networking technology.
The Impact of AI on the Data Analyst - insideBIGDATA
In this special guest feature, Glen Rabie, CEO of Yellowfin, argues that while many analysts may fear being replaced by automation and AI, the role of the data analyst will grow, both in its significance to the business and in the breadth of skills it requires. Yellowfin is an Analytics and Business Intelligence software company focused on helping businesses understand their data. Rabie is passionate about data and improving business performance through analytics. Prior to starting Yellowfin, he worked in various roles at National Australia Bank, including senior e-business consultant and global manager of employee self-service. Rabie holds a Master's in Commerce from the University of Melbourne.
Properties and Extensions of Alternating Path Relevance - I
When proving theorems from large sets of logical assertions, it can be helpful to restrict the search for a proof to those assertions that are relevant, that is, closely related to the theorem in some sense. For example, in the Watson system, a large knowledge base must rapidly be searched for relevant facts. It is possible to define formal concepts of relevance for propositional and first-order logic. Various such concepts have been defined, and some have yielded good results on large problems. We consider here in particular a concept based on alternating paths. We present efficient graph-based methods for computing alternating path relevance and give some results indicating its effectiveness. We also propose an alternating path based extension of this relevance method to DPLL with an improved time bound, and give other extensions to alternating path relevance intended to improve its performance.
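The abstract does not spell out the definition, so what follows is a minimal sketch of one simplified propositional reading of alternating paths, not the authors' graph-based algorithm. It makes several assumptions. Clauses are sets of DIMACS-style integer literals, where -k is the complement of k. A path alternates between two kinds of step: a link from a literal to a clause containing its complement, and a move to a different literal within that same clause. A clause's distance is the number of complementary links needed to reach it from the support clauses, such as those of the negated theorem. The function name `alternating_path_distances` is hypothetical.

```python
from collections import deque

def alternating_path_distances(clauses, support):
    """Breadth-first search over (clause index, entry literal) states.

    clauses: list of sets of nonzero ints (DIMACS-style literals).
    support: indices of the clauses the search starts from.
    Returns {clause index: fewest complementary links from the support set}.
    """
    # index every clause by the literals it contains
    occurs = {}
    for i, clause in enumerate(clauses):
        for lit in clause:
            occurs.setdefault(lit, []).append(i)

    dist = {}
    seen = set()
    queue = deque()
    for s in support:
        dist[s] = 0
        seen.add((s, None))        # None: no entry literal, any literal may exit
        queue.append((s, None, 0))

    while queue:
        i, entry, d = queue.popleft()
        for lit in clauses[i]:
            if lit == entry:
                continue           # alternation: leave by a different literal
            for j in occurs.get(-lit, []):
                state = (j, -lit)  # entered clause j through literal -lit
                if state in seen:
                    continue
                seen.add(state)
                dist.setdefault(j, d + 1)  # BFS order makes the first hit minimal
                queue.append((j, -lit, d + 1))
    return dist
```

The alternation rule is what separates this from plain reachability through complementary literals. With clauses `[{-1}, {1}, {-1, 2}]` and support `[0]`, the unit clause `{1}` can only be left through the literal it was entered by, so `{-1, 2}` is not reached. Clauses that are never reached, or that lie beyond a chosen distance bound, would be the ones a relevance filter drops.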