Overview
Statistical embedding: Beyond principal components
Tjøstheim, Dag, Jullum, Martin, Løland, Anders
There has been an intense recent activity in embedding of very high dimensional and nonlinear data structures, much of it in the data science and machine learning literature. We survey this activity in four parts. In the first part we cover nonlinear methods such as principal curves, multidimensional scaling, local linear methods, ISOMAP, graph based methods and kernel based methods. The second part is concerned with topological embedding methods, in particular mapping topological properties into persistence diagrams. Another type of data sets with a tremendous growth is very high-dimensional network data. The task considered in part three is how to embed such data in a vector space of moderate dimension to make the data amenable to traditional techniques such as cluster and classification techniques. The final part of the survey deals with embedding in $\mathbb{R}^2$, which is visualization. Three methods are presented: $t$-SNE, UMAP and LargeVis based on methods in parts one, two and three, respectively. The methods are illustrated and compared on two simulated data sets; one consisting of a triple of noisy Ranunculoid curves, and one consisting of networks of increasing complexity and with two types of nodes.
Learning and Executing Re-usable Behaviour Trees from Natural Language Instruction
Suddrey, Gavin, Talbot, Ben, Maire, Frederic
Domestic and service robots have the potential to transform industries such as health care and small-scale manufacturing, as well as the homes in which we live. However, due to the overwhelming variety of tasks these robots will be expected to complete, providing generic out-of-the-box solutions that meet the needs of every possible user is clearly intractable. To address this problem, robots must therefore not only be capable of learning how to complete novel tasks at run-time, but the solutions to these tasks must also be informed by the needs of the user. In this paper we demonstrate how behaviour trees, a well established control architecture in the fields of gaming and robotics, can be used in conjunction with natural language instruction to provide a robust and modular control architecture for instructing autonomous agents to learn and perform novel complex tasks. We also show how behaviour trees generated using our approach can be generalised to novel scenarios, and can be re-used in future learning episodes to create increasingly complex behaviours. We validate this work against an existing corpus of natural language instructions, demonstrate the application of our approach on both a simulated robot solving a toy problem, as well as two distinct real-world robot platforms which, respectively, complete a block sorting scenario, and a patrol scenario.
Attention mechanisms and deep learning for machine vision: A survey of the state of the art
Hafiz, Abdul Mueed, Parah, Shabir Ahmad, Bhat, Rouf Ul Alam
With the advent of state of the art nature-inspired pure attention based models i.e. transformers, and their success in natural language processing (NLP), their extension to machine vision (MV) tasks was inevitable and much felt. Subsequently, vision transformers (ViTs) were introduced which are giving quite a challenge to the established deep learning based machine vision techniques. However, pure attention based models/architectures like transformers require huge data, large training times and large computational resources. Some recent works suggest that combinations of these two varied fields can prove to build systems which have the advantages of both these fields. Accordingly, this state of the art survey paper is introduced which hopefully will help readers get useful information about this interesting and potential research area. A gentle introduction to attention mechanisms is given, followed by a discussion of the popular attention based deep architectures. Subsequently, the major categories of the intersection of attention mechanisms and deep learning for machine vision (MV) based are discussed. Afterwards, the major algorithms, issues and trends within the scope of the paper are discussed.
Data-Driven Design-by-Analogy: State of the Art and Future Directions
Jiang, Shuo, Hu, Jie, Wood, Kristin L., Luo, Jianxi
Design-by-Analogy (DbA) is a design methodology, wherein new solutions are generated in a target domain based on inspiration drawn from a source domain through cross-domain analogical reasoning [1, 2, 3]. DbA is an active research area in engineering design and various methods and tools have been proposed to support the implement of its process [4, 5, 6, 7, 8]. Studies have shown that DbA can help designers mitigate design fixation [9] and improve design ideation outcomes [10]. Fig.1 presents an example of DbA applications [11]. This case aims to solve an engineering design problem: How might we rectify the loud sonic boom generated when trains travel at high speeds through tunnels in atmospheric conditions [11, 12]? For potential design solutions to this problem, engineers explored structures in other design fields than trains or in the nature that effectively "break" the sonic-boom effect. When looking into the nature, engineers discovered that kingfisher birds could slice through the air and dive into the water at extremely high speeds to catch prey while barely making a splash. By analogy, engineers re-designed the train's front-end nose to mimic the geometry of the kingfisher's beak. This analogical design reduced noise and eliminated tunnel booms.
Controllable Abstractive Dialogue Summarization with Sketch Supervision
Wu, Chien-Sheng, Liu, Linqing, Liu, Wenhao, Stenetorp, Pontus, Xiong, Caiming
In this paper, we aim to improve abstractive dialogue summarization quality and, at the same time, enable granularity control. Our model has two primary components and stages: 1) a two-stage generation strategy that generates a preliminary summary sketch serving as the basis for the final summary. This summary sketch provides a weakly supervised signal in the form of pseudo-labeled interrogative pronoun categories and key phrases extracted using a constituency parser. 2) A simple strategy to control the granularity of the final summary, in that our model can automatically determine or control the number of generated summary sentences for a given dialogue by predicting and highlighting different text spans from the source text. Our model achieves state-of-the-art performance on the largest dialogue summarization corpus SAMSum, with as high as 50.79 in ROUGE-L score. In addition, we conduct a case study and show competitive human evaluation results and controllability to human-annotated summaries.
A generalized framework for active learning reliability: survey and benchmark
Moustapha, M., Marelli, S., Sudret, B.
Active learning methods have recently surged in the literature due to their ability to solve complex structural reliability problems within an affordable computational cost. These methods are designed by adaptively building an inexpensive surrogate of the original limit-state function. Examples of such surrogates include Gaussian process models which have been adopted in many contributions, the most popular ones being the efficient global reliability analysis (EGRA) and the active Kriging Monte Carlo simulation (AK-MCS), two milestone contributions in the field. In this paper, we first conduct a survey of the recent literature, showing that most of the proposed methods actually span from modifying one or more aspects of the two aforementioned methods. We then propose a generalized modular framework to build on-the-fly efficient active learning strategies by combining the following four ingredients or modules: surrogate model, reliability estimation algorithm, learning function and stopping criterion. Using this framework, we devise 39 strategies for the solution of 20 reliability benchmark problems. The results of this extensive benchmark are analyzed under various criteria leading to a synthesized set of recommendations for practitioners. These may be refined with a priori knowledge about the feature of the problem to solve, i.e., dimensionality and magnitude of the failure probability. This benchmark has eventually highlighted the importance of using surrogates in conjunction with sophisticated reliability estimation algorithms as a way to enhance the efficiency of the latter.
Frequency Estimation in Data Streams: Learning the Optimal Hashing Scheme
Bertsimas, Dimitris, Digalakis, Vassilis Jr
We present a novel approach for the problem of frequency estimation in data streams that is based on optimization and machine learning. Contrary to state-of-the-art streaming frequency estimation algorithms, which heavily rely on random hashing to maintain the frequency distribution of the data steam using limited storage, the proposed approach exploits an observed stream prefix to near-optimally hash elements and compress the target frequency distribution. We develop an exact mixed-integer linear optimization formulation, which enables us to compute optimal or near-optimal hashing schemes for elements seen in the observed stream prefix; then, we use machine learning to hash unseen elements. Further, we develop an efficient block coordinate descent algorithm, which, as we empirically show, produces high quality solutions, and, in a special case, we are able to solve the proposed formulation exactly in linear time using dynamic programming. We empirically evaluate the proposed approach both on synthetic datasets and on real-world search query data. We show that the proposed approach outperforms existing approaches by one to two orders of magnitude in terms of its average (per element) estimation error and by 45-90% in terms of its expected magnitude of estimation error.
Weakly Supervised Learning Creates a Fusion of Modeling Cultures
Tang, Chengliang, Yuan, Gan, Zheng, Tian
The past two decades have witnessed the great success of the algorithmic modeling framework advocated by Breiman et al. (2001). Nevertheless, the excellent prediction performance of these black-box models rely heavily on the availability of strong supervision, i.e. a large set of accurate and exact ground-truth labels. In practice, strong supervision can be unavailable or expensive, which calls for modeling techniques under weak supervision. In this comment, we summarize the key concepts in weakly supervised learning and discuss some recent developments in the field. Using algorithmic modeling alone under a weak supervision might lead to unstable and misleading results. A promising direction would be integrating the data modeling culture into such a framework.
Teaching Machine Learning in K-12 Computing Education: Potential and Pitfalls
Tedre, Matti, Toivonen, Tapani, Kaihila, Juho, Vartiainen, Henriikka, Valtonen, Teemu, Jormanainen, Ilkka, Pears, Arnold
Over the past decades, numerous practical applications of machine learning techniques have shown the potential of data-driven approaches in a large number of computing fields. Machine learning is increasingly included in computing curricula in higher education, and a quickly growing number of initiatives are expanding it in K-12 computing education, too. As machine learning enters K-12 computing education, understanding how intuition and agency in the context of such systems is developed becomes a key research area. But as schools and teachers are already struggling with integrating traditional computational thinking and traditional artificial intelligence into school curricula, understanding the challenges behind teaching machine learning in K-12 is an even more daunting challenge for computing education research. Despite the central position of machine learning in the field of modern computing, the computing education research body of literature contains remarkably few studies of how people learn to train, test, improve, and deploy machine learning systems. This is especially true of the K-12 curriculum space. This article charts the emerging trajectories in educational practice, theory, and technology related to teaching machine learning in K-12 education. The article situates the existing work in the context of computing education in general, and describes some differences that K-12 computing educators should take into account when facing this challenge. The article focuses on key aspects of the paradigm shift that will be required in order to successfully integrate machine learning into the broader K-12 computing curricula. A crucial step is abandoning the belief that rule-based "traditional" programming is a central aspect and building block in developing next generation computational thinking.
Optimization of Heterogeneous Systems with AI Planning Heuristics and Machine Learning: A Performance and Energy Aware Approach
Heterogeneous computing systems provide high performance and energy efficiency. However, to optimally utilize such systems, solutions that distribute the work across host CPUs and accelerating devices are needed. In this paper, we present a performance and energy aware approach that combines AI planning heuristics for parameter space exploration with a machine learning model for performance and energy evaluation to determine a near-optimal system configuration. For data-parallel applications our approach determines a near-optimal host-device distribution of work, number of processing units required and the corresponding scheduling strategy. We evaluate our approach for various heterogeneous systems accelerated with GPU or the Intel Xeon Phi. The experimental results demonstrate that our approach finds a near-optimal system configuration by evaluating only about 7% of reasonable configurations. Furthermore, the performance per Joule estimation of system configurations using our machine learning model is more than 1000x faster compared to the system evaluation by program execution.