Scaling Up Reinforcement Learning through Targeted Exploration
Mann, Timothy Arthur (Texas A&M University) | Choe, Yoonsuck (Texas A&M University)
Recent Reinforcement Learning (RL) algorithms, such as R-MAX, make (with high probability) only a small number of poor decisions. In practice, these algorithms do not scale well as the number of states grows because the algorithms spend too much effort exploring. We introduce an RL algorithm, State TArgeted R-MAX (STAR-MAX), that explores a subset of the state space, called the exploration envelope ξ. When ξ equals the total state space, STAR-MAX behaves identically to R-MAX. When ξ is a subset of the state space, a recovery rule β is needed to keep exploration within ξ. We compared existing algorithms with our algorithm employing various exploration envelopes. With an appropriate choice of ξ, STAR-MAX scales far better than existing RL algorithms as the number of states increases. A possible drawback of our algorithm is its dependence on a good choice of ξ and β. However, we show that an effective recovery rule β can be learned on-line and ξ can be learned from demonstrations. We also find that even randomly sampled exploration envelopes can improve cumulative rewards compared to R-MAX. We expect these results to lead to more efficient methods for RL in large-scale problems.
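As a rough illustration only (the function names, the known-state threshold m, and the action set below are hypothetical, not taken from the paper), action selection that restricts R-MAX-style optimistic exploration to an envelope ξ and falls back on a recovery rule β outside it could look like this:

import random

def star_max_action(state, envelope, recovery_rule, counts, greedy_action,
                    m=5, actions=("up", "down", "left", "right")):
    """Illustrative action selection in the spirit of STAR-MAX.

    envelope      -- set of states (the envelope xi) in which exploration is allowed
    recovery_rule -- callable beta(state) steering the agent back into the envelope
    counts        -- visit counts per (state, action) pair, e.g. a defaultdict(int)
    greedy_action -- callable returning the greedy action of the learned model
    m             -- visits after which a (state, action) pair counts as known
    """
    if state not in envelope:
        # Outside the exploration envelope: follow the recovery rule beta.
        return recovery_rule(state)
    unknown = [a for a in actions if counts[(state, a)] < m]
    if unknown:
        # R-MAX-style optimism: sample an insufficiently visited action first.
        return random.choice(unknown)
    # Everything here is known: exploit the learned model.
    return greedy_action(state)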
Towards Evolutionary Nonnegative Matrix Factorization
Wang, Fei (IBM Research) | Tong, Hanghang (IBM Research) | Lin, Ching-Yung (IBM Research)
Nonnegative Matrix Factorization (NMF) techniques have aroused considerable interest in the artificial intelligence community in recent years because of their good interpretability and computational efficiency. However, in many real-world applications the data features evolve smoothly over time. In this case, it would be very expensive in both computation and storage to rerun the whole NMF procedure each time the data features change. In this paper, we propose Evolutionary Nonnegative Matrix Factorization (eNMF), which aims to incrementally update the factorized matrices in a computation- and space-efficient manner as the data matrix varies. We devise such an evolutionary procedure for both asymmetric and symmetric NMF. Finally, we conduct experiments on several real-world data sets to demonstrate the efficacy and efficiency of eNMF.
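The incremental idea can be illustrated with standard Lee-Seung multiplicative updates warm-started from the previous factors; this is a generic sketch, not the eNMF update rules from the paper, and all sizes and names are made up:

import numpy as np

def nmf_multiplicative(X, W, H, n_iter=50, eps=1e-9):
    """Standard multiplicative updates for X ~= W H, warm-started from the
    factors of the previous time step instead of a random restart."""
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H

# Evolving data: reuse yesterday's factors as the starting point for today.
rng = np.random.default_rng(0)
X_old = rng.random((100, 50))
W, H = rng.random((100, 5)), rng.random((5, 50))
W, H = nmf_multiplicative(X_old, W, H)              # factorize the initial matrix
X_new = X_old + 0.01 * rng.random((100, 50))        # small, smooth drift in the data
W, H = nmf_multiplicative(X_new, W, H, n_iter=10)   # cheap incremental refresh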
Efficient Subspace Segmentation via Quadratic Programming
Wang, Shusen (Zhejiang University) | Yuan, Xiaotong (National University of Singapore) | Yao, Tiansheng (Zhejiang University) | Yan, Shuicheng (National University of Singapore) | Shen, Jialie (Singapore Management University)
We explore in this paper efficient algorithmic solutions to robust subspace segmentation. We propose SSQP, namely Subspace Segmentation via Quadratic Programming, to partition data drawn from multiple subspaces into multiple clusters. The basic idea of SSQP is to express each datum as a linear combination of the other data, regularized by an overall term that drives the reconstruction coefficients over vectors from different subspaces toward zero. The coefficient matrix derived by solving a quadratic programming problem is taken as an affinity matrix, upon which spectral clustering is applied to obtain the ultimate segmentation result. Similar to sparse subspace clustering (SSC) and low-rank representation (LRR), SSQP is robust to data noise, as validated by experiments on toy data. Experiments on the Hopkins 155 database show that SSQP achieves accuracy competitive with SSC and LRR in segmenting affine subspaces, while experimental results on the Extended Yale Face Database B demonstrate SSQP's superiority over SSC and LRR. Beyond segmentation accuracy, all experiments show that SSQP is much faster than both SSC and LRR in the practice of subspace segmentation.
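A hedged sketch of the overall pipeline (self-expressive coefficients, affinity matrix, spectral clustering); the quadratic objective below is a ridge-regularized stand-in, not the exact SSQP program from the paper, and the parameter names are hypothetical:

import numpy as np
from sklearn.cluster import SpectralClustering

def subspace_segmentation(X, n_clusters, lam=0.1):
    """Generic self-expressive segmentation pipeline: represent each column of X
    (one data point per column) by the other columns, symmetrize the coefficient
    matrix into an affinity, then run spectral clustering on it."""
    n = X.shape[1]
    G = X.T @ X
    # Ridge-regularized self-expression as a stand-in for the SSQP quadratic program.
    Z = np.linalg.solve(G + lam * np.eye(n), G)
    np.fill_diagonal(Z, 0.0)                      # a point should not explain itself
    A = np.abs(Z) + np.abs(Z.T)                   # affinity matrix from the coefficients
    labels = SpectralClustering(n_clusters=n_clusters,
                                affinity="precomputed").fit_predict(A)
    return labels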
Transfer Learning by Structural Analogy
Wang, Huayan (Stanford University) | Yang, Qiang (Hong Kong University of Science and Technology)
Transfer learning allows knowledge to be extracted from auxiliary domains and be used to enhance learning in a target domain. For transfer learning to be successful, it is critical to find the similarity between auxiliary and target domains, even when such mappings are not obvious. In this paper, we present a novel algorithm for finding the structural similarity between two domains, to enable transfer learning at a structured knowledge level. In particular, we address the problem of how to learn a non-trivial structural similarity mapping between two different domains when they are completely different on the representation level. This problem is challenging because we cannot directly compare features across domains. Our algorithm extracts the structural features within each domain and then maps the features into the Reproducing Kernel Hilbert Space (RKHS), such that the "structural dependencies" of features across domains can be estimated by kernel matrices of the features within each domain. By treating the analogues from both domains as equivalent, we can transfer knowledge to achieve a better understanding of the domains and improved performance for learning. We validate our approach on synthetic and real-world datasets.
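The cross-domain dependence estimate computed from within-domain kernel matrices is reminiscent of an HSIC-style statistic; assuming that reading (the paper's exact estimator may differ), a minimal sketch is:

import numpy as np

def hsic(K, L):
    """Empirical HSIC-style dependence between two aligned sets of objects,
    computed only from their within-domain kernel matrices K and L."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n      # centering matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2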
Multi-Task Learning in Heterogeneous Feature Spaces
Zhang, Yu (Hong Kong University of Science and Technology) | Yeung, Dit-Yan (Hong Kong University of Science and Technology)
Multi-task learning aims at improving the generalization performance of a learning task with the help of some other related tasks. Although many multi-task learning methods have been proposed, they are all based on the assumption that all tasks share the same data representation. This assumption is too restrictive for general applications. In this paper, we propose a multi-task extension of linear discriminant analysis (LDA), called multi-task discriminant analysis (MTDA), which can deal with learning tasks with different data representations. For each task, MTDA learns a separate transformation which consists of two parts, one specific to the task and one common to all tasks. A by-product of MTDA is that it can alleviate the labeled data deficiency problem of LDA. Moreover, unlike many existing multi-task learning methods, MTDA can handle binary and multi-class problems for each task in a generic way. Experimental results on face recognition show that MTDA consistently outperforms related methods.
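A minimal sketch of the common-plus-specific transformation described above, with hypothetical dimensions; how the shared and task-specific parts are actually learned is the subject of the paper and is omitted here:

import numpy as np

# Hypothetical shapes: d-dimensional features projected to k dimensions for T tasks.
d, k, T = 100, 10, 3
rng = np.random.default_rng(0)
W0 = rng.standard_normal((d, k))                      # part shared by all tasks
V = [rng.standard_normal((d, k)) for _ in range(T)]   # task-specific parts

def project(X, task):
    """Transform data of a given task with the combined projection W0 + V[task]."""
    return X @ (W0 + V[task])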
Hybrid Planning with Temporally Extended Goals for Sustainable Ocean Observing
Li, Hui (The Boeing Company) | Williams, Brian (Massachusetts Institute of Technology)
A challenge to modeling and monitoring the health of the ocean environment is that it is largely under-sensed and difficult to sense remotely. Autonomous underwater vehicles (AUVs) can improve observability, for example of algal bloom regions, ocean acidification, and ocean circulation. This AUV paradigm, however, requires robust operation that is cost effective and responsive to the environment. To achieve low cost we generate operational sequences automatically from science goals, and we achieve robustness by reasoning about the discrete and continuous effects of actions. We introduce Kongming2, a generative planner for hybrid systems with temporally extended goals (TEGs) and temporally flexible actions. It takes as input high-level goals and outputs trajectories and actions of the hybrid system, for example an AUV. Kongming2 makes two major extensions to Kongming1: planning for TEGs, and planning with temporally flexible actions. We demonstrated a proof of concept of the planner in the Atlantic Ocean on Odyssey IV, an AUV designed and built by the MIT AUV Lab at Sea Grant.
Artificial Intelligence for Artificial Artificial Intelligence
Dai, Peng (University of Washington) | Mausam, . (University of Washington) | Weld, Daniel Sabby (University of Washington)
Crowdsourcing platforms such as Amazon Mechanical Turk have become popular for a wide variety of human intelligence tasks; however, quality control continues to be a significant challenge. In recent work we proposed TurKontrol, a theoretical model based on POMDPs to optimize iterative, crowd-sourced workflows. However, that work neither described how to learn the model parameters nor showed its effectiveness in a real crowd-sourced setting. Learning is challenging due to the scale of the model and noisy data: there are hundreds of thousands of workers with high-variance abilities. This paper presents an end-to-end system that first learns TurKontrol's POMDP parameters from real Mechanical Turk data, and then applies the model to dynamically optimize live tasks. We validate the model and use it to control a successive-improvement process on Mechanical Turk. By modeling worker accuracy and voting patterns, our system produces significantly superior artifacts compared to those generated through nonadaptive workflows using the same amount of money.
M-Unit EigenAnt: An Ant Algorithm to Find the M Best Solutions
Shah, Sameena (Indian Institute of Technology Delhi) | Jayadeva, Jayadeva (Indian Institute of Technology Delhi) | Kothari, Ravi (IBM India Research Laboratory) | Chandra, Suresh (Indian Institute of Technology Delhi)
In this paper, we shed light on how powerful congestion control based on local interactions may be obtained. We show how ants can use repellent pheromones and incorporate the effect of crowding to avoid traffic congestion on the optimal path. Based on these interactions, we propose an ant algorithm, the M-unit EigenAnt algorithm, that leads to the selection of the M shortest paths. The ratio of selection of each of these paths is also optimal and regulated by an optimal amount of pheromone on each of them. To the best of our knowledge, the M-unit EigenAnt algorithm is the first ant algorithm that explicitly ensures the selection of the M shortest paths and regulates the amount of pheromone on them such that it is asymptotically optimal. This is in contrast with most ant algorithms, which aim to discover just a single best path. We provide its convergence analysis and show that the steady-state distribution of pheromone aligns with the eigenvectors of the cost matrix, and thus is related to its measure of quality. We also provide analysis to show that this property ensues even when the food is moved or path lengths change during foraging. We show that this behavior is robust in the presence of fluctuations and quickly reflects changes in the M optimal solutions. This makes it suitable not only for distributed applications but also for dynamic ones. Finally, we provide simulation results for the convergence to the optimal solution under different initial biases, dynamism in the lengths of paths, and discovery of new paths.
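As a toy illustration only (this is not the actual M-unit EigenAnt update rule), the following sketch shows how local deposit-and-evaporation dynamics concentrate pheromone on the shorter of several parallel paths:

import numpy as np

def simulate_paths(lengths, n_ants=5000, evap=0.05):
    """Toy pheromone dynamics on parallel paths: shorter paths receive more
    reinforcement per trip, evaporation erodes unused paths, so pheromone
    mass concentrates on the best paths over time."""
    rng = np.random.default_rng(0)
    tau = np.ones(len(lengths))
    for _ in range(n_ants):
        p = tau / tau.sum()                 # probability of choosing each path
        i = rng.choice(len(lengths), p=p)
        tau *= (1.0 - evap)                 # evaporation on every path
        tau[i] += 1.0 / lengths[i]          # deposit inversely proportional to length
    return tau / tau.sum()

print(simulate_paths(np.array([1.0, 1.5, 2.0, 5.0])))  # mass concentrates on short paths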
Social Relations Model for Collaborative Filtering
Li, Wu-Jun (Shanghai Jiao Tong University) | Yeung, Dit-Yan (Hong Kong University of Science and Technology)
We propose a novel probabilistic model for collaborative filtering (CF), called SRMCoFi, which seamlessly integrates both linear and bilinear random effects into a principled framework. The formulation of SRMCoFi is supported by both social psychological experiments and statistical theories. Not only can many existing CF methods be seen as special cases of SRMCoFi, but it also integrates their advantages while simultaneously overcoming their disadvantages. The solid theoretical foundation of SRMCoFi is further supported by promising empirical results obtained in extensive experiments using real CF data sets on movie ratings.
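Reading the combination of linear and bilinear random effects as user/item offsets plus a factorized interaction term, a hedged sketch of the resulting prediction rule (all names, sizes, and variances below are hypothetical, and SRMCoFi's actual probabilistic formulation is richer) is:

import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k = 1000, 500, 8
mu = 3.5                                    # global rating mean
a = rng.normal(0, 0.3, n_users)             # linear (user) random effects
b = rng.normal(0, 0.3, n_items)             # linear (item) random effects
U = rng.normal(0, 0.1, (n_users, k))        # bilinear user factors
V = rng.normal(0, 0.1, (n_items, k))        # bilinear item factors

def predict(i, j):
    """Predicted rating of user i for item j: linear effects plus a bilinear term."""
    return mu + a[i] + b[j] + U[i] @ V[j]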
Planning with Specialized SAT Solvers
Rintanen, Jussi (The Australian National University)
Logic, and declarative representation of knowledge in general, have long been a preferred framework for problem solving in AI. However, specific subareas of AI have been eager to abandon general-purpose knowledge representation in favor of methods that seem to address their computational core problems better. In planning, for example, state-space search has in the last several years been preferred to logic-based methods such as SAT. In our recent work, we have demonstrated that the observed performance differences between SAT and specialized state-space search methods largely go back to the difference between a blind (or at least planning-agnostic) and a planning-specific search method. If SAT search methods are given even simple heuristics which make the search goal-directed, the efficiency differences disappear.