University of Science and Technology of China
Agent Behavior Prediction and Its Generalization Analysis
Tian, Fei (University of Science and Technology of China) | Li, Haifang (Chinese Academy of Sciences) | Chen, Wei (Microsoft Research) | Qin, Tao (Microsoft Research) | Chen, Enhong (University of Science and Technology of China) | Liu, Tie-Yan (Microsoft Research)
Machine learning algorithms have been applied to predict agent behaviors in real-world dynamic systems, such as advertiser behaviors in sponsored search and worker behaviors in crowdsourcing. Behavior data in these systems are generated by live agents: once a system changes due to the adoption of prediction models learnt from behavior data, agents observe these changes and respond by changing their own behaviors accordingly. Therefore, the evolving behavior data are not independently and identically distributed, posing great challenges to theoretical analysis. To tackle this challenge, we propose to use a Markov Chain in Random Environments (MCRE) to describe the behavior data, and perform generalization analysis of machine learning algorithms on this basis. We propose a novel technique that transforms the original time-variant MCRE into a higher-dimensional time-homogeneous Markov chain, which is easier to deal with. We prove the convergence of the new Markov chain as time approaches infinity, and then obtain a generalization bound for machine learning algorithms on the behavior data generated by the new chain. To the best of our knowledge, this is the first work to perform generalization analysis on data generated by such complex processes in real-world dynamic systems.
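To make the transformation concrete, here is a standard construction in this spirit (a sketch only; the paper's actual higher-dimensional chain may differ): if the environment process $(\xi_t)$ is itself a Markov chain with kernel $Q$, and the behavior state $X_t$ evolves under an environment-dependent kernel $P_{\xi_t}$, then the augmented process $Z_t = (X_t, \xi_t)$ is a time-homogeneous Markov chain on the product space:

$$\Pr\big(Z_{t+1} = (x', e') \mid Z_t = (x, e)\big) = Q(e, e')\, P_{e}(x, x'),$$

which no longer depends on $t$, so standard convergence and concentration tools for homogeneous chains become applicable.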
Thompson Sampling Based Monte-Carlo Planning in POMDPs
Bai, Aijun (University of Science and Technology of China) | Wu, Feng (University of Southampton) | Zhang, Zongzhang (National University of Singapore) | Chen, Xiaoping (University of Science and Technology of China)
Monte-Carlo tree search (MCTS) has drawn great interest in recent years for planning under uncertainty. One of the key challenges is the trade-off between exploration and exploitation. To address this, we introduce a novel online planning algorithm for large POMDPs using Thompson-sampling-based MCTS that balances cumulative and simple regret. The proposed algorithm, Dirichlet-Dirichlet-NormalGamma based Partially Observable Monte-Carlo Planning (D2NG-POMCP), treats the accumulated reward of performing an action from a belief state in the MCTS search tree as a random variable following an unknown distribution with hidden parameters. A Bayesian method is used to model and infer the posterior distribution of these parameters, choosing the conjugate prior as a combination of two Dirichlet distributions and one NormalGamma distribution. Thompson sampling is then exploited to guide action selection in the search tree. Experimental results confirm that our algorithm outperforms state-of-the-art approaches on several common benchmark problems.
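As an illustration of the Thompson-sampling component, the following minimal sketch covers only the NormalGamma reward model (the Dirichlet posteriors over transition and observation outcomes that D2NG-POMCP also maintains are omitted, and the interfaces are assumptions, not the paper's code):

```python
import numpy as np

class NormalGammaArm:
    """NormalGamma posterior over the unknown mean and precision of an action's return."""
    def __init__(self, mu0=0.0, lam0=1.0, alpha0=1.0, beta0=1.0):
        self.mu, self.lam, self.alpha, self.beta = mu0, lam0, alpha0, beta0

    def update(self, x):
        # Standard conjugate update for one observed return x.
        self.alpha += 0.5
        self.beta += 0.5 * self.lam * (x - self.mu) ** 2 / (self.lam + 1.0)
        self.mu = (self.lam * self.mu + x) / (self.lam + 1.0)
        self.lam += 1.0

    def sample_mean(self, rng):
        # Thompson sampling: draw a precision, then a mean, from the posterior.
        tau = rng.gamma(self.alpha, 1.0 / self.beta)          # shape, scale
        return rng.normal(self.mu, 1.0 / np.sqrt(self.lam * tau))

def select_action(arms, rng):
    """Choose the action whose sampled posterior mean return is largest."""
    return max(range(len(arms)), key=lambda a: arms[a].sample_mean(rng))
```

Because each sampled mean concentrates around the empirical mean as evidence accumulates, exploration fades naturally without an explicit exploration constant.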
The Automated Acquisition of Suggestions from Tweets
Dong, Li (Beihang University) | Wei, Furu (Microsoft Research Asia) | Duan, Yajuan (University of Science and Technology of China) | Liu, Xiaohua (Microsoft Research Asia) | Zhou, Ming (Microsoft Research Asia) | Xu, Ke (Beihang University)
This paper targets automatically detecting and classifying users' suggestions from tweets. The short and informal nature of tweets, along with the imbalanced class distribution of suggestion tweets, makes the task extremely challenging. To this end, we develop a classification framework based on Factorization Machines, which are effective and efficient in classification tasks with sparse features. Moreover, we tackle the imbalance problem by introducing cost-sensitive learning techniques into Factorization Machines. Extensive experimental studies on a manually annotated real-life data set show that the proposed approach significantly improves over the baseline, yielding a precision of 71.06% and a recall of 67.86%. We also investigate why Factorization Machines perform better. Finally, we introduce the first manually annotated dataset for suggestion classification.
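To make the two ingredients concrete, here is a minimal sketch of a second-order Factorization Machine score together with a cost-weighted logistic loss (the cost ratio c_pos and the specific loss form are illustrative assumptions, not the paper's reported setup):

```python
import numpy as np

def fm_predict(x, w0, w, V):
    """Second-order Factorization Machine score: global bias + linear term
    + factorized pairwise interactions, computed in O(k*d) (Rendle, 2010)."""
    interactions = 0.5 * np.sum((V.T @ x) ** 2 - (V ** 2).T @ (x ** 2))
    return w0 + w @ x + interactions

def cost_sensitive_logistic_loss(score, y, c_pos=5.0):
    """Weighted logistic loss with labels y in {-1, +1}: errors on the rare
    positive (suggestion) class cost c_pos times more than negative errors."""
    cost = c_pos if y == 1 else 1.0
    return cost * np.log1p(np.exp(-y * score))
```

The factorized interaction term is what keeps the model effective under sparse tweet features: pairs of features that never co-occur in training still share strength through the latent vectors in V.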
Goal-Oriented Euclidean Heuristics with Manifold Learning
Chen, Wenlin (Washington University in St. Louis) | Chen, Yixin (Washington University in St. Louis) | Weinberger, Kilian (Washington University in St. Louis) | Lu, Qiang (University of Science and Technology of China) | Chen, Xiaoping (University of Science and Technology of China)
Recently, the Euclidean heuristic (EH) has been proposed for A* search. EH exploits manifold learning methods to construct an embedding of the state space graph, and derives an admissible heuristic distance between two states from the Euclidean distance between their respective embedded points. EH has shown good performance and memory efficiency in comparison to other existing heuristics such as differential heuristics. However, its potential has not been fully explored. In this paper, we propose a number of techniques that significantly improve the quality of EH. We propose a goal-oriented manifold learning scheme that optimizes the Euclidean distance to goals in the embedding while maintaining admissibility and consistency. We also propose a state heuristic enhancement technique to reduce the gap between heuristic and true distances; the enhanced heuristic is admissible but no longer consistent. We then employ a modified search algorithm, known as the B' algorithm, which achieves optimality with inconsistent heuristics using consistency checks and propagation. We demonstrate the effectiveness of these techniques and report unmatched reductions in search costs across several non-trivial benchmark search problems.
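As a sketch of how an embedding yields a search heuristic, the code below assumes emb maps each state to its precomputed embedded coordinates (a hypothetical interface; the paper's goal-oriented learning step and the B' consistency machinery are not reproduced here):

```python
import heapq
import numpy as np

def euclidean_heuristic(emb, goal):
    """Heuristic from a manifold-learned embedding: the Euclidean distance
    between embedded points lower-bounds the true graph distance."""
    g = emb[goal]
    return lambda s: float(np.linalg.norm(emb[s] - g))

def astar(neighbors, start, goal, h):
    """Plain A* with heuristic h; states are assumed hashable and comparable.
    The paper instead pairs its enhanced (inconsistent) heuristic with the
    B' algorithm, which adds consistency checks and propagation."""
    frontier = [(h(start), 0.0, start)]
    best_g = {start: 0.0}
    while frontier:
        _, g, s = heapq.heappop(frontier)
        if s == goal:
            return g
        if g > best_g.get(s, float("inf")):
            continue                      # stale queue entry
        for t, cost in neighbors(s):
            ng = g + cost
            if ng < best_g.get(t, float("inf")):
                best_g[t] = ng
                heapq.heappush(frontier, (ng + h(t), ng, t))
    return None
```

The goal-oriented scheme in the paper biases the embedding so that distances to the goal states in particular are tight, which is exactly where A* benefits most from an accurate lower bound.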
Utilizing Landmarks in Euclidean Heuristics for Optimal Planning
Lu, Qiang (University of Science and Technology of China) | Chen, Wenlin (Washington University in St. Louis) | Chen, Yixin (Washington University in St. Louis) | Weinberger, Kilian Q. (Washington University in St. Louis) | Chen, Xiaoping (University of Science and Technology of China)
An important problem in AI is to construct high-quality heuristics for optimal search. Recently, the Euclidean heuristic (EH) has been proposed, which embeds a state space graph into a Euclidean space and uses Euclidean distances as approximations of the graph distances. The embedding process leverages recent results from manifold learning, a subfield of machine learning, and guarantees that the heuristic is provably admissible and consistent. EH has shown good performance and memory efficiency in comparison to other existing heuristics. Our recent work has further improved the scalability and quality of EH. In this short paper, we present our latest progress on applying EH to problems in planning formalisms, which provide richer semantics than the simple state-space graph model. In particular, we improve EH by exploiting the landmark structure derived from the SAS+ planning formalism.
Covering Number as a Complexity Measure for POMDP Planning and Learning
Zhang, Zongzhang (University of Science and Technology of China) | Littman, Michael (Rutgers University) | Chen, Xiaoping (University of Science and Technology of China)
Finding a meaningful way of characterizing the difficulty of partially observable Markov decision processes (POMDPs) is a core theoretical problem in POMDP research. State-space size is often used as a proxy for POMDP difficulty, but it is a weak metric at best. Existing work has shown that the covering number of the reachable belief space, the set of belief points reachable from the initial belief point, has interesting theoretical links with the complexity of POMDP planning. In this paper, we present empirical evidence that the covering number of the reachable belief space (or just "covering number", for brevity) is a far better complexity measure than state-space size for both planning and learning POMDPs on several small-scale benchmark problems. We connect the covering number to the complexity of learning POMDPs by proposing a provably convergent learning algorithm for POMDPs without reset, given knowledge of the covering number.
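For intuition, a delta-covering number can be estimated greedily over a sampled set of reachable beliefs; the sketch below uses an L1 metric on the belief simplex as an illustrative choice (the paper's precise definition and metric may differ), and the greedy count upper-bounds the size of the minimum cover:

```python
import numpy as np

def covering_number_estimate(beliefs, delta):
    """Greedy estimate of the delta-covering number of a sampled set of
    reachable beliefs: the number of delta-balls needed to cover them all."""
    centers = []
    for b in beliefs:
        if all(np.linalg.norm(b - c, ord=1) > delta for c in centers):
            centers.append(b)   # b is uncovered; open a new ball around it
    return len(centers)
```

A small covering number says the reachable beliefs cluster into few neighborhoods, so a planner or learner only needs to get values right on a few representative points, regardless of how large the underlying state space is.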
Towards Maximizing the Area Under the ROC Curve for Multi-Class Classification Problems
Tang, Ke (University of Science and Technology of China) | Wang, Rui (University of Science and Technology of China) | Chen, Tianshi (Chinese Academy of Sciences)
The Area Under the ROC Curve (AUC) metric has achieved great success in binary classification problems because it measures the performance of classifiers without making any specific assumptions about the class distribution or misclassification costs. This is desirable because the class distribution and misclassification costs may be unknown during training, or may even change in the deployed environment. MAUC, the extension of AUC to multi-class problems, has also attracted much attention. However, despite the emergence of approaches for training classifiers with large AUC, little has been done for MAUC. This paper analyzes MAUC in depth, and reveals that the maximization of MAUC can be achieved by decomposing the multi-class problem into a number of independent sub-problems. These sub-problems are formulated as "learning to rank" problems, for which well-established methods already exist. Based on this analysis, a method that employs the RankBoost algorithm as the sub-problem solver is proposed to achieve classification systems with maximum MAUC. Empirical studies have shown the advantages of the proposed method over eight other relevant methods. Given the importance of MAUC to multi-class cost-sensitive learning and class-imbalanced learning, the proposed method serves as a general technique for both problems, and can also be generalized to accommodate other learning algorithms as sub-problem solvers.
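For reference, the standard MAUC of Hand and Till (2001), which this decomposition targets, averages directed pairwise AUCs over all unordered class pairs. A minimal sketch follows, where scores is an n-by-c array of per-class scores and labels holds the true class indices (an assumed interface):

```python
import numpy as np
from itertools import combinations

def pairwise_auc(scores_i, scores_j):
    """AUC for separating class i from class j using class i's score:
    probability that a random class-i example outranks a random class-j one."""
    wins = sum((si > sj) + 0.5 * (si == sj)
               for si in scores_i for sj in scores_j)
    return wins / (len(scores_i) * len(scores_j))

def mauc(scores, labels, classes):
    """Hand & Till (2001) MAUC: the mean of the two directed pairwise AUCs
    over every unordered class pair."""
    total = 0.0
    for i, j in combinations(classes, 2):
        a_ij = pairwise_auc(scores[labels == i, i], scores[labels == j, i])
        a_ji = pairwise_auc(scores[labels == j, j], scores[labels == i, j])
        total += (a_ij + a_ji) / 2
    return 2 * total / (len(classes) * (len(classes) - 1))
```

Each pairwise term depends only on how examples of two classes are ranked against each other, which is why maximizing MAUC decomposes into independent ranking sub-problems that a learner such as RankBoost can solve.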
Trial-Based Dynamic Programming for Multi-Agent Planning
Wu, Feng (University of Science and Technology of China) | Zilberstein, Shlomo (University of Massachusetts Amherst) | Chen, Xiaoping (University of Science and Technology of China)
Trial-based approaches offer an efficient way to solve single-agent MDPs and POMDPs. These approaches allow agents to focus their computation on regions of the environment they encounter during trials, leading to significant computational savings. We present a novel trial-based dynamic programming (TBDP) algorithm for DEC-POMDPs that extends these benefits to multi-agent settings. The algorithm uses trial-based methods for both belief generation and policy evaluation. Policy improvement is implemented efficiently using linear programming and a sub-policy reuse technique that helps bound memory usage. The results show that TBDP produces significant value improvements and is much faster than the best existing planning algorithms.
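A minimal sketch of the trial-based belief generation idea follows; sim and joint_policy are hypothetical interfaces, and real DEC-POMDP policies condition on local observation histories, which is simplified to a state-based lookup here:

```python
def generate_beliefs(sim, joint_policy, horizon, num_trials):
    """Trial-based belief generation: simulate trials under the current joint
    policy and record how often each state is visited at each depth, so that
    dynamic programming backups can focus on beliefs that actually occur."""
    visits = [{} for _ in range(horizon)]          # depth -> {state: count}
    for _ in range(num_trials):
        state = sim.sample_initial_state()
        for t in range(horizon):
            visits[t][state] = visits[t].get(state, 0) + 1
            actions = joint_policy.joint_action(t, state)
            state, _reward = sim.step(state, actions)
    # Normalize visit counts into sampled belief distributions per depth.
    beliefs = []
    for counts in visits:
        total = sum(counts.values())
        beliefs.append({s: c / total for s, c in counts.items()})
    return beliefs
```

Restricting backups to these sampled beliefs, rather than the full belief space, is the source of the computational savings the abstract describes.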