Education
Deep Manifold Learning of Symmetric Positive Definite Matrices with Application to Face Recognition
Dong, Zhen (Beijing Institute of Technology) | Jia, Su (State University of New York at Stony Brook) | Zhang, Chi (Beijing Institute of Technology) | Pei, Mingtao (Beijing Institute of Technology) | Wu, Yuwei (Beijing Institute of Technology)
In this paper, we aim to construct a deep neural network which embeds high dimensional symmetric positive definite (SPD) matrices into a more discriminative low dimensional SPD manifold. To this end, we develop two types of basic layers: a 2D fully connected layer which reduces the dimensionality of the SPD matrices, and a symmetrically clean layer which achieves non-linear mapping. Specifically, we extend the classical fully connected layer such that it is suitable for SPD matrices, and we further show that SPD matrices with symmetric pair elements setting zero operations are still symmetric positive definite. Finally, we complete the construction of the deep neural network for SPD manifold learning by stacking the two layers. Experiments on several face datasets demonstrate the effectiveness of the proposed method.
Latent Dependency Forest Models
Chu, Shanbo (ShanghaiTech University) | Jiang, Yong (ShanghaiTech University) | Tu, Kewei (ShanghaiTech University)
Probabilistic modeling is one of the foundations of modern Learning the structure of a probabilistic model resembles machine learning and artificial intelligence, which aims to learning the set of production rules of a grammar, while compactly represent the joint probability distribution of random learning model parameters resembles learning grammar rule variables. The most widely used approach for probabilistic probabilities. From the unsupervised grammar learning literature, modeling is probabilistic graphical models. A probabilistic one can see that learning approaches based on PCFGs graphical model represents a probability distribution with a have not been very successful, while the state-of-the-art performance directed or undirected graph. It represents random variables has mostly been achieved based on less expressive with the nodes in the graph and uses the edges in the graph to models such as dependency grammars (DGs) (Klein and encode the probabilistic relationships between random variables.
Distant Supervision for Relation Extraction with Sentence-Level Attention and Entity Descriptions
Ji, Guoliang (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | Liu, Kang (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | He, Shizhu (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences) | Zhao, Jun (National Laboratory of Pattern Recognition (NLPR), Institute of Automation Chinese Academy of Sciences)
Distant supervision for relation extraction is an efficient method to scale relation extraction to very large corpora which contains thousands of relations. However, the existing approaches have flaws on selecting valid instances and lack of background knowledge about the entities. In this paper, we propose a sentence-level attention model to select the valid instances, which makes full use of the supervision information from knowledge bases. And we extract entity descriptions from Freebase and Wikipedia pages to supplement background knowledge for our task. The background knowledge not only provides more information for predicting relations, but also brings better entity representations for the attention module. We conduct three experiments on a widely used dataset and the experimental results show that our approach outperforms all the baseline systems significantly.
Parametric Dual Maximization for Non-Convex Learning Problems
Zhou, Yuxun (Unviersity of California, Berkeley) | Kang, Zhaoyi (Unviersity of California, Berkeley) | Spanos, Costas J. (Unviersity of California, Berkeley)
We consider a class of non-convex learning problems that can be formulated as jointly optimizing regularized hinge loss and a set of auxiliary variables. Such problems encompass but are not limited to various versions of semi-supervised learning,learning with hidden structures, robust learning, etc. Existing methods either suffer from local minima or have to invoke anon-scalable combinatorial search. In this paper, we propose a novel learning procedure, namely Parametric Dual Maximization(PDM), that can approach global optimality efficiently with user specified approximation levels. The building blocks of PDM are two new results: (1) The equivalent convex maximization reformulation derived by parametric analysis.(2) The improvement of local solutions based on a necessary and sufficient condition for global optimality. Experimental results on two representative applications demonstrate the effectiveness of PDM compared to other approaches.
Learning Sparse Task Relations in Multi-Task Learning
Zhang, Yu (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)
In multi-task learning, when the number of tasks is large, pairwise task relations exhibit sparse patterns since usually a task cannot be helpful to all of the other tasks and moreover, sparse task relations can reduce the risk of overfitting compared with the dense ones. In this paper, we focus on learning sparse task relations. Based on a regularization framework which can learn task relations among multiple tasks, we propose a SParse covAriance based mulTi-taSk (SPATS) model to learn a sparse covariance by using the ℓ l regularization. The resulting objective function of the SPATS method is convex, which allows us to devise an alternating method to solve it. Moreover, some theoretical properties of the proposed model are studied. Experiments on synthetic and real-world datasets demonstrate the effectiveness of the proposed method.
A Framework of Online Learning with Imbalanced Streaming Data
Yan, Yan (University of Technology Sydney) | Yang, Tianbao (The University of Iowa) | Yang, Yi (University of Technology Sydney) | Chen, Jianhui (Yahoo! Labs)
A challenge for mining large-scale streaming data overlooked by most existing studies on online learning is the skew-distribution of examples over different classes. Many previous works have considered cost-sensitive approaches in an online setting for streaming data, where fixed costs are assigned to different classes, or ad-hoc costs are adapted based on the distribution of data received so far. However, it is not necessary for them to achieve optimal performance in terms of the measures suited for imbalanced data, such as F-measure, area under ROC curve (AUROC), area under precision and recall curve (AUPRC). This work proposes a general framework for online learning with imbalanced streaming data, where examples are coming sequentially and models are updated accordingly on-the-fly. By simultaneously learning multiple classifiers with different cost vectors, the proposed method can be adopted for different target measures for imbalanced data, including F-measure, AUROC and AUPRC. Moreover, we present a rigorous theoretical justification of the proposed framework for the F-measure maximization. Our empirical studies demonstrate the competitive if not better performance of the proposed method compared to previous cost-sensitive and resampling based online learning algorithms and those that are designed for optimizing certain measures.
Distant Domain Transfer Learning
Tan, Ben (Hong Kong University of Science and Technology) | Zhang, Yu (Hong Kong University of Science and Technology) | Pan, Sinno Jialin (Nanyang Technological University) | Yang, Qiang (Hong Kong University of Science and Technology)
In this paper, we study a novel transfer learning problem termed Distant Domain Transfer Learning (DDTL). Different from existing transfer learning problems which assume that there is a close relation between the source domain and the target domain, in the DDTL problem, the target domain can be totally different from the source domain. For example, the source domain classifies face images but the target domain distinguishes plane images. Inspired by the cognitive processof human where two seemingly unrelated concepts can be connected by learning intermediate concepts gradually, we propose a Selective Learning Algorithm (SLA) to solve the DDTL problem with supervised autoencoder or supervised convolutional autoencoder as a base model for handling different types of inputs. Intuitively, the SLA algorithm selects usefully unlabeled data gradually from intermediate domains as a bridge to break the large distribution gap for transferring knowledge between two distant domains. Empirical studies on image classification problems demonstrate the effectiveness of the proposed algorithm, and on some tasks the improvement in terms of the classification accuracy is up to 17% over “non-transfer” methods.
Where to Add Actions in Human-in-the-Loop Reinforcement Learning
Mandel, Travis (University of Washington) | Liu, Yun-En (Enlearn) | Brunskill, Emma (Carnegie Mellon University) | Popović, Zoran (University of Washington)
In order for reinforcement learning systems to learn quickly in vast action spaces such as the space of all possible pieces of text or the space of all images, leveraging human intuition and creativity is key. However, a human-designed action space is likely to be initially imperfect and limited; furthermore, humans may improve at creating useful actions with practice or new information. Therefore, we propose a framework in which a human adds actions to a reinforcement learning system over time to boost performance. In this setting, however, it is key that we use human effort as efficiently as possible, and one significant danger is that humans waste effort adding actions at places (states) that aren't very important. Therefore, we propose Expected Local Improvement (ELI), an automated method which selects states at which to query humans for a new action. We evaluate ELI on a variety of simulated domains adapted from the literature, including domains with over a million actions and domains where the simulated experts change over time. We find ELI demonstrates excellent empirical performance, even in settings where the synthetic "experts" are quite poor.
Efficient Online Model Adaptation by Incremental Simplex Tableau
Lei, Zhixian (Harward University) | Ye, Xuehan (Renmin University of China) | Wang, Yongcai (Renmin University of China) | Li, Deying (Renmin University of China) | Xu, Jia (Hunter College, City University of New York)
Online multi-kernel learning is promising in the era of mobile computing, in which a combined classifier with multiple kernels are offline trained, and online adapts to personalized features for serving the end user precisely and smartly. The online adaptation is mainly carried out at the end-devices, which requires the adaptation algorithms to be light, efficient and accurate. Previous results focused mainly on efficiency. This paper proposes an novel online model adaptation framework for not only efficiency but also optimal online adaptation. At first, an online optimal incremental simplex tableau (IST)algorithm is proposed, which approaches the model adaption by linear programming and produces the optimized model update in each step when a personalized training data is collected.But keeping online optimal in each step is expensive and may cause over-fitting especially when the online data is noisy. A Fast-IST approach is therefore proposed, which measures the deviation between the training data and the current model. It schedules updating only when enough deviation is detected. The efficiency of each update is further enhanced by running IST only limited iterations, which bounds the computation complexity. Theoretical analysis and extensive evaluations show that Fast-IST saves computation cost greatly, while achieving speedy and accurate model adaptation.It provides better model adaptation speed and accuracy while using even lower computing cost than the state-of-the art.
Learning Unitary Operators with Help From u(n)
Hyland, Stephanie L. (ETH Zurich ) | Rätsch, Gunnar (ETH Zurich)
A major challenge in the training of recurrent neural networks is the so-called vanishing or exploding gradient problem. The use of a norm-preserving transition operator can address this issue, but parametrization is challenging. In this work we focus on unitary operators and describe a parametrization using the Lie algebra u( n ) associated with the Lie group U ( n ) of n × n unitary matrices. The exponential map provides a correspondence between these spaces, and allows us to define a unitary matrix using n 2 real coefficients relative to a basis of the Lie algebra. The parametrization is closed under additive updates of these coefficients, and thus provides a simple space in which to do gradient descent. We demonstrate the effectiveness of this parametrization on the problem of learning arbitrary unitary operators, comparing to several baselines and outperforming a recently-proposed lower-dimensional parametrization. We additionally use our parametrization to generalize a recently-proposed unitary recurrent neural network to arbitrary unitary matrices, using it to solve standard long-memory tasks.