Education
Tag-Weighted Topic Model For Large-scale Semi-Structured Documents
Li, Shuangyin, Li, Jiefei, Huang, Guan, Tan, Ruiyang, Pan, Rong
To date, there have been massive Semi-Structured Documents (SSDs) during the evolution of the Internet. These SSDs contain both unstructured features (e.g., plain text) and metadata (e.g., tags). Most previous works focused on modeling the unstructured text, and recently, some other methods have been proposed to model the unstructured text with specific tags. To build a general model for SSDs remains an important problem in terms of both model fitness and efficiency. We propose a novel method to model the SSDs by a so-called Tag-Weighted Topic Model (TWTM). TWTM is a framework that leverages both the tags and words information, not only to learn the document-topic and topic-word distributions, but also to infer the tag-topic distributions for text mining tasks. We present an efficient variational inference method with an EM algorithm for estimating the model parameters. Meanwhile, we propose three large-scale solutions for our model under the MapReduce distributed computing platform for modeling large-scale SSDs. The experimental results show the effectiveness, efficiency and the robustness by comparing our model with the state-of-the-art methods in document modeling, tags prediction and text classification. We also show the performance of the three distributed solutions in terms of time and accuracy on document modeling.
Context-aware learning for finite mixture models
Perdikis, Serafeim, Leeb, Robert, Chavarriaga, Ricardo, Millán, José del R.
This work introduces algorithms able to exploit contextual information in order to improve maximum-likelihood (ML) parameter estimation in finite mixture models (FMM), demonstrating their benefits and properties in several scenarios. The proposed algorithms are derived in a probabilistic framework with regard to situations where the regular FMM graphs can be extended with context-related variables, respecting the standard expectation-maximization (EM) methodology and, thus, rendering explicit supervision completely redundant. We show that, by direct application of the missing information principle, the compared algorithms' learning behaviour operates between the extremities of supervised and unsupervised learning, proportionally to the information content of contextual assistance. Our simulation results demonstrate the superiority of context-aware FMM training as compared to conventional unsupervised training in terms of estimation precision, standard errors, convergence rates and classification accuracy or regression fitness in various scenarios, while also highlighting important differences among the outlined situations. Finally, the improved classification outcome of contextually enhanced FMMs is showcased in a brain-computer interface application scenario.
Reduced-Set Kernel Principal Components Analysis for Improving the Training and Execution Speed of Kernel Machines
Kingravi, Hassan A., Vela, Patricio A., Gray, Alexandar
This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empirical operator. A simple data point selection procedure is given to generate a substitute density for the data, with accuracy that is governed by a user-tunable parameter . The effect of the approximation on the quality of the KPCA solution, in terms of spectral and operator errors, can be shown directly in terms of the density estimate error and as a function of the parameter . We show in experiments that RSKPCA can improve both training and evaluation time of KPCA by up to an order of magnitude, and compares favorably to the widely-used Nystrom and density-weighted Nystrom methods.
Optimal Learning Rates for Localized SVMs
One of the limiting factors of using support vector machines (SVMs) in large scale applications are their super-linear computational requirements in terms of the number of training samples. To address this issue, several approaches that train SVMs on many small chunks of large data sets separately have been proposed in the literature. So far, however, almost all these approaches have only been empirically investigated. In addition, their motivation was always based on computational requirements. In this work, we consider a localized SVM approach based upon a partition of the input space. For this local SVM, we derive a general oracle inequality. Then we apply this oracle inequality to least squares regression using Gaussian kernels and deduce local learning rates that are essentially minimax optimal under some standard smoothness assumptions on the regression function. This gives the first motivation for using local SVMs that is not based on computational requirements but on theoretical predictions on the generalization performance. We further introduce a data-dependent parameter selection method for our local SVM approach and show that this method achieves the same learning rates as before. Finally, we present some larger scale experiments for our localized SVM showing that it achieves essentially the same test performance as a global SVM for a fraction of the computational requirements. In addition, it turns out that the computational requirements for the local SVMs are similar to those of a vanilla random chunk approach, while the achieved test errors are significantly better.
PCA with Gaussian perturbations
Kotłowski, Wojciech, Warmuth, Manfred K.
Most of machine learning deals with vector parameters. Ideally we would like to take higher order information into account and make use of matrix or even tensor parameters. However the resulting algorithms are usually inefficient. Here we address on-line learning with matrix parameters. It is often easy to obtain online algorithm with good generalization performance if you eigendecompose the current parameter matrix in each trial (at a cost of $O(n^3)$ per trial). Ideally we want to avoid the decompositions and spend $O(n^2)$ per trial, i.e. linear time in the size of the matrix data. There is a core trade-off between the running time and the generalization performance, here measured by the regret of the on-line algorithm (total gain of the best off-line predictor minus the total gain of the on-line algorithm). We focus on the key matrix problem of rank $k$ Principal Component Analysis in $\mathbb{R}^n$ where $k \ll n$. There are $O(n^3)$ algorithms that achieve the optimum regret but require eigendecompositions. We develop a simple algorithm that needs $O(kn^2)$ per trial whose regret is off by a small factor of $O(n^{1/4})$. The algorithm is based on the Follow the Perturbed Leader paradigm. It replaces full eigendecompositions at each trial by the problem finding $k$ principal components of the current covariance matrix that is perturbed by Gaussian noise.
Evaluation of Spectral Learning for the Identification of Hidden Markov Models
Mattila, Robert, Rojas, Cristian R., Wahlberg, Bo
Hidden Markov models have successfully been applied as models of discrete time series in many fields. Often, when applied in practice, the parameters of these models have to be estimated. The currently predominating identification methods, such as maximum-likelihood estimation and especially expectation-maximization, are iterative and prone to have problems with local minima. A non-iterative method employing a spectral subspace-like approach has recently been proposed in the machine learning literature. This paper evaluates the performance of this algorithm, and compares it to the performance of the expectation-maximization algorithm, on a number of numerical examples. We find that the performance is mixed; it successfully identifies some systems with relatively few available observations, but fails completely for some systems even when a large amount of observations is available. An open question is how this discrepancy can be explained. We provide some indications that it could be related to how well-conditioned some system parameters are.
Plan Recognition for Exploratory Learning Environments Using Interleaved Temporal Search
Uzan, Oriel (Ben-Gurion University) | Dekel, Reuth (Ben-Gurion University) | Seri, Or (Ben-Gurion University) | Gal, Ya’akov (Kobi) (Ben-Gurion University.)
This article presents techniques for recognizing students activities in ELEs and visualizing these activities to students. It describes a new plan recognition algorithm that takes into account repetition and interleaving of activities. It was able to outperform the state-of-the-art plan recognition algorithms when compared to a gold-standard that was obtained by a domain-expert. We also show that visualizing students' plans improves their performance on new problems when compared to an alternative visualization that consists of a step-by-step list of actions.
Report on the Twenty-Second International Conference on Case-Based Reasoning
Bridge, Derek (University College Cork) | Lamontagne, Luc (Université Laval) | Plaza, Enric (IIIA, Artificial Intelligence Research Institute CSIC, Spanish National Research Council)
ICCBR is the annual meeting of the CBR community and the leading conference on this topic. Started in 1993 as the European Conference on CBR and 1995 as ICCBR, the two conferences alternated biennially until their merger in 2010. The main conference track featured 19 research paper presentations, 16 posters, and two invited speakers. The papers and posters reflected the state of the art of case-based reasoning, dealing both with open problems at the core of casebased reasoning (especially in similarity assessment, case adaptation, and case-based maintenance), as well as trending applications of CBR. Minor, Goethe University, Germany, and Emmanuel The first invited speaker, Tony Veale from University Nauer, LORIA, France.
Architectures for Activity Recognition and Context-Aware Computing
Geib, Christopher (Drexel University) | Agrawal, Vikas (Infosys Limited) | Sukthankar, Gita (University of Central Florida) | Shastri, Lokendra (Infosys Limited) | Bui, Hung (Nuance Communications)
The last 10 years have seen the development of novel architectures and technologies for domainfocused, task-specific systems that know many things, such as who (identities, profile, history) they are with (social context) and in what role (responsibility, security, privacy); when and where (event, time, place); why (goals, shared or personal); how are they doing it (methods, applications); and using what resources (device, services, access, and ownership). Smart spaces and devices will increasingly use such contextual knowledge to help users move seamlessly between devices and applications, without having to explicitly carry, transfer, and exchange activity context. Such systems will qualitatively shift our lives both at work and play and significantly change our interactions both with our physical and virtual worlds. This dream of seamlessly interacting with our virtual environment has a long history as can be seen in Apple Inc.'s Knowledge Navigator 1987 concept video. However, the combination of dramatic progress in low-power mobile computing devices and sensors, with advances in artificial intelligence and human-computer interaction (HCI) in the last decade, have provided the kind of platforms and algorithms that are enabling context-aware virtual personal assistants that plan activities and recognize intent. This has lead to an increase in work designed to bring these ideas into real world application and address the final technical hurdles that will make such systems a reality.