Statistical Learning
Unsupervised Lexicon Acquisition for HPSG-Based Relation Extraction
Rozenfeld, Benjamin (Digital Trowel) | Feldman, Ronen (Hebrew University of Jerusalem)
The paper describes a method of relation extraction, which is based on parsing the input text using a combination of a generic HPSG-based grammar and a highly focused domain- and relation-specific lexicon. We also show a method of unsupervised acquisition of such a lexicon from a large unlabeled corpus. Together, the methods introduce a novel approach to the "Open IE" task, which is superior in accuracy and in quality of relation identification to the existing approaches.
Input Parameter Calibration in Forest Fire Spread Prediction: Taking the Intelligent Way
Wendt, Kerstin (Universitat Autònoma de Barcelona) | Cortés, Ana (Universitat Autònoma de Barcelona)
Imprecision and uncertainty in the large number of input parameters are serious problems in forest fire behaviour modelling. To obtain more reliable forecasts, fast and efficient computational input parameter estimation and calibration mechanisms should be integrated. These have to respect the hard real-time constraints of simulations to prevent tragedy. We propose an Evolutionary Intelligent System (EIS) for parameter calibration. The hybridisation of an evolutionary algorithm (EA) with an intelligent paradigm (IP) can be configured depending on disaster size, required parameter precision, and available computing resources. Experiments show that EIS produces estimates comparable to those of standard evolutionary calibration approaches while clearly outperforming them in runtime.
Combining Spatial and Temporal Aspects of Prediction Problems to Improve Prediction Performance
Groves, William (University of Minnesota)
Quantitative prediction problems involving both spatial and temporal components have appeared prominently in several disparate research areas including finance, supply chain management, and civil engineering. Unfortunately, either the spatial or temporal aspect tends to dominate the other in many prediction formulations. We briefly examine the underlying formulations used in spatial and temporal prediction. Then, we outline a method that combines these approaches and improves prediction results in high-dimensional economic domains by integrating multivariate and time series techniques which require minimal tuning but achieve superior performance compared to previous methods. We present preliminary results in the context of the Trading Agent Competition for Supply Chain Management.
Large Linear Classification When Data Cannot Fit in Memory
Yu, Hsiang-Fu (National Taiwan University) | Hsieh, Cho-Jui (National Taiwan University) | Chang, Kai-Wei (National Taiwan University) | Lin, Chih-Jen (National Taiwan University)
Linear classification is a useful tool for dealing with large-scale data in applications such as document classification and natural language processing. Recent developments in linear classification have shown that the training process can be conducted efficiently. However, when the data size exceeds the memory capacity, most training methods suffer from very slow convergence due to severe disk swapping. Although some methods have attempted to handle such a situation, they are usually too complicated to support some important functions such as parameter selection. In this paper, we introduce a block minimization framework for data larger than memory. Under the framework, a solver splits the data into blocks and stores them in separate files. Then, at each step, the solver trains on one data block loaded from disk. Although the framework is simple, the experimental results show that it effectively handles a data set 20 times larger than the memory capacity.
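The split-then-train-per-block idea above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' solver: the function names, the JSON file layout, and the plain hinge-loss subgradient update used inside each block are all hypothetical choices made for the example.

```python
import json
import os
import random
import tempfile


def split_into_blocks(samples, block_size, directory):
    """Write consecutive chunks of (features, label) pairs to separate files."""
    paths = []
    for start in range(0, len(samples), block_size):
        path = os.path.join(directory, f"block_{start // block_size}.json")
        with open(path, "w") as handle:
            json.dump(samples[start:start + block_size], handle)
        paths.append(path)
    return paths


def train_on_blocks(paths, dim, epochs=20, lr=0.1, reg=0.01, seed=0):
    """Update a linear model while holding only one block in memory at a time.

    Assumption: the per-block update is a simple regularized hinge-loss
    subgradient step, chosen only to keep the sketch short.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(epochs):
        order = paths[:]
        rng.shuffle(order)  # visit the blocks in a random order each pass
        for path in order:
            with open(path) as handle:
                block = json.load(handle)  # only this block is resident
            for x, y in block:
                margin = y * sum(wi * xi for wi, xi in zip(w, x))
                for i in range(dim):
                    grad = reg * w[i] - (y * x[i] if margin < 1 else 0.0)
                    w[i] -= lr * grad
    return w


def predict(w, x):
    """Return +1 or -1 according to the sign of the linear score."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= 0 else -1


def demo():
    """Train on a small synthetic, linearly separable data set."""
    rng = random.Random(1)
    samples = []
    for _ in range(200):
        x = [rng.uniform(-1, 1), rng.uniform(-1, 1), 1.0]  # last entry is a bias feature
        y = 1 if x[0] + x[1] > 0 else -1
        samples.append((x, y))
    with tempfile.TemporaryDirectory() as directory:
        paths = split_into_blocks(samples, block_size=50, directory=directory)
        w = train_on_blocks(paths, dim=3)
    accuracy = sum(predict(w, x) == y for x, y in samples) / len(samples)
    return len(paths), accuracy
```

Calling `demo()` returns the number of block files written and the training accuracy of the resulting model; the toy data set fits in memory, so the example only demonstrates the access pattern, not the scale.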
Wsabie: Scaling Up to Large Vocabulary Image Annotation
Weston, Jason (Google Research) | Bengio, Samy (Google Research) | Usunier, Nicolas (Universite de Paris 6)
Image annotation datasets are becoming larger and larger, with tens of millions of images and tens of thousands of possible annotations. We propose a strongly performing method that scales to such datasets by simultaneously learning to optimize precision at the top of the ranked list of annotations [...] Weighted Pairwise Classification (OWPC) loss [Usunier et al., 2009], which has been shown to be state-of-the-art on (small) text retrieval tasks. WARP uses stochastic gradient descent and a novel sampling trick to approximate ranks, resulting in an efficient online optimization strategy which we show is superior to standard stochastic gradient descent applied to the same loss, enabling us to train on datasets that do not even fit in memory.
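The text above mentions a sampling trick to approximate ranks without giving details. The sketch below shows one commonly described way to do this: draw negative labels at random until one violates the margin, then estimate the rank from the number of draws. Whether this matches the paper's exact procedure is an assumption, and the names `approximate_rank` and `rank_weight`, the margin of 1.0, and the harmonic weighting are illustrative choices.

```python
import random


def approximate_rank(scores, positive, rng, margin=1.0):
    """Estimate how many labels outrank `positive` by sampling negatives.

    Negatives are drawn uniformly with replacement until one violates the
    margin, i.e. scores[neg] + margin > scores[positive]. If that takes n
    draws out of a budget of Y - 1 (Y = number of labels), the rank is
    estimated as floor((Y - 1) / n). Returns (estimated_rank, violator).
    """
    negatives = [label for label in range(len(scores)) if label != positive]
    budget = len(negatives)
    for n in range(1, budget + 1):
        candidate = rng.choice(negatives)
        if scores[candidate] + margin > scores[positive]:
            return budget // n, candidate
    return 0, None  # no violating label found within the budget


def rank_weight(rank):
    """Weight that grows with the rank: sum of 1/j for j = 1..rank."""
    return sum(1.0 / j for j in range(1, rank + 1))
```

The point of the estimate is that a full ranking over all labels never has to be computed: a poorly ranked positive label is likely to hit a violator in the first few draws, which yields a large estimated rank and therefore a large update weight.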
Learning Linear and Kernel Predictors with the 0-1 Loss Function
Shalev-Shwartz, Shai (The Hebrew University) | Shamir, Ohad (The Hebrew University and Microsoft Research) | Sridharan, Karthik (Toyota Technological Institute)
Some of the most successful machine learning algorithms, such as Support Vector Machines, are based on learning linear and kernel predictors with respect to a convex loss function, such as the hinge loss. For classification purposes, a more natural loss function is the 0-1 loss. However, using it leads to a non-convex problem for which there is no known efficient algorithm. In this paper, we describe and analyze a new algorithm for learning linear or kernel predictors with respect to the 0-1 loss function. The algorithm is parameterized by $L$, which quantifies the effective width around the decision boundary in which the predictor may be uncertain. We show that without any distributional assumptions, and for any fixed $L$, the algorithm runs in polynomial time, and learns a classifier which is worse than the optimal such classifier by at most $\epsilon$. We also prove a hardness result, showing that under a certain cryptographic assumption, no algorithm can learn such classifiers in time polynomial in $L$.
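The contrast between the hinge loss and the 0-1 loss drawn above can be made concrete with a small sketch. It does not implement the algorithm described in the abstract; it only evaluates the two losses for a linear predictor. The function names, and the convention that a margin of exactly zero counts as an error, are assumptions made for the illustration.

```python
def margin(w, x, y):
    """Signed margin y * <w, x> of a linear predictor on one example."""
    return y * sum(wi * xi for wi, xi in zip(w, x))


def zero_one_loss(m):
    """1 if the example is misclassified (margin <= 0), else 0."""
    return 1.0 if m <= 0 else 0.0


def hinge_loss(m):
    """Convex surrogate: zero only once the margin reaches 1."""
    return max(0.0, 1.0 - m)


def mean_losses(w, data):
    """Average 0-1 loss and average hinge loss of w over (x, y) pairs."""
    margins = [margin(w, x, y) for x, y in data]
    n = len(margins)
    return (sum(zero_one_loss(m) for m in margins) / n,
            sum(hinge_loss(m) for m in margins) / n)
```

For every margin value the hinge loss is at least as large as the 0-1 loss, which is why minimizing it is a tractable proxy. The 0-1 loss itself is piecewise constant in `w`, so it provides no gradient to follow.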
Mind the Eigen-Gap, or How to Accelerate Semi-Supervised Spectral Learning Algorithms
Mavroeidis, Dimitrios (Radboud University Nijmegen)
Semi-supervised learning algorithms commonly incorporate the available background knowledge such that an expression of the derived model's quality is improved. Depending on the specific context, quality can take several forms and can be related to the generalization performance or to a simple clustering coherence measure. Recently, a novel perspective on semi-supervised learning has been put forward that associates semi-supervised clustering with the efficiency of spectral methods. More precisely, it has been demonstrated that the appropriate use of partial supervision can bias the data Laplacian matrix such that the necessary eigenvector computations are provably accelerated. This result allows data mining practitioners to use background knowledge not only for improving the quality of clustering results, but also for accelerating the required computations. In this paper we first provide a high-level overview of the relevant efficiency-maximizing semi-supervised methods such that their theoretical intuitions are comprehensively outlined. Subsequently, we demonstrate how these methods can be extended to handle multiple clusters and also discuss possible issues that may arise in the continuous semi-supervised solution. Finally, we illustrate the proposed extensions empirically in the context of text clustering.
A Flat Histogram Method for Computing the Density of States of Combinatorial Problems
Ermon, Stefano (Cornell University) | Gomes, Carla (Cornell University) | Selman, Bart (Cornell University)
Consider a combinatorial state space S, such as the set of all truth assignments to N Boolean variables. Given a partition of S, we consider the problem of estimating the size of each subset into which S is divided. This problem, also known as computing the density of states, is quite general and has many applications. For instance, if we consider a Boolean formula in CNF and we partition according to the number of violated constraints, computing the density of states is a generalization of SAT, MAXSAT, and model counting. We propose a novel Markov Chain Monte Carlo algorithm to compute the density of states of Boolean formulas that is based on a flat histogram approach. Our method represents a new approach to a variety of inference, learning, and counting problems. We demonstrate its practical effectiveness by showing that the method converges quickly to an accurate solution on a range of synthetic and real-world instances.
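To make the quantity being estimated concrete, the sketch below computes the density of states of a tiny CNF formula exactly, by brute-force enumeration. It is not the Markov Chain Monte Carlo method proposed in the abstract, and it is only feasible for a handful of variables. The clause encoding (signed integers, as in the DIMACS convention) and the function names are assumptions made for the example.

```python
from itertools import product


def violated_clauses(clauses, assignment):
    """Count clauses with no satisfied literal.

    A literal k > 0 means variable k is true; k < 0 means variable -k is false.
    `assignment` is a tuple of booleans indexed from variable 1.
    """
    def satisfied(literal):
        value = assignment[abs(literal) - 1]
        return value if literal > 0 else not value

    return sum(1 for clause in clauses if not any(satisfied(l) for l in clause))


def density_of_states(clauses, num_vars):
    """Return g, where g[e] is the number of assignments violating exactly e clauses."""
    g = [0] * (len(clauses) + 1)
    for assignment in product([False, True], repeat=num_vars):
        g[violated_clauses(clauses, assignment)] += 1
    return g


# (x1 or x2) and (not x1 or x3) and (not x2 or not x3)
example = [[1, 2], [-1, 3], [-2, -3]]
g = density_of_states(example, num_vars=3)

model_count = g[0]                                          # model counting
is_satisfiable = g[0] > 0                                   # SAT
min_violations = next(e for e, c in enumerate(g) if c > 0)  # MAXSAT optimum
```

The last three lines show how the three problems named above can be read off a single histogram: the first bin is the model count, its positivity decides satisfiability, and the first non-empty bin gives the MAXSAT optimum.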