The University of Texas at Dallas
Chinese Relation Classification via Convolutional Neural Networks
Zhang, Linrui (The University of Texas at Dallas) | Moldovan, Dan (The University of Texas at Dallas)
Relation classification is an important task in natural language processing. Traditional relation classification techniques suffer from extensive reliance on linguistic features and external toolkits. In recent years, deep learning models that can automatically learn features from text have played an increasingly essential role in this area. In this paper we present a novel convolutional neural network (CNN) approach along shortest dependency paths (SDP) for Chinese relation classification. We first propose a baseline end-to-end model that uses only sentence-level features, and then improve its performance through the joint use of pre-extracted linguistic features. The performance of the system is evaluated on the ACE 2005 Multilingual Training Corpus Chinese dataset. The baseline model achieved a 74.93% F-score on six general type relations and a 66.29% F-score on eighteen subtype relations, and performance improved by 10.71% and 13.60%, respectively, when linguistic features were incorporated into the baseline system.
Solving Generalized Column Subset Selection With Heuristic Search
Shah, Swair (The University of Texas at Dallas) | He, Baokun (The University of Texas at Dallas) | Xu, Ke (The University of Texas at Dallas) | Maung, Crystal (The University of Texas at Dallas) | Schweitzer, Haim (The University of Texas at Dallas)
We address the problem of approximating a matrix by the linear combination of a column sparse matrix and a low rank matrix. Two variants of a heuristic search algorithm are described. The first produces an optimal solution but may be slow, as these problems are believed to be NP-hard. The second is much faster, but only guarantees a suboptimal solution. The quality of the approximation and the optimality criterion can be specified in terms of unitarily invariant norms.
Automatic Parameter Tying: A New Approach for Regularized Parameter Learning in Markov Networks
Chou, Li (The University of Texas at Dallas) | Sahoo, Pracheta (The University of Texas at Dallas) | Sarkhel, Somdeb (Adobe Research) | Ruozzi, Nicholas (The University of Texas at Dallas) | Gogate, Vibhav (The University of Texas at Dallas)
Parameter tying is a regularization method in which the parameters (weights) of a machine learning model are partitioned into groups by leveraging prior knowledge, and all parameters in each group are constrained to take the same value. In this paper, we consider the problem of parameter learning in Markov networks and propose a novel approach called automatic parameter tying (APT) that uses automatic (rather than a priori) and soft (rather than hard) parameter tying as a regularization method to alleviate overfitting. The key idea behind APT is to set up the learning problem as the task of finding parameters and groupings of parameters such that the likelihood plus a regularization term is maximized. The regularization term penalizes models in which parameter values deviate from their group mean. We propose a block coordinate ascent algorithm to solve the optimization task. We analyze the sample complexity of our new learning algorithm and show that it yields optimal parameters with high probability when the groups are well separated. Experimentally, we show that our method improves upon L2 regularization, and we suggest several pragmatic techniques for good practical performance.
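As an illustrative sketch of the regularization term described in this abstract (the function and variable names are our own, not the authors' implementation), the following NumPy snippet computes a soft-tying penalty that charges each parameter for its squared deviation from its group mean:

```python
import numpy as np

def apt_penalty(theta, groups, lam):
    """Soft parameter-tying penalty: each parameter is penalized for its
    squared deviation from the mean of its group, mirroring the
    regularization term in the APT objective (likelihood + penalty)."""
    penalty = 0.0
    for g in np.unique(groups):
        members = theta[groups == g]
        penalty += np.sum((members - members.mean()) ** 2)
    return lam * penalty

# Toy example: two well-separated groups of parameters.
theta = np.array([1.0, 1.1, 5.0, 5.2])
groups = np.array([0, 0, 1, 1])
tight = apt_penalty(theta, groups, lam=1.0)          # small penalty
loose = apt_penalty(theta, np.zeros(4, dtype=int), lam=1.0)  # one big group
```

With well-separated groupings the penalty is small, so maximizing likelihood plus this (negated) term favors parameters that cluster tightly around their group means, which is the soft-tying effect the abstract describes.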
Cleaning the Null Space: A Privacy Mechanism for Predictors
Xu, Ke (The University of Texas at Dallas) | Cao, Tongyi (University of Massachusetts Amherst) | Shah, Swair (The University of Texas at Dallas) | Maung, Crystal (The University of Texas at Dallas) | Schweitzer, Haim (The University of Texas at Dallas)
In standard machine learning and regression settings, feature values are used to predict some desired information. The privacy challenge considered here is to prevent an adversary from using the available feature values to predict confidential information that one wishes to keep secret. We show that this can sometimes be achieved with almost no effect on the quality of predicting the desired information. We describe two algorithms aimed at providing such privacy when the predictors have a linear operator in the first stage. The desired effect can be achieved by zeroing out feature components in the approximate null space of the linear operator.
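A minimal sketch of the core idea, assuming a linear first-stage operator A (NumPy; our own function names, not the paper's code): feature components lying in the approximate null space of A are zeroed out by projecting each feature vector onto A's row space, so A's outputs are (nearly) unchanged while information carried only in the null-space directions is removed.

```python
import numpy as np

def clean_null_space(A, X, tol=1e-8):
    """Zero out feature components in the approximate null space of the
    linear operator A by projecting each row of X onto A's row space.
    A @ x is (nearly) unchanged, but information carried only by the
    null-space directions is destroyed."""
    _, s, Vt = np.linalg.svd(A, full_matrices=True)
    keep = np.zeros(Vt.shape[0], dtype=bool)
    keep[:len(s)] = s > tol                  # directions A actually uses
    P = Vt[keep].T @ Vt[keep]                # projector onto the row space
    return X @ P                             # P is symmetric

# Toy example: the operator never looks at the third feature, so that
# (confidential) component can be zeroed without changing A's output.
A = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
X = np.array([[2.0, 3.0, 7.0]])              # 7.0 is confidential
X_clean = clean_null_space(A, X)
```

In this toy case the sanitized features still yield exactly the same desired predictions A @ x, while the confidential third component has been erased.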
Enhancing the Privacy of Predictors
Xu, Ke (The University of Texas at Dallas) | Shah, Swair (The University of Texas at Dallas) | Cao, Tongyi (University of Massachusetts Amherst) | Maung, Crystal (The University of Texas at Dallas) | Schweitzer, Haim (The University of Texas at Dallas)
The privacy challenge considered here is to prevent an adversary from using available feature values to predict confidential information. We propose an algorithm that provides such privacy for predictors that have a linear operator in the first stage. Privacy is achieved by zeroing out feature components in the approximate null space of the linear operator. We show that this has little effect on predicting the desired information.
SAND: Semi-Supervised Adaptive Novel Class Detection and Classification over Data Stream
Haque, Ahsanul (The University of Texas at Dallas) | Khan, Latifur (The University of Texas at Dallas) | Baron, Michael (The University of Texas at Dallas)
Most approaches to classifying data streams either divide the stream into fixed-size chunks or use gradual forgetting. Due to the evolving nature of data streams, finding a proper chunk size or choosing a forgetting rate without prior knowledge about the time-scale of change is not a trivial task. These approaches hence suffer from a trade-off between performance and sensitivity. Existing dynamic sliding-window approaches address this problem by tracking changes in the classifier error rate, but they are supervised in nature. In this paper we propose an efficient semi-supervised framework that uses change detection on classifier confidence to detect concept drifts and to determine chunk boundaries dynamically. It also addresses the concept evolution problem by detecting outliers having strong cohesion among themselves. Experimental results on benchmark and synthetic data sets show the effectiveness of the proposed approach.
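The confidence-based change detection can be sketched roughly as follows (pure Python; a simplified two-window test of our own devising, not the statistical test used in the paper): a drift is flagged, and a chunk boundary drawn, when the mean classifier confidence over the most recent window falls well below that of the preceding window.

```python
def detect_drift(confidences, window=30, threshold=0.1):
    """Flag a concept drift when mean classifier confidence over the
    latest window drops by more than `threshold` relative to the mean of
    the window just before it. A True result would end the current chunk."""
    if len(confidences) < 2 * window:
        return False                       # not enough history yet
    recent = sum(confidences[-window:]) / window
    previous = sum(confidences[-2 * window:-window]) / window
    return previous - recent > threshold

# Stable stream vs. a sudden confidence drop after a concept change.
stable = [0.9] * 60
drifting = [0.9] * 30 + [0.6] * 30
```

Because only the classifier's own confidence is monitored, no true labels are needed at detection time, which is what makes this style of test usable in a semi-supervised setting.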
Weighted A* Algorithms for Unsupervised Feature Selection with Provable Bounds on Suboptimality
Arai, Hiromasa (The University of Texas at Dallas) | Xu, Ke (The University of Texas at Dallas) | Maung, Crystal (The University of Texas at Dallas) | Schweitzer, Haim (The University of Texas at Dallas)
Identifying a small number of features that can represent the data is believed to be NP-hard. Previous approaches exploit algebraic structure and use randomization. We propose an algorithm based on ideas similar to the Weighted A* algorithm in heuristic search. Our experiments show this new algorithm to be more accurate than the current state of the art.
Scaling-Up MAP and Marginal MAP Inference in Markov Logic
Sarkhel, Somdeb (The University of Texas at Dallas)
Markov Logic Networks (MLNs) use a few weighted first-order logic formulas to represent large probabilistic graphical models and are ideally suited for representing both relational and probabilistic knowledge in a wide variety of application domains such as NLP, computer vision, and robotics. However, inference in them is hard because the underlying graphical models can be extremely large, having millions of variables and features (potentials). Therefore, several lifted inference algorithms that exploit relational structure and operate at the compact first-order level have been developed in recent years. However, much of the existing research on lifted inference focuses on marginal inference, while algorithms for MAP and marginal MAP inference are far less advanced. The aim of the proposed thesis is to fill this void by developing next-generation inference algorithms for MAP and marginal MAP inference.
On Parameter Tying by Quantization
Chou, Li (The University of Texas at Dallas) | Sarkhel, Somdeb (The University of Texas at Dallas) | Ruozzi, Nicholas (The University of Texas at Dallas) | Gogate, Vibhav (The University of Texas at Dallas)
The maximum likelihood estimator (MLE) is generally asymptotically consistent but is susceptible to overfitting. To combat this problem, regularization methods, which reduce the variance at the cost of (slightly) increasing the bias, are often employed in practice. In this paper, we present an alternative variance reduction (regularization) technique that quantizes the MLE estimates as a post-processing step, yielding a smoother model having several tied parameters. We provide and prove error bounds for our new technique and demonstrate experimentally that it often yields models with higher test-set log-likelihood than those learned using the MLE. We also propose a new importance sampling algorithm for fast approximate inference in models having several tied parameters. Our experiments show that our new inference algorithm is superior to existing approaches, such as Gibbs sampling and MC-SAT, on models with tied parameters learned using our quantization-based approach.
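A rough sketch of the post-processing step (NumPy; our own simplification using 1-D k-means, whereas the paper's quantization scheme and its error bounds are more involved): the learned weights are clustered, and each weight is replaced by its cluster center, yielding a model with at most k distinct tied parameter values.

```python
import numpy as np

def quantize_parameters(theta, k, iters=50):
    """Tie parameters by 1-D k-means: cluster the MLE estimates and
    replace each parameter with its cluster center, so the model ends up
    with at most k distinct (tied) parameter values."""
    centers = np.quantile(theta, np.linspace(0.0, 1.0, k))  # spread init
    for _ in range(iters):
        assign = np.argmin(np.abs(theta[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(assign == j):
                centers[j] = theta[assign == j].mean()
    return centers[assign]

theta = np.array([0.10, 0.12, 0.90, 0.95, 0.93])   # toy MLE estimates
tied = quantize_parameters(theta, k=2)              # two tied values
```

The tied model is smoother (lower variance) than the raw MLE, and the repeated parameter values are also what the importance sampling algorithm mentioned above is designed to exploit.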
Learning Ensembles of Cutset Networks
Rahman, Tahrima (The University of Texas at Dallas) | Gogate, Vibhav (The University of Texas at Dallas)
Cutset networks — OR (decision) trees that have Bayesian networks whose treewidth is bounded by one at each leaf — are a new class of tractable probabilistic models that admit fast, polynomial-time inference and learning algorithms. This is unlike other state-of-the-art tractable models such as thin junction trees, arithmetic circuits and sum-product networks in which inference is fast and efficient but learning can be notoriously slow. In this paper, we take advantage of this unique property to develop fast algorithms for learning ensembles of cutset networks. Specifically, we consider generalized additive mixtures of cutset networks and develop sequential boosting-based and parallel bagging-based approaches for learning them from data. We demonstrate, via a thorough experimental evaluation, that our new algorithms are superior to competing approaches in terms of test-set log-likelihood score and learning time.