Efficient Kernel Discriminant Analysis via QR Decomposition
Xiong, Tao, Ye, Jieping, Li, Qi, Janardan, Ravi, Cherkassky, Vladimir
Linear Discriminant Analysis (LDA) is a well-known method for feature extraction and dimension reduction. It has been used widely in many applications such as face recognition. Recently, a novel LDA algorithm based on QR Decomposition, namely LDA/QR, has been proposed, which is competitive in terms of classification accuracy with other LDA algorithms, but has much lower costs in time and space. However, LDA/QR is based on linear projection, which may not be suitable for data with nonlinear structure. This paper first proposes an algorithm called KDA/QR, which extends the LDA/QR algorithm to deal with nonlinear data by using the kernel operator. Then an efficient approximation of KDA/QR called AKDA/QR is proposed. Experiments on face image data show that the classification accuracy of both KDA/QR and AKDA/QR is competitive with Generalized Discriminant Analysis (GDA), a general kernel discriminant analysis algorithm, while AKDA/QR has much lower time and space costs.
Learning Syntactic Patterns for Automatic Hypernym Discovery
Snow, Rion, Jurafsky, Daniel, Ng, Andrew Y.
Semantic taxonomies such as WordNet provide a rich source of knowledge for natural language processing applications, but are expensive to build, maintain, and extend. Motivated by the problem of automatically constructing and extending such taxonomies, in this paper we present a new algorithm for automatically learning hypernym (is-a) relations from text. Our method generalizes earlier work that had relied on using small numbers of handcrafted regular expression patterns to identify hypernym pairs. Using "dependency path" features extracted from parse trees, we introduce a general-purpose formalization and generalization of these patterns. Given a training set of text containing known hypernym pairs, our algorithm automatically extracts useful dependency paths and applies them to new corpora to identify novel pairs. On our evaluation task (determining whether two nouns in a news article participate in a hypernym relationship), our automatically extracted database of hypernyms attains both higher precision and higher recall than WordNet.
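For orientation, the kind of handcrafted regular expression pattern that the abstract says earlier work relied on can be sketched as follows. This is only an illustration of the baseline idea, not the dependency-path method of the paper: the "such as" pattern, the name `SUCH_AS`, and the function `extract_hypernym_pairs` are assumptions made for this example.

```python
import re

# Hypothetical hand-written pattern (an assumption for this sketch):
#   "<hypernym> such as <hyponym>, <hyponym>, ... and|or <hyponym>"
SUCH_AS = re.compile(
    r"(\w+) such as "                    # group 1: candidate hypernym
    r"(\w+"                              # group 2: first hyponym ...
    r"(?:, (?!and |or )\w+)*"            # ... further comma-separated hyponyms
    r"(?:,? (?:and|or) \w+)?)"           # ... optional final "and X" / "or X"
)

# Separators inside the hyponym list: ", ", ", and ", " and ", " or ", ...
LIST_SEP = re.compile(r",\s*(?:and |or )?|\s+(?:and|or)\s+")


def extract_hypernym_pairs(text):
    """Return (hyponym, hypernym) pairs found by the single pattern above."""
    pairs = []
    for match in SUCH_AS.finditer(text):
        hypernym = match.group(1)
        for hyponym in LIST_SEP.split(match.group(2)):
            if hyponym:
                pairs.append((hyponym, hypernym))
    return pairs


sentence = "He studied authors such as Herrick, Goldsmith, and Shakespeare."
for hyponym, hypernym in extract_hypernym_pairs(sentence):
    print(f"{hyponym} is-a {hypernym}")
```

A pattern like this only fires on one surface form and only captures single-word nouns, which suggests why a method that learns many patterns from parse trees could cover more cases.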
Conditional Models of Identity Uncertainty with Application to Noun Coreference
McCallum, Andrew, Wellner, Ben
Coreference analysis, also known as record linkage or identity uncertainty, is a difficult and important problem in natural language processing, databases, citation matching and many other tasks. This paper introduces several discriminative, conditional-probability models for coreference analysis, all examples of undirected graphical models. Unlike many historical approaches to coreference, the models presented here are relational--they do not assume that pairwise coreference decisions should be made independently from each other. Unlike other relational models of coreference that are generative, the conditional model here can incorporate a great variety of features of the input without having to be concerned about their dependencies--paralleling the advantages of conditional random fields over hidden Markov models.
Who's In the Picture
Berg, Tamara L., Berg, Alexander C., Edwards, Jaety, Forsyth, David A.
The context in which a name appears in a caption provides powerful cues as to who is depicted in the associated image. We obtain 44,773 face images, using a face detector, from approximately half a million captioned news images and automatically link names, obtained using a named entity recognizer, with these faces. A simple clustering method can produce fair results. We improve these results significantly by combining the clustering process with a model of the probability that an individual is depicted given its context. Once the labeling procedure is over, we have an accurately labeled set of faces, an appearance model for each individual depicted, and a natural language model that can produce accurate results on captions in isolation.
Economic Properties of Social Networks
Kakade, Sham M., Kearns, Michael, Ortiz, Luis E., Pemantle, Robin, Suri, Siddharth
We examine the marriage of recent probabilistic generative models for social networks with classical frameworks from mathematical economics. We are particularly interested in how the statistical structure of such networks influences global economic quantities such as price variation. Our findings are a mixture of formal analysis, simulation, and experiments on an international trade data set from the United Nations.
Two-Dimensional Linear Discriminant Analysis
Ye, Jieping, Janardan, Ravi, Li, Qi
Linear Discriminant Analysis (LDA) is a well-known scheme for feature extraction and dimension reduction. It has been used widely in many applications involving high-dimensional data, such as face recognition and image retrieval. An intrinsic limitation of classical LDA is the so-called singularity problem, that is, it fails when all scatter matrices are singular. A well-known approach to deal with the singularity problem is to apply an intermediate dimension reduction stage using Principal Component Analysis (PCA) before LDA. The algorithm, called PCA+LDA, is used widely in face recognition. However, PCA+LDA has high costs in time and space, due to the need for an eigen-decomposition involving the scatter matrices. In this paper, we propose a novel LDA algorithm, namely 2DLDA, which stands for 2-Dimensional Linear Discriminant Analysis.
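The singularity problem mentioned in the abstract can be seen in a small numerical sketch. It assumes the usual undersampled setting (fewer samples than feature dimensions) and the standard definition of the within-class scatter matrix. The data are random, and the function name `within_class_scatter` is hypothetical and not taken from the paper.

```python
import numpy as np


def within_class_scatter(X, y):
    """Sum over classes of (x - class_mean)(x - class_mean)^T."""
    n_features = X.shape[1]
    S_w = np.zeros((n_features, n_features))
    for label in np.unique(y):
        class_rows = X[y == label]
        centered = class_rows - class_rows.mean(axis=0)
        S_w += centered.T @ centered
    return S_w


# Assumed toy setting: 6 samples, 2 classes, 20 dimensions.
rng = np.random.default_rng(0)
n_samples, n_features = 6, 20
X = rng.normal(size=(n_samples, n_features))
y = np.array([0, 0, 0, 1, 1, 1])

S_w = within_class_scatter(X, y)
rank = np.linalg.matrix_rank(S_w)

# The rank is at most n_samples - n_classes, far below n_features,
# so S_w cannot be inverted as classical LDA requires.
print(f"S_w is {n_features}x{n_features} with rank {rank}")
```

With image data the dimension (number of pixels) typically far exceeds the number of training images, which is why face recognition is a common setting for this problem.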