AITopics | Representation Of Examples

Locality-sensitive hashing converts high-dimensional feature vectors, such as image and speech, into bit arrays and allows high-speed similarity calculation with the Hamming distance. There is a hashing scheme that maps feature vectors to bit arrays depending on the signs of the inner products between feature vectors and the normal vectors of hyperplanes placed in the feature space. This hashing can be seen as a discretization of the feature space by hyperplanes. If labels for data are given, one can determine the hyperplanes by using learning algorithms. However, many proposed learning methods do not consider the hyperplanes' offsets. Not doing so decreases the number of partitioned regions, and the correlation between Hamming distances and Euclidean distances becomes small. In this paper, we propose a lift map that converts learning algorithms without the offsets to the ones that take into account the offsets. With this method, the learning methods without the offsets give the discretizations of spaces as if it takes into account the offsets. For the proposed method, we input several high-dimensional feature data sets and studied the relationship between the statistical characteristics of data, the number of hyperplanes, and the effect of the proposed method.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Machine Learning

1212.611

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(3 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.75)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.75)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.56)

Add feedback

Domain and Function: A Dual-Space Model of Semantic Relations and Compositions

Turney, P. D.

Journal of Artificial Intelligence ResearchJul-31-2012

Given appropriate representations of the semantic relations between carpenter and wood and between mason and stone (for example, vectors in a vector space model), a suitable algorithm should be able to recognize that these relations are highly similar (carpenter is to wood as mason is to stone; the relations are analogous). Likewise, with representations of dog, house, and kennel, an algorithm should be able to recognize that the semantic composition of dog and house, dog house, is highly similar to kennel (dog house and kennel are synonymous). It seems that these two tasks, recognizing relations and compositions, are closely connected. However, up to now, the best models for relations are significantly different from the best models for compositions. In this paper, we introduce a dual-space model that unifies these two tasks. This model matches the performance of the best previous models for relations and compositions. The dual-space model consists of a space for measuring domain similarity and a space for measuring function similarity. Carpenter and wood share the same domain, the domain of carpentry. Mason and stone share the same domain, the domain of masonry. Carpenter and mason share the same function, the function of artisans. Wood and stone share the same function, the function of materials. In the composition dog house, kennel has some domain overlap with both dog and house (the domains of pets and buildings). The function of kennel is similar to the function of house (the function of shelters). By combining domain and function similarities in various ways, we can model relations, compositions, and other aspects of semantics.

dual-space model, similarity, vector, (17 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.3640

AI Access Foundation

10774

Journal of Artificial Intelligence Research

Country:

North America > United States > Ohio > Franklin County > Columbus (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
(30 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.92)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Predicting Crowd-Based Translation Quality with Language-Independent Feature Vectors

Runge, Nina (University of Bremen) | Kilian, Niklas (University of Bremen) | Smeddinck, Jan (University of Bremen) | Krause, Markus (University of Bremen)

AAAI ConferencesJul-21-2012

Research over the past years has shown that machine translation results can be greatly enhanced with the help of mono- or bilingual human contributors, e.g. by asking hu- mans to proofread or correct outputs of machine translation systems. However, it remains difficult to determine the quality of individual revisions. This paper proposes a meth- od to determine the quality of individual contributions by analyzing task-independent data. Examples of such data are completion time, number of keystrokes, etc. An initial evaluation showed promising F-measure values larger than 0.8 for support vector machine and decision tree based classifications of a combined test set of Vietnamese and German translations.

machine learning, natural language, translation, (17 more...)

AAAI Conferences

Workshops at the Twenty-Sixth AAAI Conference on Artificial Intelligence

Country:

Europe > Germany > Bremen > Bremen (0.05)
Asia > Vietnam > Hanoi > Hanoi (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.43)

Add feedback

Reasoning about Uncertainty in Metric Spaces

Lee, Seunghwan

arXiv.org Artificial IntelligenceJun-27-2012

We set up a model for reasoning about metric spaces with belief theoretic measures. The uncertainty in these spaces stems from both probability and metric. To represent both aspect of uncertainty, we choose an expected distance function as a measure of uncertainty. A formal logical system is constructed for the reasoning about expected distance. Soundness and completeness are shown for this logic. For reasoning on product metric space with uncertainty, a new metric is defined and shown to have good properties.

artificial intelligence, distance function, machine learning, (15 more...)

arXiv.org Artificial Intelligence

1206.6856

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > United States > Indiana > Monroe County > Bloomington (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.84)

Add feedback

On the Difficulty of Nearest Neighbor Search

He, Junfeng, Kumar, Sanjiv, Chang, Shih-Fu

arXiv.org Machine LearningJun-27-2012

Fast approximate nearest neighbor (NN) search in large databases is becoming popular. Several powerful learning-based formulations have been proposed recently. However, not much attention has been paid to a more fundamental question: how difficult is (approximate) nearest neighbor search in a given data set? And which data properties affect the difficulty of nearest neighbor search and how? This paper introduces the first concrete measure called Relative Contrast that can be used to evaluate the influence of several crucial data characteristics such as dimensionality, sparsity, and database size simultaneously in arbitrary normed metric spaces. Moreover, we present a theoretical analysis to prove how the difficulty measure (relative contrast) determines/affects the complexity of Local Sensitive Hashing, a popular approximate NN search method. Relative contrast also provides an explanation for a family of heuristic hashing algorithms with good practical performance based on PCA. Finally, we show that most of the previous works in measuring NN search meaningfulness/difficulty can be derived as special asymptotic cases for dense vectors of the proposed measure.

information retrieval, machine learning, natural language, (15 more...)

arXiv.org Machine Learning

1206.6411

Country: North America > United States (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Case-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Multi-armed bandits on implicit metric spaces

Slivkins, Aleksandrs

Neural Information Processing SystemsDec-31-2011

The multi-armed bandit (MAB) setting is a useful abstraction of many online learning tasks which focuses on the trade-off between exploration and exploitation. In this setting, an online algorithm has a fixed set of alternatives ("arms"), and in each round it selects one arm and then observes the corresponding reward. While the case of small number of arms is by now well-understood, a lot of recent work has focused on multi-armed bandits with (infinitely) many arms, where one needs to assume extra structure in order to make the problem tractable. In particular, in the Lipschitz MAB problem there is an underlying similarity metric space, known to the algorithm, such that any two arms that are close in this metric space have similar payoffs. In this paper we consider the more realistic scenario in which the metric space is *implicit* -- it is defined by the available structure but not revealed to the algorithm directly. Specifically, we assume that an algorithm is given a tree-based classification of arms. For any given problem instance such a classification implicitly defines a similarity metric space, but the numerical similarity information is not available to the algorithm. We provide an algorithm for this setting, whose performance guarantees (almost) match the best known guarantees for the corresponding instance of the Lipschitz MAB problem.

algorithm, big data, upstream oil & gas, (20 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.14)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Data Science > Data Mining > Big Data (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (1.00)

Add feedback

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Socher, Richard, Huang, Eric H., Pennin, Jeffrey, Manning, Christopher D., Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2011

Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn feature vectors for phrases in syntactic trees. These features are used to measure the word-and phrase-wise similarity between two sentences. Since sentences may be of arbitrary length, the resulting matrix of similarity measures is of variable size. We introduce a novel dynamic pooling layer which computes a fixed-sized representation from the variable-sized matrices. The pooled representation is then used as input to a classifier. Our method outperforms other state-of-the-art approaches on the challenging MSRP paraphrase corpus.

information, representation, similarity matrix, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
Asia > Malaysia (0.04)
(4 more...)

Genre: Research Report (0.66)

Industry: Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Socher, Richard, Huang, Eric H., Pennin, Jeffrey, Manning, Christopher D., Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2011

Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn feature vectors for phrases in syntactic trees. These features are used to measure the word-and phrase-wise similarity between two sentences. Since sentences may be of arbitrary length, the resulting matrix of similarity measures is of variable size. We introduce a novel dynamic pooling layer which computes a fixed-sized representation from the variable-sized matrices. The pooled representation is then used as input to a classifier. Our method outperforms other state-of-the-art approaches on the challenging MSRP paraphrase corpus.

information, representation, similarity matrix, (12 more...)

Neural Information Processing Systems

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
Europe > United Kingdom > England > Greater London > London > Wimbledon (0.04)
Asia > Malaysia (0.04)
(4 more...)

Genre: Research Report (0.66)

Industry: Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Dynamic Pooling and Unfolding Recursive Autoencoders for Paraphrase Detection

Socher, Richard, Huang, Eric H., Pennin, Jeffrey, Manning, Christopher D., Ng, Andrew Y.

Neural Information Processing SystemsDec-31-2011

Paraphrase detection is the task of examining two sentences and determining whether they have the same meaning. In order to obtain high accuracy on this task, thorough syntactic and semantic analysis of the two statements is needed. We introduce a method for paraphrase detection based on recursive autoencoders (RAE). Our unsupervised RAEs are based on a novel unfolding objective and learn feature vectors for phrases in syntactic trees. These features are used to measure the word-and phrase-wise similarity between two sentences. Since sentences may be of arbitrary length, the resulting matrix of similarity measures is of variable size. We introduce a novel dynamic pooling layer which computes a fixed-sized representation from the variable-sized matrices. The pooled representation is then used as input to a classifier. Our method outperforms other state-of-the-art approaches onthe challenging MSRP paraphrase corpus.

artificial intelligence, machine learning, representation, (15 more...)

Neural Information Processing Systems

Country: North America > United States (0.47)

Genre: Research Report (0.66)

Industry: Government (0.70)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.34)

Add feedback

Hodge Theory on Metric Spaces

Bartholdi, Laurent, Schick, Thomas, Smale, Nat, Smale, Steve, Baker, Anthony W.

arXiv.org Machine LearningNov-24-2011

Hodge theory is a beautiful synthesis of geometry, topology, and analysis, which has been developed in the setting of Riemannian manifolds. On the other hand, spaces of images, which are important in the mathematical foundations of vision and pattern recognition, do not fit this framework. This motivates us to develop a version of Hodge theory on metric spaces with a probability measure. We believe that this constitutes a step towards understanding the geometry of vision. The appendix by Anthony Baker provides a separable, compact metric space with infinite dimensional \alpha-scale homology.

artificial intelligence, cohomology, machine learning, (14 more...)

arXiv.org Machine Learning

doi: 10.1007/s10208-011-9107-3

0912.0284

Country: