Data Mining
Rollout Sampling Approximate Policy Iteration
Dimitrakakis, Christos, Lagoudakis, Michail G.
Several researchers have recently investigated the connection between reinforcement learning and classification. We are motivated by proposals of approximate policy iteration schemes without value functions, which focus on policy representation using classifiers and address policy learning as a supervised learning problem. This paper proposes variants of an improved policy iteration scheme which addresses the core sampling problem of evaluating a policy through simulation by casting it as a multi-armed bandit problem. The resulting algorithm offers performance comparable to that of the previous algorithm, achieved with significantly less computational effort. An order-of-magnitude improvement is demonstrated experimentally in two standard reinforcement learning domains: inverted pendulum and mountain-car.
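Below is a minimal Python sketch of the bandit view of rollout allocation described above; it is not the authors' algorithm, whose allocation rules differ in detail. Each sampled (state, action) pair is treated as a bandit arm, rollouts are allocated with a UCB1 rule, and each state is finally labeled with its empirically best action to form a training set for a policy classifier. The simulator, state set, and exploration constant are illustrative assumptions.

```python
import math
import random

def simulate_rollout(state, action):
    # Hypothetical stand-in for a domain simulator: one Monte Carlo
    # rollout of `action` from `state` under the current policy.
    return random.gauss(0.1 * action - 0.05 * state, 1.0)

def rollout_sampling(states, actions, budget, c=2.0):
    arms = [(s, a) for s in states for a in actions]
    counts = {arm: 0 for arm in arms}
    means = {arm: 0.0 for arm in arms}
    for t in range(1, budget + 1):
        # UCB1 allocation: unexplored arms first, then the arm with the
        # highest optimistic estimate of its rollout return.
        def ucb(arm):
            n = counts[arm]
            if n == 0:
                return float("inf")
            return means[arm] + c * math.sqrt(math.log(t) / n)
        arm = max(arms, key=ucb)
        reward = simulate_rollout(*arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]
    # Training set for the policy classifier: each sampled state is
    # labeled with its empirically dominant action.
    return {s: max(actions, key=lambda a: means[(s, a)]) for s in states}

labels = rollout_sampling(states=list(range(5)), actions=[0, 1, 2], budget=600)
print(labels)
```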
A Kernel Method for the Two-Sample Problem
Gretton, Arthur, Borgwardt, Karsten, Rasch, Malte J., Schölkopf, Bernhard, Smola, Alexander J.
We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
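As a small numerical sketch of the quadratic-time statistic, the following computes the standard unbiased estimate of the squared maximum mean discrepancy (MMD) under an RBF kernel. The kernel bandwidth and the Gaussian sample data are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)) for all pairs of rows.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_unbiased(X, Y, sigma=1.0):
    m, n = len(X), len(Y)
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    # Drop the diagonal terms so each expectation is estimated without bias.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_x + term_y - 2 * Kxy.mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))
Y = rng.normal(0.5, 1.0, size=(200, 2))
print(mmd2_unbiased(X, Y))  # clearly positive when the distributions differ
```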
Dempster-Shafer for Anomaly Detection
In this paper, we implement an anomaly detection system using the Dempster-Shafer method of evidence combination. Using two standard benchmark problems, we show that combining multiple signals achieves better results than using any single signal. We further show, by applying this approach to a real-world email dataset, that the algorithm works for email worm detection. Dempster-Shafer thus appears to be a promising method for anomaly detection problems with multiple features (data sources) and two or more classes.
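As a sketch of the evidence-combination step, the following applies Dempster's rule of combination to two signals over the frame {normal, anomaly}. The mass assignments and signal names are illustrative assumptions, not values from the paper.

```python
from itertools import product

def combine(m1, m2):
    # Mass functions are dicts mapping frozenset hypotheses to belief mass.
    combined, conflict = {}, 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass assigned to contradictory hypotheses
    # Dempster's rule: renormalize by the non-conflicting mass.
    return {h: w / (1.0 - conflict) for h, w in combined.items()}

NORMAL, ANOMALY = frozenset({"normal"}), frozenset({"anomaly"})
EITHER = NORMAL | ANOMALY  # total ignorance
signal_1 = {ANOMALY: 0.6, EITHER: 0.4}               # e.g. unusual email size
signal_2 = {ANOMALY: 0.5, NORMAL: 0.2, EITHER: 0.3}  # e.g. unusual send rate
print(combine(signal_1, signal_2))  # fused beliefs, more committed than either source
```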
Predicting relevant empty spots in social interaction
Maeno, Yoshiharu, Ohsawa, Yukio
An empty spot is an empty, hard-to-fill space found in the records of social interaction; it is a clue to persons in the underlying social network who do not appear in those records. This contribution addresses the problem of predicting relevant empty spots in social interaction. Homogeneous and inhomogeneous networks are studied as models underlying the social interaction. A heuristic predictor function approach is presented as a new method for addressing the problem. A simulation experiment over a homogeneous network is demonstrated: test data in the form of baskets are generated from the simulated communication, and the precision of predicting the empty spots is calculated to demonstrate the performance of the presented approach.
Batch kernel SOM and related Laplacian methods for social network analysis
Boulet, Romain, Jouve, Bertrand, Rossi, Fabrice, Villa, Nathalie
Large graphs are natural mathematical models for describing the structure of the data in a wide variety of fields, such as web mining, social networks, information retrieval, biological networks, etc. For all these applications, automatic tools are required to get a synthetic view of the graph and to reach a good understanding of the underlying problem. In particular, discovering groups of tightly connected vertices and understanding the relations between those groups is very important in practice. This paper shows how a kernel version of the batch Self Organizing Map can be used to achieve these goals via kernels derived from the Laplacian matrix of the graph, especially when it is used in conjunction with more classical methods based on the spectral analysis of the graph. The proposed method is used to explore the structure of a medieval social network modeled through a weighted graph that has been directly built from a large corpus of agrarian contracts.
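A compact Python sketch of the ingredients follows, under one plausible set of assumptions: the kernel is taken to be the heat kernel exp(-beta L) of the graph Laplacian L, and the batch kernel SOM represents each map unit as a convex combination of vertices, so only the kernel matrix is ever needed. The toy graph, map size, and annealing schedule are illustrative, not the paper's.

```python
import numpy as np

def graph_laplacian(W):
    # L = D - W for a symmetric weighted adjacency matrix W.
    return np.diag(W.sum(1)) - W

def heat_kernel(W, beta=1.0):
    # One Laplacian-derived kernel: exp(-beta L) via eigendecomposition.
    vals, vecs = np.linalg.eigh(graph_laplacian(W))
    return vecs @ np.diag(np.exp(-beta * vals)) @ vecs.T

def kernel_batch_som(K, n_units=4, n_iter=20, rng=None):
    rng = np.random.default_rng(0) if rng is None else rng
    n = K.shape[0]
    # Each unit's prototype is a weight vector over the n vertices.
    A = rng.dirichlet(np.ones(n), size=n_units)
    for it in range(n_iter):
        # Squared feature-space distances ||phi(j) - c_u||^2 via the kernel trick.
        d = K.diagonal()[None, :] - 2 * A @ K + (A @ K * A).sum(1)[:, None]
        bmu = d.argmin(0)  # best matching unit for each vertex
        sigma = max(n_units * (1 - it / n_iter), 0.5)  # shrinking neighborhood
        h = np.exp(-((np.arange(n_units)[:, None] - bmu[None, :]) ** 2)
                   / (2 * sigma**2))  # 1-D neighborhood on the map
        A = h / h.sum(1, keepdims=True)  # batch update of the prototypes
    return bmu

W = np.array([[0, 1, 1, 0], [1, 0, 1, 0], [1, 1, 0, 1], [0, 0, 1, 0]], float)
print(kernel_batch_som(heat_kernel(W, beta=0.5)))  # map unit per vertex
```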
Handling Advertisements of Unknown Quality in Search Advertising
Pandey, Sandeep, Olston, Christopher
We consider how a search engine should select advertisements to display with search results, in order to maximize its revenue. Under the standard "pay-per-click" arrangement, revenue depends on how well the displayed advertisements appeal to users. The main difficulty stems from new advertisements whose degree of appeal has yet to be determined. Often the only reliable way of determining appeal is exploration via display to users, which detracts from exploitation of other advertisements known to have high appeal. Budget constraints and finite advertisement lifetimes make it necessary to explore as well as exploit. In this paper we study the tradeoff between exploration and exploitation, modeling advertisement placement as a multi-armed bandit problem. We extend traditional bandit formulations to account for budget constraints that occur in search engine advertising markets, and derive theoretical bounds on the performance of a family of algorithms.
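The following toy sketch (not the authors' algorithm) illustrates the basic tension: UCB-style exploration of unknown click-through rates, with each advertisement retired once its advertiser's budget is exhausted. The click-through rates, budgets, and cost-per-click are illustrative assumptions.

```python
import math
import random

def budgeted_ucb(true_ctr, budgets, cost=1.0, rounds=10_000):
    n = len(true_ctr)
    clicks, shows = [0] * n, [0] * n
    revenue = 0.0
    for t in range(1, rounds + 1):
        # Only ads whose advertiser can still pay for a click are eligible.
        live = [i for i in range(n) if budgets[i] >= cost]
        if not live:
            break
        # Optimistic index: empirical CTR plus an exploration bonus.
        def index(i):
            if shows[i] == 0:
                return float("inf")
            return clicks[i] / shows[i] + math.sqrt(2 * math.log(t) / shows[i])
        ad = max(live, key=index)
        shows[ad] += 1
        if random.random() < true_ctr[ad]:  # user clicks the displayed ad
            clicks[ad] += 1
            budgets[ad] -= cost
            revenue += cost
    return revenue

print(budgeted_ucb(true_ctr=[0.02, 0.05, 0.10], budgets=[50.0, 50.0, 50.0]))
```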
Geometric entropy minimization (GEM) for anomaly detection and localization
We introduce a novel adaptive nonparametric anomaly detection approach, called GEM, that is based on the minimal covering properties of K-point entropic graphs when constructed on N training samples from a nominal probability distribution. Such graphs have the property that as N → ∞ their span recovers the entropy-minimizing set that supports at least 100ρ% (ρ = K/N) of the mass of the Lebesgue part of the distribution. When a test sample falls outside of the entropy-minimizing set, an anomaly can be declared at a statistical level of significance α = 1 − ρ. A method for implementing this nonparametric anomaly detector is proposed that approximates this minimum-entropy set by the influence region of a K-point entropic graph built on the training data. By implementing an incremental leave-one-out k-nearest-neighbor graph on resampled subsets of the training data, GEM can efficiently detect outliers at a given level of significance and compute their empirical p-values. We illustrate GEM for several simulated and real data sets in high-dimensional feature spaces.
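A simplified sketch of the detector's flavor follows (not the full GEM machinery, which uses entropic graphs): score each point by its total distance to its K nearest training neighbors, calibrate with leave-one-out scores on the nominal sample, and flag test points whose empirical p-value falls below the significance level alpha. The sample data and parameters are illustrative.

```python
import numpy as np

def knn_score(x, train, K):
    # Total distance from x to its K nearest neighbors in the training set.
    d = np.sort(np.linalg.norm(train - x, axis=1))
    return d[:K].sum()

def gem_like_detector(train, test, K=5, alpha=0.05):
    n = len(train)
    # Leave-one-out scores on nominal data (skip the zero distance to self).
    loo = np.array([np.sort(np.linalg.norm(train - train[i], axis=1))[1:K + 1].sum()
                    for i in range(n)])
    results = []
    for x in test:
        s = knn_score(x, train, K)
        p = (loo >= s).mean()  # empirical p-value under the nominal sample
        results.append((p, p < alpha))
    return results

rng = np.random.default_rng(1)
nominal = rng.normal(size=(300, 2))
queries = np.vstack([rng.normal(size=(3, 2)), [[6.0, 6.0]]])  # last is anomalous
for p, flagged in gem_like_detector(nominal, queries):
    print(f"p = {p:.3f}, anomaly = {flagged}")
```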
Map-Reduce for Machine Learning on Multicore
Chu, Cheng-tao, Kim, Sang K., Lin, Yi-an, Yu, Yuanyuan, Bradski, Gary, Olukotun, Kunle, Ng, Andrew Y.
We are at the beginning of the multicore era. Computers will have increasingly many cores (processors), but there is still no good programming framework for these architectures, and thus no simple and unified way for machine learning to take advantage of the potential speed-up. In this paper, we develop a broadly applicable parallel programming method, one that is easily applied to many different learning algorithms. Our work is in distinct contrast to the tradition in machine learning of designing (often ingenious) ways to speed up a single algorithm at a time. Specifically, we show that algorithms that fit the Statistical Query model [15] can be written in a certain "summation form," which allows them to be easily parallelized on multicore computers.
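As a toy illustration of the summation form for one algorithm of this kind, consider least-squares regression: each core computes the partial sums XᵀX and Xᵀy over its shard of the data, and a reduce step adds them before solving the normal equations. The sharding scheme, pool size, and synthetic data below are illustrative assumptions.

```python
import numpy as np
from multiprocessing import Pool

def partial_sums(shard):
    # Map step: local sufficient statistics for one data shard.
    X, y = shard
    return X.T @ X, X.T @ y

def parallel_least_squares(X, y, n_cores=4):
    shards = list(zip(np.array_split(X, n_cores), np.array_split(y, n_cores)))
    with Pool(n_cores) as pool:
        parts = pool.map(partial_sums, shards)
    A = sum(p[0] for p in parts)  # reduce step: add the partial sums
    b = sum(p[1] for p in parts)
    return np.linalg.solve(A, b)  # solve the normal equations

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(10_000, 3))
    theta = np.array([1.0, -2.0, 0.5])
    y = X @ theta + 0.01 * rng.normal(size=10_000)
    print(parallel_least_squares(X, y))  # recovers theta up to noise
```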