AITopics

doi: 10.1103/PhysRevC.80.044332

0806.2850

Country:

Europe > United Kingdom (0.28)
Europe > Greece (0.28)
North America > United States (0.28)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

arXiv.org Machine Learning Jun-16-2008

Local Procrustes for Manifold Embedding: A Measure of Embedding Quality and Embedding Algorithms

Goldberg, Y., Ritov, Y.

Abstract: We present the Procrustes measure, a novel measure based on Procrustes rotation that enables quantitative comparison of the output of manifold-based embedding algorithms (such as LLE (Roweis and Saul, 2000) and Isomap (Tenenbaum et al., 2000)). The measure also serves as a natural tool when choosing dimension-reduction parameters. We also present two novel dimension-reduction techniques that attempt to minimize the suggested measure, and compare the results of these techniques to the results of existing algorithms. Finally, we suggest a simple iterative method that can be used to improve the output of existing algorithms.

Keywords: Dimension reducing · Manifold learning · Procrustes analysis · Local PCA · Simulated annealing

1 Introduction

Technological advances constantly improve our ability to collect and store large sets of data. The main difficulty in analyzing such high-dimensional data sets is that the number of observations required to estimate functions at a set level of accuracy grows exponentially with the dimension. This problem, often referred to as the curse of dimensionality, has led to various techniques that attempt to reduce the dimension of the original data. Historically, the main approach to dimension reduction is the linear one. This is the approach used by principal component analysis (PCA) and factor analysis (see Mardia et al., 1979, for both).

algorithm, matrix, neighborhood, (16 more...)

0806.2669

Country:

North America > United States > Maryland > Baltimore (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
North America > Canada > Ontario > Toronto (0.04)
Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)

Genre: Research Report (0.50)

Industry: Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning in High Dimensional Spaces (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Chertkov, Michael, Kroc, Lukas, Vergassola, Massimo

Belief Propagation and Beyond for Particle Tracking

arXiv.org Artificial Intelligence Jun-6-2008

We describe a novel approach to statistical learning from particles tracked while moving in a random environment. The problem consists in inferring properties of the environment from recorded snapshots. We consider here the case of a fluid seeded with identical passive particles that diffuse and are advected by a flow. Our approach rests on efficient algorithms to estimate the weighted number of possible matchings among particles in two consecutive snapshots, the partition function of the underlying graphical model. The partition function is then maximized over the model parameters, namely diffusivity and velocity gradient. A Belief Propagation (BP) scheme is the backbone of our algorithm, providing accurate results for the flow parameters we want to learn. The BP estimate is additionally improved by incorporating Loop Series (LS) contributions. For the weighted matching problem, LS is compactly expressed as a Cauchy integral, accurately estimated by a saddle point approximation. Numerical experiments show that the quality of our improved BP algorithm is comparable to that of a fully polynomial randomized approximation scheme based on the Markov Chain Monte Carlo (MCMC) method, while the BP-based scheme is substantially faster than the MCMC scheme.

artificial intelligence, machine learning, particle, (15 more...)

0806.1199

Country: North America > United States (0.28)

Genre: Research Report (0.84)

Industry: Energy > Oil & Gas (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Murtagh, Fionn, Ganz, Adam, McKie, Stewart

The Structure of Narrative: the Case of Film Scripts

arXiv.org Artificial Intelligence May-24-2008

We analyze the style and structure of story narrative using the case of film scripts. The practical importance of this is noted, especially the need to have support tools for television movie writing. We use the Casablanca film script, and scripts from six episodes of CSI (Crime Scene Investigation). For analysis of style and structure, we quantify various central perspectives discussed in McKee's book, "Story: Substance, Structure, Style, and the Principles of Screenwriting". Film scripts offer a useful point of departure for exploration of the analysis of more general narratives. Our methodology, using Correspondence Analysis, and hierarchical clustering, is innovative in a range of areas that we discuss. In particular this work is groundbreaking in taking the qualitative analysis of McKee and grounding this analysis in a quantitative and algorithmic framework.

correspondence analysis, machine learning, natural language, (19 more...)

doi: 10.1016/j.patcog.2008.05.026

0805.3799

Country:

North America > United States (0.93)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.26)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

Gretton, Arthur, Borgwardt, Karsten, Rasch, Malte J., Scholkopf, Bernhard, Smola, Alexander J.

A Kernel Method for the Two-Sample Problem

arXiv.org Artificial Intelligence May-15-2008

We propose a framework for analyzing and comparing distributions, allowing us to design statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS). We present two tests based on large deviation bounds for the test statistic, while a third is based on the asymptotic distribution of this statistic. The test statistic can be computed in quadratic time, although efficient linear time approximations are available. Several classical metrics on distributions are recovered when the function space used to compute the difference in expectations is allowed to be more general (e.g., a Banach space). We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
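As a rough illustration of the statistic described in this abstract, the sketch below computes a biased, quadratic-time estimate of the squared difference between the kernel mean embeddings of two samples. The Gaussian kernel, the bandwidth `sigma`, and the function names `gaussian_kernel` and `mmd2_biased` are assumptions made for this example and are not taken from the listing; the sketch does not implement the paper's test thresholds.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian (RBF) kernel matrix between rows of a and rows of b.

    The kernel choice and bandwidth are illustrative assumptions.
    """
    # Pairwise squared Euclidean distances via the expansion |a|^2 + |b|^2 - 2 a.b
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point round-off
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def mmd2_biased(x, y, sigma=1.0):
    """Biased estimate of the squared RKHS distance between mean embeddings.

    Cost is quadratic in the sample sizes because all pairwise kernel
    values are evaluated.
    """
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    same_a = rng.normal(size=(200, 2))
    same_b = rng.normal(size=(200, 2))
    shifted = rng.normal(loc=2.0, size=(200, 2))
    print("same distribution:   ", mmd2_biased(same_a, same_b))
    print("shifted distribution:", mmd2_biased(same_a, shifted))
```

A larger value suggests the two samples are less alike under the chosen kernel. Turning the value into an accept/reject decision requires a threshold, which this sketch deliberately leaves out.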

bioinformatics, data mining, machine learning, (21 more...)

0805.2368

Country:

Europe > Germany (1.00)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.46)

Genre:

Research Report (1.00)
Overview (0.92)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (0.93)
Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Biomedical Informatics (0.93)

arXiv.org Artificial Intelligence May-8-2008

Adaptive Affinity Propagation Clustering

Wang, Kaijun, Zhang, Junying, Li, Dan, Zhang, Xinna, Guo, Tao

Affinity propagation clustering (AP) has two limitations: it is hard to know what value of the parameter 'preference' yields an optimal clustering solution, and oscillations cannot be eliminated automatically if they occur. The adaptive AP method is proposed to overcome these limitations. It includes adaptive scanning of preferences to search the space of the number of clusters for the optimal clustering solution, adaptive adjustment of damping factors to eliminate oscillations, and adaptive escaping from oscillations when the damping adjustment technique fails. Experimental results on simulated and real data sets show that adaptive AP is effective and can outperform AP in the quality of clustering results.

artificial intelligence, machine learning, oscillation, (15 more...)

0805.1096

Country:

North America > United States (0.46)
Asia > China (0.29)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

arXiv.org Artificial Intelligence May-8-2008

Contact state analysis using NFIS and SOM

Owladeghaffari, H.

In this manner, on a simple system, the evolution of contact states, by parallelization of DDA, has been investigated. So, a comparison between NFIS and SOM results has been presented. The results show the applicability of the proposed methods, with differing accuracy, to the detection of the distribution of contacts.

artificial intelligence, fuzzy logic, machine learning, (9 more...)

0805.1153

Country: Asia > China (0.23)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.96)

Carter, Kevin M., Raich, Raviv, Finn, William G., Hero, Alfred O. III

Information Preserving Component Analysis: Data Projections for Flow Cytometry Analysis

arXiv.org Machine Learning Apr-17-2008

Flow cytometry is often used to characterize the malignant cells in leukemia and lymphoma patients, traced to the level of the individual cell. Typically, flow cytometric data analysis is performed through a series of 2-dimensional projections onto the axes of the data set. Through the years, clinicians have determined combinations of different fluorescent markers which generate relatively known expression patterns for specific subtypes of leukemia and lymphoma -- cancers of the hematopoietic system. By only viewing a series of 2-dimensional projections, the high-dimensional nature of the data is rarely exploited. In this paper we present a means of determining a low-dimensional projection which maintains the high-dimensional relationships (i.e. information) between differing oncological data sets. By using machine learning techniques, we allow clinicians to visualize data in a low dimension defined by a linear combination of all of the available markers, rather than just 2 at a time. This provides an aid in diagnosing similar forms of cancer, as well as a means for variable selection in exploratory flow cytometric research. We refer to our method as Information Preserving Component Analysis (IPCA).

artificial intelligence, machine learning, projection matrix, (11 more...)

doi: 10.1109/JSTSP.2008.2011112

0804.2848

Country: North America > United States > Michigan (0.28)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Health & Medicine > Therapeutic Area > Oncology > Leukemia (0.89)
Health & Medicine > Therapeutic Area > Oncology > Lymphoma (0.74)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

arXiv.org Machine Learning Apr-8-2008

On the underestimation of model uncertainty by Bayesian K-nearest neighbors

Su, Wanhua, Chipman, Hugh, Zhu, Mu

When using the K-nearest neighbors method, one often ignores uncertainty in the choice of K. To account for such uncertainty, Holmes and Adams (2002) proposed a Bayesian framework for K-nearest neighbors (KNN). Their Bayesian KNN (BKNN) approach uses a pseudo-likelihood function, and standard Markov chain Monte Carlo (MCMC) techniques to draw posterior samples. Holmes and Adams (2002) focused on the performance of BKNN in terms of misclassification error but did not assess its ability to quantify uncertainty. We present some evidence to show that BKNN still significantly underestimates model uncertainty.

artificial intelligence, machine learning, test point, (17 more...)

0804.1325

Country: North America > Canada > Ontario (0.14)

Genre: Research Report (0.65)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Nearest Neighbor Methods (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.35)

arXiv.org Artificial Intelligence Apr-8-2008

Geometric Data Analysis, From Correspondence Analysis to Structured Data Analysis (book review)

Murtagh, Fionn

The term "Geometric Data Analysis" is due to Patrick Suppes (Stanford) who writes a Foreword for this encyclopedic view of Correspondence Analysis. The uniqueness of this work lies in the detailed conceptual framework, and in showing how, where and why statistical inference methods come into play.

artificial intelligence, correspondence analysis, machine learning, (16 more...)

doi: 10.1007/s00357-008-9007-7

0804.1244

Country:

North America > United States (0.47)
Europe (0.29)

Genre:

Summary/Review (0.40)
Research Report (0.40)

Industry: Health & Medicine > Therapeutic Area (0.50)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.69)