AITopics

0812.5032

Country:

North America > United States > New York > New York County > New York City (0.04)
Oceania > Australia (0.04)
North America > United States > Wisconsin (0.04)
(10 more...)

Genre: Research Report > New Finding (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Lelu, Alain, Cadot, Martine, Cuxac, Pascal

Document stream clustering: experimenting an incremental algorithm and AR-based tools for highlighting dynamic trends

arXiv.org Artificial Intelligence, Nov-3-2008

We address here two major challenges presented by dynamic data mining: 1) the stability challenge: we have implemented a rigorous incremental density-based clustering algorithm, independent of any initial conditions and of the ordering of the data-vector stream; 2) the cognitive challenge: we have implemented a stringent selection process of association rules between clusters at time t-1 and time t for directly generating the main conclusions about the dynamics of a data stream. We illustrate these points with an application to a scientific information database spanning two years and 2600 documents.

artificial intelligence, itemset, upstream oil & gas, (15 more...)

0811.0340

Country:

Europe > France (0.28)
North America > United States > California > San Mateo County > Menlo Park (0.14)

Genre: Research Report (0.64)

Industry: Energy > Oil & Gas > Upstream (0.47)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

arXiv.org Artificial Intelligence, Oct-30-2008

A Novel Clustering Algorithm Based on a Modified Model of Random Walk

Li, Qiang, He, Yan, Jiang, Jing-ping

We introduce a modified model of random walk, and then develop two novel clustering algorithms based on it. In the algorithms, each data point in a dataset is considered as a particle which can move at random in space according to the preset rules of the modified model. Further, each data point may also be viewed as a local control subsystem, in which the controller adjusts its transition probability vector according to the feedback from all data points, and its transition direction is then identified by an event-generating function. Finally, the positions of all data points are updated. As they move in space, data points gradually gather and separating gaps emerge among them automatically. As a consequence, data points that belong to the same class end up at the same position, whereas those that belong to different classes are away from one another. Moreover, the experimental results have demonstrated that data points in the test datasets are clustered reasonably and efficiently, and the comparison with other algorithms also provides an indication of the effectiveness of the proposed algorithms.

artificial intelligence, data mining, machine learning, (14 more...)

0810.5484

Country: Asia > China (0.04)

Genre: Research Report (0.70)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Data Science > Data Mining (0.99)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)

De Gregorio, Alessandro, Iacus, Stefano Maria

Clustering of discretely observed diffusion processes

arXiv.org Machine Learning, Sep-23-2008

In this paper a new dissimilarity measure to identify groups of asset dynamics is proposed. The underlying generating process is assumed to be a diffusion process, the solution of a stochastic differential equation, observed at discrete times. The mesh of observations is not required to shrink to zero. As the distance between two observed paths, the quadratic distance of the corresponding estimated Markov operators is considered. Analysis of both synthetic data and real financial data from NYSE/NASDAQ stocks gives evidence that this distance seems capable of capturing differences in both the drift and diffusion coefficients, contrary to other commonly used metrics.

artificial intelligence, banking & finance, machine learning, (16 more...)

0809.3902

Country:

Europe > Italy (0.28)
North America (0.14)
Europe > Germany (0.14)

Genre: Research Report (0.40)

Industry:

Banking & Finance > Trading (0.54)
Energy > Oil & Gas (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Vitanyi, Paul M. B., Balbach, Frank J., Cilibrasi, Rudi L., Li, Ming

Normalized Information Distance

arXiv.org Artificial Intelligence, Sep-15-2008

The normalized information distance is a universal distance measure for objects of all kinds. It is based on Kolmogorov complexity and thus uncomputable, but there are ways to utilize it. First, compression algorithms can be used to approximate the Kolmogorov complexity if the objects have a string representation. Second, for names and abstract concepts, page count statistics from the World Wide Web can be used. These practical realizations of the normalized information distance can then be applied to machine learning tasks, especially clustering, to perform feature-free and parameter-free data mining. This chapter discusses the theoretical foundations of the normalized information distance and both practical realizations. It presents numerous examples of successful real-world applications based on these distance measures, ranging from bioinformatics to music clustering to machine translation.
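
The compression-based approximation mentioned in the abstract can be sketched as follows. This is a minimal illustration, not the authors' code: it uses the commonly cited normalized compression distance form, NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), and the choice of zlib as the compressor C is an assumption made here for convenience. The function names `compressed_size` and `ncd` are hypothetical.

```python
import zlib


def compressed_size(data: bytes) -> int:
    """Length of the zlib-compressed input; a stand-in for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance between two byte strings.

    Assumes both inputs are non-empty. Values near 0 suggest the strings
    share most of their structure; values near 1 suggest little in common.
    """
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)  # cost of describing both together
    return (cxy - min(cx, cy)) / max(cx, cy)


if __name__ == "__main__":
    a = b"the quick brown fox jumps over the lazy dog " * 20
    b = b"colorless green ideas sleep furiously tonight " * 20
    print("ncd(a, a) =", ncd(a, a))
    print("ncd(a, b) =", ncd(a, b))
```

Because real compressors carry header overhead and have limited look-back windows, the result is only an approximation: very short inputs and very long inputs can both distort the score.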

data mining, machine learning, natural language, (19 more...)

0809.2553

Country:

North America > United States > Washington > King County > Seattle (0.14)
North America > United States > Washington > King County > Bellevue (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(8 more...)

Genre: Research Report (0.40)

Industry:

Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

arXiv.org Machine Learning, Sep-2-2008

From Data to the p-Adic or Ultrametric Model

Murtagh, Fionn

We model anomaly and change in data by embedding the data in an ultrametric space. Taking our initial data as cross-tabulation counts (or other input data formats), Correspondence Analysis allows us to endow the information space with a Euclidean metric. We then model anomaly or change by an induced ultrametric. The induced ultrametric that we are particularly interested in takes a sequential - e.g. temporal - ordering of the data into account. We apply this work to the flow of narrative expressed in the film script of the Casablanca movie; and to the evolution between 1988 and 2004 of the Colombian social conflict and violence.
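
As a small illustration of what "ultrametric" means here: a distance is ultrametric when it satisfies the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)). The sketch below checks that property and derives one standard induced ultrametric from a metric, the subdominant (minimax-path, single-linkage) ultrametric. This is an assumption chosen for simplicity; the paper's sequence-respecting ultrametric is a different construction, and the function names are hypothetical.

```python
def is_ultrametric(d):
    """Check the strong triangle inequality on a square distance matrix."""
    n = len(d)
    return all(
        d[i][k] <= max(d[i][j], d[j][k])
        for i in range(n) for j in range(n) for k in range(n)
    )


def subdominant_ultrametric(d):
    """Minimax-path distances: u[i][j] is the smallest possible value of the
    largest edge on any path from i to j (Floyd-Warshall style update)."""
    n = len(d)
    u = [row[:] for row in d]  # copy so the input is left unchanged
    for k in range(n):
        for i in range(n):
            for j in range(n):
                u[i][j] = min(u[i][j], max(u[i][k], u[k][j]))
    return u


if __name__ == "__main__":
    d = [[0, 1, 4],
         [1, 0, 2],
         [4, 2, 0]]
    u = subdominant_ultrametric(d)
    print(is_ultrametric(d), is_ultrametric(u), u)
```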

artificial intelligence, correspondence analysis, machine learning, (19 more...)

doi: 10.1134/S2070046609010063

0809.0492

Country:

Europe > United Kingdom (0.28)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.26)

Genre: Research Report (0.40)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.31)

Puolamäki, Kai, Hanhijärvi, Sami, Garriga, Gemma C.

An Approximation Ratio for Biclustering

arXiv.org Machine Learning, Aug-22-2008

The problem of biclustering consists of the simultaneous clustering of rows and columns of a matrix such that each of the submatrices induced by a pair of row and column clusters is as uniform as possible. In this paper we approximate the optimal biclustering by applying one-way clustering algorithms independently on the rows and on the columns of the input matrix. We show that such a solution yields a worst-case approximation ratio of 1+sqrt(2) under L1-norm for 0-1 valued matrices, and of 2 under L2-norm for real valued matrices.
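
The scheme described in the abstract, clustering rows and columns independently and combining the two partitions, can be sketched as below. This is a rough illustration under stated assumptions: plain Lloyd's k-means with a deterministic seeding stands in for the one-way clustering step (the paper's approximation ratio is stated relative to optimal one-way clusterings, which this heuristic does not guarantee), and the names `kmeans`, `bicluster`, and `l2_cost` are hypothetical.

```python
def kmeans(points, k, iters=50):
    """Lloyd's k-means on lists of numbers; returns one cluster label per point."""
    # Seed centers with the first k distinct points (a simplification).
    centers = []
    for p in points:
        if list(p) not in centers:
            centers.append(list(p))
        if len(centers) == k:
            break
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def bicluster(matrix, k_rows, k_cols):
    """Cluster rows and columns independently, as in the abstract's scheme."""
    row_labels = kmeans(matrix, k_rows)
    col_labels = kmeans([list(col) for col in zip(*matrix)], k_cols)
    return row_labels, col_labels


def l2_cost(matrix, row_labels, col_labels):
    """Sum of squared deviations of each entry from its block mean."""
    blocks = {}
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            blocks.setdefault((row_labels[i], col_labels[j]), []).append(value)
    cost = 0.0
    for values in blocks.values():
        mean = sum(values) / len(values)
        cost += sum((v - mean) ** 2 for v in values)
    return cost


if __name__ == "__main__":
    m = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
    rows, cols = bicluster(m, 2, 2)
    print(rows, cols, l2_cost(m, rows, cols))
```

A lower `l2_cost` means the induced submatrices are more uniform, which is the objective the abstract describes for the L2 case.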

algorithm, artificial intelligence, machine learning, (18 more...)

doi: 10.1016/j.ipl.2008.03.013

0712.2682

Country: Europe > Finland (0.14)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

arXiv.org Machine Learning, Jul-16-2008

Text Data Mining: Theory and Methods

Solka, Jeffrey

This paper provides the reader with a very brief introduction to some of the theory and methods of text data mining. The intent of this article is to introduce the reader to some of the current methodologies that are employed within this discipline area while at the same time making the reader aware of some of the interesting challenges that remain to be solved within the area. Finally, the article serves as a very rudimentary tutorial on some of the techniques while also providing the reader with a list of references for additional study.

artificial intelligence, machine learning, natural language, (13 more...)

doi: 10.1214/07-SS016

0807.2569

Country: North America > United States > New York (0.15)

Genre:

Research Report (0.64)
Instructional Material > Course Syllabus & Notes (0.34)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(4 more...)

Murtagh, Fionn, Ganz, Adam, McKie, Stewart

The Structure of Narrative: the Case of Film Scripts

arXiv.org Artificial Intelligence, May-24-2008

We analyze the style and structure of story narrative using the case of film scripts. The practical importance of this is noted, especially the need to have support tools for television movie writing. We use the Casablanca film script, and scripts from six episodes of CSI (Crime Scene Investigation). For analysis of style and structure, we quantify various central perspectives discussed in McKee's book, "Story: Substance, Structure, Style, and the Principles of Screenwriting". Film scripts offer a useful point of departure for exploration of the analysis of more general narratives. Our methodology, using Correspondence Analysis, and hierarchical clustering, is innovative in a range of areas that we discuss. In particular this work is groundbreaking in taking the qualitative analysis of McKee and grounding this analysis in a quantitative and algorithmic framework.

correspondence analysis, machine learning, natural language, (19 more...)

doi: 10.1016/j.patcog.2008.05.026

0805.3799

Country:

North America > United States (0.93)
Africa > Middle East > Morocco > Casablanca-Settat Region > Casablanca (0.26)

Genre: Research Report (0.50)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Communications (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.90)

arXiv.org Artificial Intelligence, May-8-2008

Adaptive Affinity Propagation Clustering

Wang, Kaijun, Zhang, Junying, Li, Dan, Zhang, Xinna, Guo, Tao

Affinity propagation clustering (AP) has two limitations: it is hard to know what value of the parameter 'preference' yields an optimal clustering solution, and oscillations cannot be eliminated automatically if they occur. The adaptive AP method is proposed to overcome these limitations. It includes adaptive scanning of preferences to search the space of the number of clusters for the optimal clustering solution, adaptive adjustment of damping factors to eliminate oscillations, and adaptive escaping from oscillations when the damping adjustment technique fails. Experimental results on simulated and real data sets show that the adaptive AP is effective and can outperform AP in quality of clustering results.

artificial intelligence, machine learning, oscillation, (15 more...)

0805.1096

Country:

North America > United States (0.46)
Asia > China (0.29)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)