Data Mining
Quality Classifiers for Open Source Software Repositories
Tsatsaronis, George, Halkidi, Maria, Giakoumakis, Emmanouel A.
Open Source Software (OSS) often relies on large repositories, like SourceForge, for initial incubation. These repositories offer a large variety of meta-data providing interesting information about projects and their success. In this paper we propose a data mining approach for training classifiers on the OSS meta-data provided by such repositories. The classifiers learn to predict the successful continuation of an OSS project. The 'successfulness' of projects is defined in terms of the confidence with which the classifier predicts that they could be ported into popular OSS projects (such as FreeBSD and Gentoo Portage).
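As a minimal illustration of the approach described above, the sketch below trains a scikit-learn classifier on stand-in repository meta-data; the feature set, labels, and data are our own assumptions, not the authors' actual schema or corpus.

```python
# Hypothetical sketch: train a classifier on OSS repository meta-data to
# predict project success. Features and labels are synthetic stand-ins.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Stand-in meta-data: e.g., downloads, commits, developers, age, open bugs.
X = rng.random((500, 5))
# Stand-in label: 1 if the project was included in a popular distribution.
y = (X[:, 0] + X[:, 1] > 1.0).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
print(cross_val_score(clf, X, y, cv=5).mean())  # predictive accuracy

# Classifier confidence, the abstract's notion of 'successfulness'.
clf.fit(X, y)
print(clf.predict_proba(X[:3])[:, 1])
```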
Non-Negative Matrix Factorization, Convexity and Isometry
Vasiloglou, Nikolaos, Gray, Alexander G., Anderson, David V.
In this paper we explore avenues for improving the reliability of dimensionality reduction methods such as Non-Negative Matrix Factorization (NMF) as interpretive exploratory data analysis tools. We first explore the difficulties of the optimization problem underlying NMF, showing for the first time that non-trivial NMF solutions always exist and that the optimization problem is actually convex, by using the theory of Completely Positive Factorization. We subsequently explore four novel approaches to finding globally optimal NMF solutions using various ideas from convex optimization. We then develop a new method, isometric NMF (isoNMF), which preserves non-negativity while also providing an isometric embedding, simultaneously achieving two properties which are helpful for interpretation. Though it results in a more difficult optimization problem, we show experimentally that the resulting method is scalable and even achieves more compact spectra than standard NMF.
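For context, here is a minimal sketch of standard NMF using the classical Lee-Seung multiplicative updates; the paper's convex formulations and isoNMF itself are not reproduced here.

```python
# Standard NMF baseline: factor a non-negative V as W @ H by alternating
# multiplicative updates that keep both factors non-negative.
import numpy as np

def nmf(V, r, n_iter=200, eps=1e-9):
    """Factor V (non-negative, m x n) into W (m x r) and H (r x n)."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)  # update H, stays non-negative
        W *= (V @ H.T) / (W @ H @ H.T + eps)  # update W, stays non-negative
    return W, H

V = np.abs(np.random.default_rng(1).random((20, 30)))
W, H = nmf(V, r=5)
print(np.linalg.norm(V - W @ H))  # reconstruction error
```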
Using Association Rules for Better Treatment of Missing Values
Bashir, Shariq, Razzaq, Saad, Maqbool, Umer, Tahir, Sonya, Baig, Abdul Rauf
The quality of training data for knowledge discovery in databases (KDD) and data mining depends upon many factors, but the handling of missing values is considered crucial to overall data quality. Today's real-world datasets contain missing values due to human and operational errors, hardware malfunctions, and many other factors. The quality of the extracted knowledge, and of the learning and decision tasks built on it, depends directly upon the quality of the training data. Given the importance of handling missing values in KDD and data mining tasks, in this paper we propose a novel Hybrid Missing values Imputation Technique (HMiT) that combines association rule mining with a k-nearest neighbor approach. To check the effectiveness of HMiT, we also perform detailed experiments on real-world datasets. Our results suggest that HMiT is not only more accurate but also requires less processing time than the current best missing value imputation technique based on the k-nearest neighbor approach, which shows the effectiveness of our technique.
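A hedged sketch of the hybrid idea follows: impute a missing value from a matching rule when one fires, and fall back to k-nearest neighbours otherwise. The toy rule table stands in for HMiT's association rule miner, whose details are not given in the abstract.

```python
# Toy hybrid imputation: rule-based first pass, kNN fallback.
import numpy as np
from sklearn.impute import KNNImputer

X = np.array([[1.0, 2.0, np.nan],
              [1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0],
              [4.0, np.nan, 6.0]])

# Toy "rules": (antecedent column, antecedent value) -> (target column, value).
rules = {(0, 1.0): (2, 3.0)}

for i, j in zip(*np.where(np.isnan(X))):
    for (ac, av), (tc, tv) in rules.items():
        if tc == j and X[i, ac] == av:
            X[i, j] = tv  # a rule fires: impute from the rule consequent
            break

# kNN fallback for the cells no rule covered.
X = KNNImputer(n_neighbors=2).fit_transform(X)
print(X)
```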
Symbolic Computing with Incremental Mindmaps to Manage and Mine Data Streams - Some Applications
Brucks, Claudine, Hilker, Michael, Schommer, Christoph, Wagner, Cynthia, Weires, Ralph
In our understanding, a mind-map is an adaptive engine that works incrementally on the foundation of existing transactional streams. Generally, mind-maps consist of symbolic cells that are connected with each other and that become either stronger or weaker depending on the transactional stream. Following the underlying biological principle, these symbolic cells and their connections may adaptively survive or die, forming cell agglomerates of arbitrary size. In this work, we demonstrate the suitability of mind-maps in diverse application scenarios, for example as an underlying management system representing normal and abnormal traffic behaviour in computer networks, as support for detecting user behaviour within search engines, or as a hidden communication layer for natural language interaction.
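A minimal sketch, under our own assumptions about the mechanics, of such an incremental mind-map: cells connected by weighted edges that strengthen on co-occurrence in the stream, decay otherwise, and die below a threshold.

```python
# Incremental mind-map: edge weights strengthen on co-occurrence and decay
# over time; connections below a threshold "die" and are removed.
from collections import defaultdict
from itertools import combinations

class MindMap:
    def __init__(self, reinforce=1.0, decay=0.95, death=0.1):
        self.edges = defaultdict(float)
        self.reinforce, self.decay, self.death = reinforce, decay, death

    def observe(self, transaction):
        # Decay all existing connections first.
        for e in list(self.edges):
            self.edges[e] *= self.decay
            if self.edges[e] < self.death:
                del self.edges[e]  # the connection dies
        # Strengthen connections between co-occurring cells.
        for a, b in combinations(sorted(set(transaction)), 2):
            self.edges[(a, b)] += self.reinforce

mm = MindMap()
for tx in [["src", "dst", "port80"], ["src", "dst"], ["src", "port443"]]:
    mm.observe(tx)
print(dict(mm.edges))
```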
Topological Centrality and Its Applications
Recent developments in network structure analysis show that it plays an important role in characterizing complex systems in many branches of science. In contrast to previous network centrality measures, this paper proposes the notion of topological centrality (TC), reflecting the topological positions of nodes and edges in general networks, and proposes an approach to calculating it. The proposed topological centrality is then used to discover communities and to build the backbone network. Experiments and applications on a research network show the significance of the proposed approach.
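The abstract does not give the TC formula, so the sketch below substitutes a classical measure, betweenness centrality, purely to illustrate centrality-driven backbone extraction on a small network.

```python
# Generic centrality-based backbone extraction (not the paper's TC):
# rank nodes and edges by betweenness, keep the most central edges.
import networkx as nx

G = nx.karate_club_graph()
node_c = nx.betweenness_centrality(G)
edge_c = nx.edge_betweenness_centrality(G)

# Backbone: the top 20% most central edges.
k = int(0.2 * G.number_of_edges())
backbone_edges = sorted(edge_c, key=edge_c.get, reverse=True)[:k]
backbone = G.edge_subgraph(backbone_edges)

print(sorted(node_c, key=node_c.get, reverse=True)[:5])  # most central nodes
print(backbone)
```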
A Model for Managing Collections of Patterns
Jeudy, Baptiste, Largeron, Christine, Jacquenet, François
Data mining algorithms are now able to deal efficiently with huge amounts of data. Various kinds of patterns may be discovered and may have a great impact on the general development of knowledge. In many domains, end users may want their data mined by data mining tools in order to extract patterns that could impact their business. Nevertheless, those users are often overwhelmed by the large quantity of patterns extracted in such a situation. Moreover, privacy or commercial issues may prevent the users from mining the data themselves, so they may not have the possibility to perform many experiments integrating various constraints in order to focus on the specific patterns they would like to extract. Post-processing of patterns may be an answer to that drawback. In this paper we therefore present a framework that allows end users to manage collections of patterns. We propose an efficient data structure on which algebraic operators may be used to retrieve or access patterns in pattern bases.
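As an illustration only, the toy pattern base below stores itemset patterns with their supports and exposes two algebraic operators, selection and intersection; the representation is our assumption, not the paper's data structure.

```python
# Toy pattern base: itemset patterns (frozensets) with supports, queried
# through algebraic operators for post-processing.
class PatternBase:
    def __init__(self, patterns):
        self.patterns = dict(patterns)  # frozenset -> support

    def select(self, predicate):
        """Selection operator: keep patterns satisfying a constraint."""
        return PatternBase({p: s for p, s in self.patterns.items()
                            if predicate(p, s)})

    def intersect(self, other):
        """Intersection operator: patterns present in both bases."""
        return PatternBase({p: min(s, other.patterns[p])
                            for p, s in self.patterns.items()
                            if p in other.patterns})

pb1 = PatternBase({frozenset({"a", "b"}): 0.4, frozenset({"a", "c"}): 0.2})
pb2 = PatternBase({frozenset({"a", "b"}): 0.3})
result = pb1.select(lambda p, s: s >= 0.3).intersect(pb2)
print(result.patterns)  # {frozenset({'a', 'b'}): 0.3}
```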
Reflective visualization and verbalization of unconscious preference
Maeno, Yoshiharu, Ohsawa, Yukio
A new method is presented that can help a person become aware of his or her unconscious preferences and convey them to others in the form of verbal explanation. The method combines the concepts of reflection, visualization, and verbalization. It was tested in an experiment investigating the subjects' unconscious preferences for various artworks. Two lessons were learned from the experiment. The first is that verbalizing weak preferences, as compared with strong preferences, through discussion over preference diagrams helps the subjects become aware of their unconscious preferences. The second is that introducing an adjustable factor into the visualization is effective for adapting to differences among subjects and for fostering their mutual understanding.
On Introspection, Metacognitive Control and Augmented Data Mining Live Cycles
We discuss metacognitive modelling as an enhancement to cognitive modelling and computing. Metacognitive control mechanisms should enable AI systems to self-reflect, to reason about their actions, and to adapt to new situations. In this respect, we propose implementation details of a knowledge taxonomy and an augmented data mining life cycle that supports a live integration of obtained models.
Differential Privacy with Compression
Zhou, Shuheng, Ligett, Katrina, Wasserman, Larry
This work studies formal utility and privacy guarantees for a simple multiplicative database transformation, where the data are compressed by a random linear or affine transformation, reducing the number of data records substantially while preserving the number of original input variables. We provide an analysis framework inspired by a recent concept known as differential privacy (Dwork 06). Our goal is to show that, despite the general difficulty of achieving the differential privacy guarantee, it is possible to publish synthetic data that are useful for a number of common statistical learning applications. These include high-dimensional sparse regression (Zhou et al. 07), principal component analysis (PCA), and other statistical measures (Liu et al. 06) based on the covariance of the initial data.
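A minimal sketch of the compression step as described: a random Gaussian map reduces n records to k synthetic ones while keeping all d variables and approximately preserving the second-moment (covariance) structure. The paper's privacy analysis itself is not reproduced here.

```python
# Random-projection compression: fewer records, same variables, with the
# covariance structure approximately preserved in expectation.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 5000, 10, 500

X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated data
Phi = rng.normal(size=(k, n)) / np.sqrt(k)             # E[Phi.T @ Phi] = I_n
Y = Phi @ X                                            # k synthetic records

# Relative error between second-moment matrices before and after compression.
print(np.linalg.norm(X.T @ X - Y.T @ Y) / np.linalg.norm(X.T @ X))
```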
Heterogeneous Component Analysis
Oba, Shigeyuki, Kawanabe, Motoaki, Müller, Klaus-Robert, Ishii, Shin
In bioinformatics it is often desirable to combine data from various measurement sources, and thus structured feature vectors are to be analyzed that possess different intrinsic blocking characteristics (e.g., different patterns of missing values, observation noise levels, effective intrinsic dimensionalities). We propose a new machine learning tool, heterogeneous component analysis (HCA), for feature extraction in order to better understand the factors that underlie such complex structured heterogeneous data. HCA is a linear block-wise sparse Bayesian PCA based not only on a probabilistic model with block-wise residual variance terms but also on a Bayesian treatment of a block-wise sparse factor-loading matrix. We study various algorithms that implement our HCA concept, extracting sparse heterogeneous structure by obtaining common components for the blocks and specific components within each block. Simulations on toy and bioinformatics data underline the usefulness of the proposed structured matrix factorization concept.
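As a crude, clearly hypothetical stand-in for the block-wise idea, the sketch below rescales two feature blocks with different noise levels before a joint PCA so that no block dominates; HCA's Bayesian treatment and block-wise sparse loadings are not reproduced.

```python
# Block-structured data: two blocks sharing a latent factor but with very
# different noise levels, rescaled per block before a joint PCA.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
n = 300
z = rng.normal(size=(n, 1))  # shared latent factor across blocks

block_a = z @ rng.normal(size=(1, 4)) + 0.1 * rng.normal(size=(n, 4))
block_b = z @ rng.normal(size=(1, 6)) + 2.0 * rng.normal(size=(n, 6))  # noisier

# Per-block scaling (a crude proxy for block-wise residual variance terms).
blocks = [b / b.std() for b in (block_a, block_b)]
X = np.hstack(blocks)

pca = PCA(n_components=2).fit(X)
print(pca.explained_variance_ratio_)
```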