On the Foundations of Adversarial Single-Class Classification

arXiv.org Artificial Intelligence

Motivated by authentication, intrusion, and spam detection applications, we consider single-class classification (SCC) as a two-person game between the learner and an adversary. In this game the learner has a sample from a target distribution, and the goal is to construct a classifier capable of distinguishing observations drawn from the target distribution from observations emitted by an unknown other distribution. The ideal SCC classifier must guarantee a given tolerance for the false-positive error (false-alarm rate) while minimizing the false-negative error (intruder pass rate). Viewing SCC as a two-person zero-sum game, we identify both deterministic and randomized optimal classification strategies for different game variants. We demonstrate that randomized classification can provide a significant advantage. In the deterministic setting we show how to reduce SCC to two-class classification, where the other class in the two-class problem is a synthetically generated distribution. We provide an efficient and practical algorithm for constructing and solving the two-class problem. The algorithm distinguishes low-density regions of the target distribution and is shown to be consistent.
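
To make the reduction concrete, here is a minimal sketch of the deterministic route: pair the target sample with a synthetically generated "other" class and train an off-the-shelf binary classifier, calibrating its score threshold on the target sample to meet the false-alarm tolerance. The uniform background distribution and the random-forest classifier are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

def scc_via_synthetic_class(target_sample, false_alarm_tol=0.05, n_synth=None, seed=0):
    """Reduce single-class classification to two-class classification
    by pairing the target sample with synthetic 'background' points."""
    rng = np.random.default_rng(seed)
    n, d = target_sample.shape
    n_synth = n_synth or n
    # Synthetic 'other' class: uniform over the bounding box of the target sample
    # (an illustrative choice of background distribution).
    lo, hi = target_sample.min(axis=0), target_sample.max(axis=0)
    synth = rng.uniform(lo, hi, size=(n_synth, d))
    X = np.vstack([target_sample, synth])
    y = np.r_[np.ones(n), np.zeros(n_synth)]
    clf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    # Calibrate the score threshold so that at most false_alarm_tol of the
    # target sample is rejected (the false-alarm guarantee).
    scores = clf.predict_proba(target_sample)[:, 1]
    threshold = np.quantile(scores, false_alarm_tol)
    return lambda X_new: clf.predict_proba(X_new)[:, 1] >= threshold
```

The returned predicate flags points as "target" when their score clears the calibrated threshold; everything else is treated as coming from the other distribution.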


Building Watson: An Overview of the DeepQA Project

AI Magazine

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy! challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed at the Jeopardy! quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that may be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of QA.


Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

arXiv.org Machine Learning

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples: one having to do with selecting good columns or features from a DNA single-nucleotide polymorphism (SNP) data matrix, and the other having to do with selecting good clusters or communities from a data graph representing a social or information network. Both drew on ideas from both areas and may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.
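
Column selection of this kind is commonly driven by statistical leverage scores computed from the top-k right singular vectors of the matrix. The sketch below (numpy only) illustrates that generic recipe; it is an assumption on our part that this matches the chapter's algorithm in detail.

```python
import numpy as np

def leverage_score_column_sample(A, k, c, seed=0):
    """Sample c columns of A with probability proportional to their
    rank-k statistical leverage scores."""
    rng = np.random.default_rng(seed)
    # Top-k right singular vectors of A.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    Vk = Vt[:k, :]                   # shape (k, n_cols)
    lev = (Vk ** 2).sum(axis=0)      # leverage score of each column
    probs = lev / lev.sum()          # normalized sampling probabilities
    cols = rng.choice(A.shape[1], size=c, replace=False, p=probs)
    return A[:, cols], cols
```

Columns with high leverage scores exert unusual influence on the best rank-k fit, which is why sampling by these scores tends to pick out informative features (e.g., informative SNPs).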


A Comprehensive Survey of Data Mining-based Fraud Detection Research

arXiv.org Artificial Intelligence

This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of the data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, that proposes alternative data and solutions from related domains.


Optimal Bangla Keyboard Layout using Association Rule of Data Mining

arXiv.org Artificial Intelligence

In this paper we present an optimal Bangla keyboard layout which distributes the typing load equally across both hands, maximizing ease and minimizing effort. The Bangla alphabet has a large number of letters, which makes it difficult to type quickly on a Bangla keyboard. Our proposed keyboard maximizes the operator's typing speed by allowing both hands to work in parallel. We use the association rule of data mining to distribute the Bangla characters across the keyboard. First, we analyze the frequencies of monographs, digraphs, and trigraphs derived from a data warehouse, and then apply the association rule of data mining to distribute the Bangla characters in the layout. Finally, we propose a Bangla keyboard layout. Experimental results on several keyboard layouts show the effectiveness of the proposed approach.
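
As a toy illustration of the frequency-analysis step, the sketch below counts monograph and digraph frequencies from a corpus string and greedily alternates the most frequent characters between hands, then scores how often digraphs straddle both hands. The greedy rule is a simplified stand-in for the paper's association-rule machinery.

```python
from collections import Counter

def ngram_counts(corpus, n):
    """Frequency of every length-n character sequence in the corpus."""
    return Counter(corpus[i:i + n] for i in range(len(corpus) - n + 1))

def assign_hands(corpus, n_keys=None):
    """Toy layout heuristic: place characters in descending frequency,
    alternating hands, so frequent digraphs tend to alternate hands."""
    mono = ngram_counts(corpus, 1)
    di = ngram_counts(corpus, 2)
    hands = {"left": [], "right": []}
    for i, (ch, _) in enumerate(mono.most_common(n_keys)):
        hands["left" if i % 2 == 0 else "right"].append(ch)
    # Score: fraction of digraph occurrences typed with alternating hands.
    side = {c: h for h, cs in hands.items() for c in cs}
    alt = sum(cnt for dg, cnt in di.items()
              if dg[0] in side and dg[1] in side and side[dg[0]] != side[dg[1]])
    total = sum(di.values())
    return hands, (alt / total if total else 0.0)
```

Running this on a representative Bangla corpus would yield a hand assignment plus an alternation score that can be compared across candidate layouts.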


Mining tree-query associations in graphs

arXiv.org Artificial Intelligence

New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. What is novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.
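
The database-oriented SQL implementation can be illustrated on a toy edge table. The sketch below (Python's standard sqlite3 module) evaluates a small tree query with one existential node: the existential child must exist but does not contribute to the count of distinct occurrences. The schema and the particular pattern are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany("INSERT INTO edge VALUES (?, ?)",
                 [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

# Tree query: distinguished nodes x, y with an edge x -> y, plus an
# existential child z of x (z must exist but is not counted).
rows = conn.execute("""
    SELECT DISTINCT e1.src AS x, e1.dst AS y
    FROM edge e1
    WHERE EXISTS (SELECT 1 FROM edge e2
                  WHERE e2.src = e1.src AND e2.dst <> e1.dst)
""").fetchall()
print(rows)       # distinct occurrences of the pattern in the data graph
print(len(rows))  # the pattern's support
```

The DISTINCT over the non-existential variables is what implements "existential nodes are not counted": z ranges inside the EXISTS subquery and never appears in the result.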


An Agent based Approach towards Metadata Extraction, Modelling and Information Retrieval over the Web

arXiv.org Artificial Intelligence

Web development is a challenging research area owing to its creativity and complexity. A key open challenge in web technology development is presenting data in a machine-readable and machine-processable format so that it can be exploited for knowledge-based information extraction and maintenance. Currently it is not possible to search and extract optimized results using full-text queries, because no mechanism exists that can fully extract the semantics of a full-text query and then look for the corresponding knowledge-based information.


Constraint Programming for Data Mining and Machine Learning

AAAI Conferences

The machine learning and data mining communities have become aware that using constraints when learning patterns and rules can be very useful. To this end, a large number of special-purpose systems and techniques have been developed for solving such constraint-based mining and learning problems. These techniques have, so far, been developed independently of the general-purpose tools and principles of constraint programming known within the field of artificial intelligence. This paper shows that off-the-shelf constraint programming techniques can be applied to various pattern mining and rule learning problems (cf. also De Raedt, Guns, and Nijssen 2008; Nijssen, Guns, and De Raedt 2009). This not only leads to methodologies that are more general and flexible, but also provides new insights into the underlying mining problems that allow us to improve the state of the art in data mining. Such a combination of constraint programming and data mining raises a number of interesting new questions and challenges.
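
As a concrete illustration, frequent-itemset mining can be posed directly to a generic CP solver. The sketch below uses the third-party python-constraint package (a substitute for the CP systems used in the cited work): one 0/1 variable per item, and a single constraint enforcing a minimum-support threshold.

```python
from constraint import Problem  # third-party package: python-constraint

transactions = [{"bread", "milk"}, {"bread", "butter"},
                {"bread", "milk", "butter"}, {"milk"}]
items = sorted({i for t in transactions for i in t})
min_support = 2

problem = Problem()
problem.addVariables(items, [0, 1])  # 1 = item belongs to the pattern

def is_frequent(*flags):
    """Constraint: the chosen itemset is non-empty and occurs in at
    least min_support transactions."""
    chosen = {i for i, f in zip(items, flags) if f}
    support = sum(1 for t in transactions if chosen <= t)
    return bool(chosen) and support >= min_support

problem.addConstraint(is_frequent, items)

for sol in problem.getSolutions():
    print(sorted(i for i in items if sol[i]))
```

The declarative formulation is the point: swapping in a different constraint (closedness, maximality, a cost bound) changes the model, not the solver.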


Learning to Surface Deep Web Content

AAAI Conferences

We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and submits a selected action (query) to the environment according to its Q-value. Based on this framework we develop an adaptive crawling method. Experimental results show that it outperforms state-of-the-art methods in crawling capability and breaks through the assumption of full-text search implied by existing methods.
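
A tabular sketch of the kind of Q-learning loop such a framework implies is given below. It collapses the state space to a single state (a bandit-style simplification that is our assumption, not the paper's design); actions are candidate query keywords, and the reward is the number of previously unseen records a query surfaces. The submit callback is hypothetical.

```python
import random
from collections import defaultdict

def crawl(candidate_queries, submit, episodes=100,
          alpha=0.3, gamma=0.0, epsilon=0.1, seed=0):
    """Single-state Q-learning over query keywords. submit(q) is assumed
    to send query q to the deep web form and return the set of record
    ids it surfaces (a hypothetical interface)."""
    rng = random.Random(seed)
    Q = defaultdict(float)
    seen = set()
    for _ in range(episodes):
        # Epsilon-greedy action selection by Q-value.
        if rng.random() < epsilon:
            q = rng.choice(candidate_queries)
        else:
            q = max(candidate_queries, key=Q.__getitem__)
        records = submit(q)
        reward = len(records - seen)   # new records harvested
        seen |= records
        best_next = max(Q.values(), default=0.0)
        Q[q] += alpha * (reward + gamma * best_next - Q[q])
    return seen, Q
```

With gamma = 0 this degenerates to a bandit that learns which keywords keep yielding fresh records; a stateful crawler would condition Q on features of the harvest so far.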


Non-I.I.D. Multi-Instance Dimensionality Reduction by Learning a Maximum Bag Margin Subspace

AAAI Conferences

Multi-instance learning, like other machine learning tasks, suffers from the curse of dimensionality. Although dimensionality reduction methods have been investigated for many years, multi-instance dimensionality reduction methods remain untouched. On the other hand, most algorithms in the multi-instance framework treat the instances in each bag as independently and identically distributed samples, which fails to utilize the structure information conveyed by the instances in a bag. In this paper, we propose a multi-instance dimensionality reduction method which treats the instances in each bag as non-i.i.d. samples. We regard every bag as a whole entity and define a bag-margin objective function. By maximizing the margin between positive and negative bags, we learn a subspace that yields a more salient representation of the original data. Experiments demonstrate the effectiveness of the proposed method.
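
As a simplified stand-in for the proposed method, the sketch below treats each bag as one entity by summarizing it with its mean instance and computes a Fisher-style discriminant direction separating positive from negative bags. This illustrates the "bag as a whole entity, maximize the bag margin" idea; it is not the authors' exact objective or bag representation.

```python
import numpy as np

def bag_margin_subspace(bags, labels, reg=1e-6):
    """Fisher-style 1-D subspace separating positive from negative bags.
    Each bag (an (n_i, d) array) is summarized by its mean instance --
    a deliberate simplification of a non-i.i.d. bag representation."""
    reps = np.array([b.mean(axis=0) for b in bags])
    labels = np.asarray(labels)
    pos, neg = reps[labels == 1], reps[labels == 0]
    mu_p, mu_n = pos.mean(axis=0), neg.mean(axis=0)
    # Within-class scatter of the bag representations, regularized.
    Sw = (np.cov(pos, rowvar=False) * (len(pos) - 1)
          + np.cov(neg, rowvar=False) * (len(neg) - 1))
    Sw += reg * np.eye(Sw.shape[0])
    w = np.linalg.solve(Sw, mu_p - mu_n)   # Fisher discriminant direction
    return w / np.linalg.norm(w)

# Usage: w = bag_margin_subspace(bags, labels); project bags via reps @ w,
# then train any classifier on the projected bag representations.
```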