Data Science


Building Watson: An Overview of the DeepQA Project

AI Magazine

IBM Research undertook a challenge to build a computer system that could compete at the human champion level in real time on the American TV quiz show Jeopardy! The extent of the challenge includes fielding a real-time automatic contestant on the show, not merely a laboratory exercise. The Jeopardy! challenge helped us address requirements that led to the design of the DeepQA architecture and the implementation of Watson. After three years of intense research and development by a core team of about 20 researchers, Watson is performing at human expert levels in terms of precision, confidence, and speed on the Jeopardy! quiz show. Our results strongly suggest that DeepQA is an effective and extensible architecture that may be used as a foundation for combining, deploying, evaluating, and advancing a wide range of algorithmic techniques to rapidly advance the field of QA.


Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

arXiv.org Machine Learning

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.
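The column-selection example mentioned above can be sketched with leverage-score sampling, a standard randomized scheme in this line of work; the function and parameter names below are illustrative, not the chapter's actual algorithm:

```python
import numpy as np

def leverage_score_column_sample(A, k, c, rng=None):
    """Sample c columns of A with probability proportional to their
    rank-k leverage scores (a common randomized column-selection scheme)."""
    rng = np.random.default_rng(rng)
    # The top-k right singular vectors span the dominant column space.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k, :] ** 2, axis=0)   # leverage score of each column
    probs = lev / lev.sum()                # normalize to a distribution
    cols = rng.choice(A.shape[1], size=c, replace=False, p=probs)
    return A[:, cols], cols

# Toy usage: pick 3 informative columns from a random 50 x 10 matrix.
A = np.random.default_rng(0).standard_normal((50, 10))
C, idx = leverage_score_column_sample(A, k=2, c=3, rng=0)
```

In the SNP application, columns with high leverage scores correspond to genetically informative markers; sampling by leverage concentrates on them while retaining probabilistic guarantees.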


A Comprehensive Survey of Data Mining-based Fraud Detection Research

arXiv.org Artificial Intelligence

This survey paper categorises, compares, and summarises almost all published technical and review articles in automated fraud detection within the last 10 years. It defines the professional fraudster, formalises the main types and subtypes of known fraud, and presents the nature of data evidence collected within affected industries. Within the business context of mining the data to achieve higher cost savings, this research presents methods and techniques together with their problems. Compared to all related reviews on fraud detection, this survey covers many more technical articles and is the only one, to the best of our knowledge, which proposes alternative data and solutions from related domains.


Optimal Bangla Keyboard Layout using Association Rule of Data Mining

arXiv.org Artificial Intelligence

In this paper we present an optimal Bangla keyboard layout that distributes the typing load equally between both hands, maximizing ease and minimizing effort. The Bangla alphabet has a large number of letters, which makes fast typing on a Bangla keyboard difficult. Our proposed keyboard maximizes the operator's speed by letting both hands type in parallel. We use the association rule of data mining to distribute the Bangla characters across the keyboard. First, we analyze the frequencies of monographs, digraphs, and trigraphs derived from a data warehouse, and then apply association rules of data mining to place the Bangla characters in the layout. Finally, we propose a Bangla keyboard layout. Experimental results on several keyboard layouts show the effectiveness of the proposed approach.
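The frequency analysis that drives the layout can be illustrated with a minimal digraph counter; this toy sketch only counts adjacent character pairs, whereas the paper mines full association rules over monographs, digraphs, and trigraphs:

```python
from collections import Counter

def digraph_frequencies(corpus):
    """Count adjacent character pairs (digraphs) in a text corpus.
    High-frequency pairs are the ones a layout should place so that
    the two keys alternate between hands."""
    counts = Counter()
    for text in corpus:
        for a, b in zip(text, text[1:]):
            counts[a + b] += 1
    return counts

# Toy corpus (Latin characters stand in for Bangla letters).
freqs = digraph_frequencies(["banana", "bandana"])
top, n = freqs.most_common(1)[0]   # most frequent digraph and its count
```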


Ultrametric and Generalized Ultrametric in Computational Logic and in Data Analysis

arXiv.org Machine Learning

Following a review of metric, ultrametric and generalized ultrametric, we review their application in data analysis. We show how they allow us to explore both geometry and topology of information, starting with measured data. Some themes are then developed based on the use of metric, ultrametric and generalized ultrametric in logic. In particular we study approximation chains in an ultrametric or generalized ultrametric context. Our aim in this work is to extend the scope of data analysis by facilitating reasoning based on the data analysis; and to show how quantitative and qualitative data analysis can be incorporated into logic programming.
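The defining property of an ultrametric, the strong triangle inequality d(x, z) <= max(d(x, y), d(y, z)), can be checked directly on a distance matrix; a minimal sketch:

```python
def is_ultrametric(D, tol=1e-9):
    """Check the strong triangle inequality d(x,z) <= max(d(x,y), d(y,z))
    for every triple of points, which characterizes an ultrametric."""
    n = len(D)
    for i in range(n):
        for j in range(n):
            for k in range(n):
                if D[i][k] > max(D[i][j], D[j][k]) + tol:
                    return False
    return True

# Cophenetic distances read off a dendrogram are ultrametric: here
# points 0 and 1 merge at height 1, then join point 2 at height 2.
D = [[0.0, 1.0, 2.0],
     [1.0, 0.0, 2.0],
     [2.0, 2.0, 0.0]]
```

This is the link between ultrametrics and hierarchical clustering used in the data-analysis part of the review: every dendrogram induces exactly such a distance.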


Mining tree-query associations in graphs

arXiv.org Artificial Intelligence

New applications of data mining, such as in biology, bioinformatics, or sociology, are faced with large datasets structured as graphs. We introduce a novel class of tree-shaped patterns called tree queries, and present algorithms for mining tree queries and tree-query associations in a large data graph. Novel about our class of patterns is that they can contain constants, and can contain existential nodes which are not counted when determining the number of occurrences of the pattern in the data graph. Our algorithms have a number of provable optimality properties, which are based on the theory of conjunctive database queries. We propose a practical, database-oriented implementation in SQL, and show that the approach works in practice through experiments on data about food webs, protein interactions, and citation analysis.
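The SQL-based approach can be illustrated on a toy edge relation. The conjunctive query below matches a two-child tree pattern x -> y, x -> z with y and z existential, so only distinct roots x are counted; the schema and query are illustrative, not the paper's implementation:

```python
import sqlite3

# A tiny data graph stored as an edge relation.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE edge(src TEXT, dst TEXT)")
con.executemany("INSERT INTO edge VALUES (?, ?)",
                [("a", "b"), ("a", "c"), ("b", "c"), ("c", "d")])

# Tree query: roots x with two distinct children (children existential,
# so DISTINCT counts each root once regardless of how many matches).
rows = con.execute("""
    SELECT DISTINCT e1.src
    FROM edge e1 JOIN edge e2 ON e1.src = e2.src
    WHERE e1.dst <> e2.dst
""").fetchall()
```

Here only node "a" has two distinct out-neighbours, so it is the single occurrence of the pattern.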


Discovering shared and individual latent structure in multiple time series

arXiv.org Artificial Intelligence

This paper proposes a nonparametric Bayesian method for exploratory data analysis and feature construction in continuous time series. Our method focuses on understanding shared features in a set of time series that exhibit significant individual variability. Our method builds on the framework of latent Dirichlet allocation (LDA) and its extension to hierarchical Dirichlet processes, which allows us to characterize each series as switching between latent "topics", where each topic is characterized as a distribution over "words" that specify the series dynamics. However, unlike standard applications of LDA, we discover the words as we learn the model. We apply this model to the task of tracking the physiological signals of premature infants; our model obtains clinically significant insights as well as useful features for supervised learning tasks.
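The topics-over-words framing can be sketched with off-the-shelf components: here k-means over short windows stands in for the paper's jointly learned vocabulary, and standard LDA replaces the hierarchical Dirichlet process. This is a hypothetical approximation of the pipeline, not the authors' model:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import LatentDirichletAllocation

rng = np.random.default_rng(0)
series = rng.standard_normal((20, 200))      # 20 toy series, length 200

W = 10
windows = series.reshape(20, -1, W)          # non-overlapping windows
flat = windows.reshape(-1, W)

# "Words": cluster labels for window shapes (a crude stand-in for the
# dynamics words the paper learns jointly with the topics).
words = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(flat)

# Bag-of-words count matrix: one row per series.
counts = np.zeros((20, 8), dtype=int)
owners = np.repeat(np.arange(20), windows.shape[1])
for i, w in zip(owners, words):
    counts[i, w] += 1

# Topics shared across series; theta gives each series' topic mixture.
lda = LatentDirichletAllocation(n_components=3, random_state=0)
theta = lda.fit_transform(counts)
```

The per-series mixtures in `theta` play the role of the constructed features the paper feeds into downstream supervised tasks.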


An Agent based Approach towards Metadata Extraction, Modelling and Information Retrieval over the Web

arXiv.org Artificial Intelligence

Web development is a challenging research area for its creativity and complexity. A key challenge in current web technology development is presenting data in a machine-readable and machine-processable format, so that it can be exploited for knowledge-based information extraction and maintenance. Currently it is not possible to search for and extract optimized results using full-text queries, because no mechanism exists that can fully extract the semantics of a full-text query and then look for the corresponding knowledge-based information.


Constraint Programming for Data Mining and Machine Learning

AAAI Conferences

The machine learning and data mining communities have become aware that using constraints when learning patterns and rules can be very useful. To this end, a large number of special-purpose systems and techniques have been developed for solving such constraint-based mining and learning problems. These techniques have, so far, been developed independently of the general-purpose tools and principles of constraint programming known within the field of artificial intelligence. This paper shows that off-the-shelf constraint programming techniques can be applied to various pattern mining and rule learning problems (cf. also (De Raedt, Guns, and Nijssen 2008; Nijssen, Guns, and De Raedt 2009)). This not only leads to methodologies that are more general and flexible, but also provides new insights into the underlying mining problems that allow us to improve the state of the art in data mining. Such a combination of constraint programming and data mining raises a number of interesting new questions and challenges.
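The constraint-based view of the canonical mining task can be stated as: find all itemsets I with support(I) >= theta. The brute-force sketch below enumerates the search space directly; a real CP encoding would instead hand the frequency constraint to a solver as the paper proposes:

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate all itemsets satisfying the frequency constraint
    support(I) >= min_support. Brute-force sketch of the declarative
    specification, not an actual constraint-programming solver."""
    items = sorted({i for t in transactions for i in t})
    result = []
    for r in range(1, len(items) + 1):
        for I in combinations(items, r):
            support = sum(1 for t in transactions if set(I) <= t)
            if support >= min_support:
                result.append((frozenset(I), support))
    return result

# Toy usage: three transactions, frequency threshold 2.
txs = [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]
fis = frequent_itemsets(txs, min_support=2)
```

The appeal of the CP formulation is that other constraints (closedness, maximality, size bounds) compose with the frequency constraint declaratively, instead of requiring a new special-purpose miner each time.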


Learning to Surface Deep Web Content

AAAI Conferences

We propose a novel deep web crawling framework based on reinforcement learning. The crawler is regarded as an agent and the deep web database as the environment. The agent perceives its current state and submits a selected action (query) to the environment according to its Q-values. Based on this framework we develop an adaptive crawling method. Experimental results show that it outperforms state-of-the-art methods in crawling capability and breaks through the assumption of full-text search implied by existing methods.
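The agent/environment framing can be sketched with tabular Q-learning over a toy database: the action is a query, and the reward is the number of new records the query surfaces. All names and the single-state simplification are illustrative, not the paper's actual reward design:

```python
import random

random.seed(0)
queries = ["q1", "q2", "q3"]
# Toy environment: each query surfaces a fixed set of record ids.
coverage = {"q1": {1, 2}, "q2": {2, 3, 4, 5}, "q3": {5}}

Q = {q: 0.0 for q in queries}   # one state, so Q is a per-query value
seen = set()                    # records harvested so far
alpha, eps = 0.5, 0.2           # learning rate, exploration rate

for step in range(200):
    # Epsilon-greedy action selection on the current Q-values.
    q = random.choice(queries) if random.random() < eps else max(Q, key=Q.get)
    reward = len(coverage[q] - seen)        # new records this query surfaced
    seen |= coverage[q]
    Q[q] += alpha * (reward - Q[q])         # single-state Q-learning update
```

Because the reward counts only *new* records, the agent's value estimates fall for exhausted queries, pushing it toward queries that still surface unseen content.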