AITopics

0906.5485

Country:

Europe > Spain (0.14)
Europe > Finland (0.14)

Industry:

Media > Film (1.00)
Leisure & Entertainment (1.00)

Technology:

Information Technology > Databases (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-29-2009

Explicit probabilistic models for databases and networks

De Bie, Tijl

Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, against which the patterns found can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches to estimate the statistical significance of data mining results. Here, we introduce a methodology to construct non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they also have the benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we have chosen to demonstrate it for databases and networks in particular.

constraint, health & medicine, optimization problem, (18 more...)

0906.5148

Genre: Research Report > Experimental Study (0.48)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Hanhijärvi, Sami, Puolamäki, Kai, Garriga, Gemma C.

Multiple Hypothesis Testing in Pattern Discovery

arXiv.org Machine Learning, Jun-29-2009

The problem of multiple hypothesis testing arises when there is more than one hypothesis to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, simultaneously assessing the significance of all frequent itemsets of a single dataset entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I error). Our contribution in this paper is to extend the multiple hypothesis framework to be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test.
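To make the notion of FWER control concrete, here is a minimal sketch of the classical Holm step-down procedure, which is known to control the FWER in the strong sense for any set of p-values. It is not the method proposed in the paper above; the function name `holm_reject` and the example p-values are illustrative assumptions.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure (a classical FWER-controlling method).

    Returns a list of booleans, True where the hypothesis is rejected.
    Illustrative only: this is NOT the procedure from the paper above.
    """
    m = len(p_values)
    # Indices of the hypotheses, ordered from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The (rank + 1)-th smallest p-value is compared with alpha / (m - rank).
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # step-down: stop at the first hypothesis that is not rejected
    return reject


if __name__ == "__main__":
    # Hypothetical p-values, e.g. one per candidate itemset.
    print(holm_reject([0.001, 0.04, 0.03, 0.2]))
```

With a plain per-test threshold of 0.05, three of the four hypothetical p-values would be declared significant; the step-down thresholds are stricter because they account for the number of hypotheses still under test.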

artificial intelligence, hypothesis, scientific discovery, (19 more...)

arXiv.org Machine Learning

0906.5263

Country:

North America > United States (0.14)
Europe > Finland (0.14)

Genre: Research Report > Experimental Study (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)

arXiv.org Artificial Intelligence, Jun-26-2009

Node discovery in a networked organization

Maeno, Yoshiharu

In this paper, I present a method to solve a node discovery problem in a networked organization. Covert nodes are nodes that are not directly observable. They affect social interactions, but do not appear in the surveillance logs which record the participants of the social interactions. Discovering the covert nodes is defined as identifying the suspicious logs where the covert nodes would appear if the covert nodes became overt. A mathematical model is developed for the maximum likelihood estimation of the network behind the social interactions and for the identification of the suspicious logs. Precision, recall, and F measure characteristics are demonstrated with a dataset generated from a real organization and with computationally synthesized datasets. The performance is close to the theoretical limit for any covert nodes in networks of any topology and size if the ratio of the number of observations to the number of possible communication patterns is large.
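For readers unfamiliar with the evaluation measures named above, the following sketch computes precision, recall, and the F measure (taken here as the balanced F1 score, which is an assumption) for a set of logs flagged as suspicious against a set of truly suspicious logs. The function name and the log identifiers are hypothetical.

```python
def precision_recall_f(predicted, actual):
    """Precision, recall and balanced F measure for two collections of item ids."""
    predicted, actual = set(predicted), set(actual)
    true_positives = len(predicted & actual)
    # Precision: fraction of flagged items that are truly relevant.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: fraction of truly relevant items that were flagged.
    recall = true_positives / len(actual) if actual else 0.0
    # F measure: harmonic mean of precision and recall.
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    flagged_logs = [1, 2, 3, 4]      # hypothetical ids of logs flagged as suspicious
    suspicious_logs = [2, 4, 6]      # hypothetical ids of truly suspicious logs
    print(precision_recall_f(flagged_logs, suspicious_logs))
```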

artificial intelligence, neural network, node, (16 more...)

doi: 10.1109/ICSMC.2009.5346826

0803.3363

Country: North America > United States (0.28)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (0.75)

AAAI Conferences, Jun-23-2009

A Content-Based Method to Enhance Tag Recommendation

Lu, Yu-Ta (National Taiwan University) | Yu, Shoou-I (National Taiwan University) | Chang, Tsung-Chieh (National Taiwan University) | Hsu, Jane Yung-jen (National Taiwan University)

Tagging has become a primary tool for users to organize and share digital content on many social media sites. In addition, tag information has been shown to enhance the capabilities of existing search engines. However, many resources on the web still lack tag information. This paper proposes a content-based approach to tag recommendation which can be applied to webpages with or without prior tag information. While social bookmarking services such as Delicious enable users to share annotated bookmarks, tag recommendation is available only for pages with tags specified by other users. Our proposed approach is motivated by the observation that similar webpages tend to have the same tags. Each webpage can therefore share its tags with similar webpages. The propagation of a tag depends on its weight in the originating webpage and the similarity between the sending and receiving webpages. The similarity metric between two webpages is defined as a linear combination of four cosine similarities, taking into account both tag information and page content. Experiments using data crawled from Delicious show that the proposed method is effective in populating untagged webpages with the correct tags.
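The propagation idea described above can be sketched as follows. The abstract does not say which four cosine similarities are combined, what the coefficients are, or how weight and similarity interact, so everything here is an assumption: the view names (`"tags"`, `"text"`), the coefficients, and the choice to multiply tag weight by page similarity are purely illustrative.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(key, 0.0) for key, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def page_similarity(page_a, page_b, coeffs):
    """Linear combination of per-view cosine similarities.

    page_a, page_b: dict mapping a view name to a sparse vector.
    coeffs: dict mapping a view name to its (hypothetical) coefficient.
    """
    return sum(c * cosine(page_a[name], page_b[name]) for name, c in coeffs.items())


def propagate_tags(source_tags, similarity):
    """Assumed rule: propagated weight = tag weight in the source * page similarity."""
    return {tag: weight * similarity for tag, weight in source_tags.items()}


if __name__ == "__main__":
    # Two hypothetical pages with two views each (the paper combines four).
    tagged = {"tags": {"python": 1.0}, "text": {"code": 1.0}}
    untagged = {"tags": {"python": 1.0}, "text": {"news": 1.0}}
    coeffs = {"tags": 0.5, "text": 0.5}
    sim = page_similarity(tagged, untagged, coeffs)
    print(sim, propagate_tags({"python": 0.8}, sim))
```

Storing each view as a sparse dict keeps the sketch dependency-free; a real system would more likely use TF-IDF vectors from a numerical library.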

information management, social media, url, (23 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States (0.14)
Europe > United Kingdom > Scotland (0.14)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial Intelligence, Jun-4-2009

Mining Compressed Repetitive Gapped Sequential Patterns Efficiently

Tong, Yongxin, Zhao, Li, Yu, Dan, Ma, Shilong, Xu, Ke

Mining frequent sequential patterns from sequence databases has been a central research topic in data mining, and various efficient sequential pattern mining algorithms have been proposed and studied. Recently, in many problem domains (e.g., program execution traces), a new line of sequential pattern mining research, called mining repetitive gapped sequential patterns, has attracted the attention of many researchers. Because it considers not only the repetition of a sequential pattern across different sequences but also its repetition within a sequence, it is more meaningful than general sequential pattern mining, which only captures occurrences in different sequences. However, the number of repetitive gapped sequential patterns generated by even these closed mining algorithms may be too large for users to understand, especially when the support threshold is low. In this paper, we propose and study the problem of compressing repetitive gapped sequential patterns. Inspired by the ideas of summarizing frequent itemsets, RPglobal, we develop an algorithm, CRGSgrow (Compressing Repetitive Gapped Sequential pattern grow), including an efficient pruning strategy, SyncScan, and an efficient representative pattern checking scheme, -dominate sequential pattern checking. CRGSgrow is a two-step approach: in the first step, we obtain all closed repetitive sequential patterns as the candidate set of representative repetitive sequential patterns, and at the same time obtain most of the representative repetitive sequential patterns; in the second step, we spend only a little time finding the remaining representative patterns from the candidate set. An empirical study with both real and synthetic data sets clearly shows that CRGSgrow performs well.

artificial intelligence, data mining, sequential pattern, (18 more...)

0906.0885

Country: Asia > China (0.14)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science > Data Mining (0.77)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.57)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.46)

arXiv.org Machine Learning, Jun-1-2009

Symmetry in Data Mining and Analysis: A Unifying View based on Hierarchy

Murtagh, Fionn

Data analysis and data mining are concerned with unsupervised pattern finding and structure determination in data sets. The data sets themselves are explicitly linked as a form of representation to an observational or otherwise empirical domain of interest. "Structure" has long been understood as symmetry which can take many forms with respect to any transformation, including point, translational, rotational, and many others. Beginning with the role of number theory in expressing data, we show how we can naturally proceed to hierarchical structures. We show how this both encapsulates traditional paradigms in data analysis, and also opens up new perspectives towards issues that are on the order of the day, including data mining of massive, high dimensional, heterogeneous data sets. Linkages with other fields are also discussed including computational logic and symbolic dynamics. The structures in data surveyed here are based on hierarchy, represented as p-adic numbers or an ultrametric topology.

dendrogram, health & medicine, survey article, (21 more...)

arXiv.org Machine Learning

doi: 10.1134/S0081543809020175

0805.2744

Country:

Europe > United Kingdom > England (0.14)
North America > United States > New York (0.14)

Genre: Overview (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

AAAI Conferences, May-21-2009

Multivariate Time Series Classification with Temporal Abstractions

Batal, Iyad (University of Pittsburgh) | Sacchi, Lucia (University of Pavia) | Bellazzi, Riccardo (University of Pavia) | Hauskrecht, Milos (University of Pittsburgh)

The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time-series data. This work focuses on methods for multivariate time-series classification. Time series classification is a challenging problem mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time series features suitable for classification tasks. We propose the STF-Mine algorithm that automatically mines discriminative temporal abstraction patterns from the time series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time-series datasets.

abstraction, artificial intelligence, health & medicine, (19 more...)

Twenty-Second International FLAIRS Conference

Country: North America > United States (0.14)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)

Eberle, William (Tennessee Technological University) | Bisant, David (The Laboratory for Physical Sciences)

Special Track on Data Mining

AAAI Conferences, May-21-2009

Data mining is a field of research dedicated to the process of extracting underlying patterns in data collections. The FLAIRS special track on data mining has the goal of presenting new and important contributions to this field. Areas of interest include, but are not limited to, applications such as intelligence analysis, medical and health applications, text, video, and multimedia mining, e-commerce and web data, financial data analysis, intrusion detection, remote sensing, earth sciences, and astronomy; modeling algorithms such as hidden Markov models, decision trees, neural networks, statistical methods, or probabilistic methods; case studies in areas of application, or over different algorithms and approaches; feature extraction and selection; post-processing techniques such as visualization, summarization, or trending; preprocessing and data reduction; data engineering or warehousing; or other data mining research that is related to artificial intelligence.

artificial intelligence, data mining, special track, (2 more...)

Twenty-Second International FLAIRS Conference

Industry: Information Technology (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.31)

AAAI Conferences, May-21-2009

A Data Warehouse-Based Approach for Quality Management, Analysis and Evaluation of Intelligent Systems using Subgroup Mining

Atzmueller, Martin (University of Wuerzburg) | Puppe, Frank (University of Wuerzburg) | Beer, Stephanie (University-Hospital of Wuerzburg)

Quality management, analysis and evaluation of intelligent systems are important tasks. This paper proposes a data mining approach based on the technique of subgroup mining, utilizing a data warehouse that contains data from the respective intelligent system to be evaluated and from other external sources. The context of our work is given by an intelligent documentation and consultation system in the medical domain of sonography. To demonstrate the applicability and benefit of the presented approach, we provide several real-world examples from a case study applying the approach in the medical domain of sonography.

diagnosis, expert system, information management, (19 more...)

Twenty-Second International FLAIRS Conference

Country:

Europe (0.29)
North America > United States > New York (0.14)

Industry:

Health & Medicine > Therapeutic Area (1.00)
Health & Medicine > Diagnostic Medicine (0.68)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)