Data Mining
Reports of the AAAI 2009 Spring Symposia
Bao, Jie (Rensselaer Polytechnic Institute) | Bojars, Uldis (National University of Ireland) | Choudhury, Tanzeem (Dartmouth College) | Ding, Li (Rensselaer Polytechnic Institute) | Greaves, Mark (Vulcan Inc.) | Kapoor, Ashish (Microsoft Research) | Louchart, Sandy (Heriot-Watt University) | Mehta, Manish (Georgia Institute of Technology) | Nebel, Bernhard (Albert-Ludwigs University Freiburg) | Nirenburg, Sergei (University of Maryland Baltimore County) | Oates, Tim (University of Maryland Baltimore County) | Roberts, David L. (Georgia Institute of Technology) | Sanfilippo, Antonio (Pacific Northwest National Laboratory) | Stojanovic, Nenad (University of Karlsruhe) | Stubbs, Kristen (iRobot Corporation) | Thomaz, Andrea L. (Georgia Institute of Technology) | Tsui, Katherine (University of Massachusetts Lowell) | Woelfl, Stefan (Albert-Ludwigs University Freiburg)
The Association for the Advancement of Artificial Intelligence, in cooperation with Stanford University's Department of Computer Science, was pleased to present the 2009 Spring Symposium Series, held Monday through Wednesday, March 23–25, 2009, at Stanford University. The titles of the nine symposia were Agents that Learn from Human Teachers, Benchmarking of Qualitative Spatial and Temporal Reasoning Systems, Experimental Design for Real-World Systems, Human Behavior Modeling, Intelligent Event Processing, Intelligent Narrative Technologies II, Learning by Reading and Learning to Read, Social Semantic Web: Where Web 2.0 Meets Web 3.0, and Technosocial Predictive Analytics. The goal of the Agents that Learn from Human Teachers symposium was to investigate how to enable software and robotic agents to learn from real-time interaction with an everyday human partner. The aim of the Benchmarking of Qualitative Spatial and Temporal Reasoning Systems symposium was to initiate the development of a problem repository in the field of qualitative spatial and temporal reasoning and to identify a graded set of challenges for future mid-term and long-term research. The Experimental Design symposium discussed the challenges of evaluating AI systems. The Human Behavior Modeling symposium explored reasoning methods for understanding various aspects of human behavior, especially in the context of designing intelligent systems that interact with humans. The Intelligent Event Processing symposium discussed the need for more AI-based approaches in event processing and defined a research agenda for the field, coined intelligent complex event processing (iCEP). The Intelligent Narrative Technologies II symposium discussed innovations, progress, and novel techniques in the research domain. The Learning by Reading and Learning to Read symposium explored two aspects of making natural language texts semantically accessible to, and processable by, machines. The Social Semantic Web symposium focused on the real-world grand challenges in this area. Finally, the Technosocial Predictive Analytics symposium explored new methods for anticipatory analytical thinking that provide decision advantage through the integration of human and physical models.
An Immune Inspired Approach to Anomaly Detection
Twycross, Jamie, Aickelin, Uwe
The immune system provides a rich metaphor for computer security: anomaly detection that works in nature should work for machines. However, early artificial immune system approaches to computer security had only limited success. Arguably, this was because these artificial systems were based on too simplistic a view of the immune system. We present here a second-generation artificial immune system for process anomaly detection. It improves on earlier systems by employing multiple artificial cell types that process information. After presenting detailed information about how to build such second-generation systems, we find that communication between cell types is key to performance. Through realistic testing and validation we show that second-generation artificial immune systems are capable of anomaly detection beyond generic system policies. The paper concludes with a discussion and an outline of the next steps in this exciting area of computer security.
SparseCodePicking: feature extraction in mass spectrometry using sparse coding algorithms
Alexandrov, Theodore, Steinhorst, Klaus, Keszoecze, Oliver, Schiffler, Stefan
Mass spectrometry (MS) is an important technique for chemical profiling that produces, for a given sample, a high-dimensional histogram-like spectrum. A crucial step of MS data processing is peak picking, which selects the peaks carrying information about the high-concentration molecules of interest in an MS investigation. We present a new peak picking procedure based on a sparse coding algorithm. Given a set of spectra from different classes, i.e., with different peak positions and heights, the procedure extracts peaks by means of unsupervised learning. Instead of the $l_1$-regularization penalty term used in the original sparse coding algorithm, we propose an elastic-net penalty term for better regularization. The evaluation is done by means of simulation. We show that, over a large region of parameter settings, the proposed peak picking method based on sparse coding features outperforms a mean-spectrum-based method. Moreover, we demonstrate the procedure by applying it to two real-life datasets.
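To make the idea concrete, here is a minimal sketch (not the authors' SparseCodePicking implementation): a single spectrum is encoded against a fixed dictionary of Gaussian peak shapes with an elastic-net penalty, and atoms receiving large coefficients are treated as picked peaks. The atom width, penalty weights, and threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import ElasticNet

mz = np.linspace(0, 100, 1000)                      # m/z axis
centers = np.arange(5, 100, 5.0)                    # candidate peak positions
D = np.exp(-0.5 * ((mz[:, None] - centers[None, :]) / 0.4) ** 2)  # Gaussian atoms

# synthetic spectrum: two true peaks plus noise
spectrum = 3.0 * D[:, 4] + 1.5 * D[:, 12] + 0.05 * np.random.randn(mz.size)

# elastic-net coding: least-squares fit with combined l1/l2 penalties,
# non-negative coefficients so each atom contributes a positive peak
coder = ElasticNet(alpha=0.01, l1_ratio=0.7, positive=True, fit_intercept=False)
coder.fit(D, spectrum)

picked = centers[coder.coef_ > 0.1]                 # threshold is an assumption
print("picked peak positions:", picked)
```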
Statistical Decision Making for Authentication and Intrusion Detection
Dimitrakakis, Christos, Mitrokotsa, Aikaterini
User authentication and intrusion detection differ from standard classification problems in that while we have data generated from legitimate users, impostor or intrusion data is scarce or non-existent. We review existing techniques for dealing with this problem and propose a novel alternative based on a principled statistical decision-making view point. We examine the technique on a toy problem and validate it on complex real-world data from an RFID based access control system. The results indicate that it can significantly outperform the classical world model approach. The method could be more generally useful in other decision-making scenarios where there is a lack of adversary data.
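As a simplified illustration of deciding with only legitimate-user data (not the paper's specific model), one can fit a density to the legitimate samples and threshold the log-likelihood of a new observation; the feature dimensionality, mixture size, and threshold below are assumptions.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
legit = rng.normal(loc=0.0, scale=1.0, size=(500, 3))   # stand-in feature vectors

# density model of legitimate behaviour only; no impostor data is needed
model = GaussianMixture(n_components=2, random_state=0).fit(legit)

def decide(x, threshold=-6.0):
    """Accept if the sample is likely enough under the legitimate-user model."""
    return "accept" if model.score_samples(x.reshape(1, -1))[0] > threshold else "reject"

print(decide(legit[0]))                     # a legitimate-looking sample
print(decide(np.array([8.0, -7.0, 9.0])))   # an implausible sample
```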
Kronecker Graphs: An Approach to Modeling Networks
Leskovec, Jure, Chakrabarti, Deepayan, Kleinberg, Jon, Faloutsos, Christos, Ghahramani, Zoubin
How can we model networks with a mathematically tractable model that allows for rigorous analysis of network properties? Networks exhibit a long list of surprising properties: heavy tails for the degree distribution, small diameters, and densification and shrinking diameters over time. Most present network models either fail to match several of these properties, are complicated to analyze mathematically, or both. In this paper we propose a generative model for networks that is both mathematically tractable and capable of generating networks with the above-mentioned properties. Our main idea is to use the Kronecker product to generate graphs that we refer to as "Kronecker graphs". First, we prove that Kronecker graphs naturally obey common network properties. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks. We then present KronFit, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KronFit takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on large real and synthetic networks show that KronFit finds accurate parameters that closely mimic the properties of target networks. Once fitted, the model parameters can be used to gain insights into the network structure, and the resulting synthetic graphs can be used for null models, anonymization, extrapolations, and graph summarization.
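A minimal sketch of the generation side of the model (naive O(N^2) edge sampling for illustration only; the scalable generator and KronFit are not shown): take Kronecker powers of a small initiator probability matrix and sample each edge independently. The 2x2 initiator values and the number of Kronecker powers are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
initiator = np.array([[0.9, 0.5],
                      [0.5, 0.2]])     # 2x2 initiator probability matrix

# repeated Kronecker product: P becomes a 1024 x 1024 edge-probability matrix
P = initiator.copy()
for _ in range(9):
    P = np.kron(P, initiator)

adj = rng.random(P.shape) < P          # each edge appears independently w.p. P[i, j]
degrees = adj.sum(axis=1)
print("nodes:", P.shape[0], "edges:", int(adj.sum()), "max degree:", int(degrees.max()))
```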
Node discovery problem for a social network
Methods to solve a node discovery problem for a social network are presented. Covert nodes are nodes that are not directly observable. They transmit influence and affect the resulting collaborative activities among the persons in a social network, but do not appear in the surveillance logs that record the participants of those activities. Discovering the covert nodes means identifying the suspicious logs in which the covert nodes would appear if they became overt. The performance of the methods is demonstrated with test data generated from computationally synthesized networks and from a real organization.
A Data-Mining Approach to 3D Realistic Render Setup Assistance
Morcillo, Carlos Gonzalez (University of Castilla-La Mancha) | Lopez, Lorenzo Manuel Lopez (University of Castilla-La Mancha) | Sanchez, Jose Jesus Castro (University of Castilla-La Mancha) | Moser, Bernhard (Software Competence Center GmbH)
Realistic rendering is the process of generating a 2D image from an abstract description of a 3D scene, aiming to achieve the quality of a photograph. The quality of the generated image depends on the accuracy with which the employed render method simulates the behaviour of light particles travelling through the scene. In current practice, it is up to the user to choose optimal settings of the input parameters for these methods in terms of both time efficiency and image quality. This is an iterative trial-and-error process, even for expert users. This paper describes a novel approach based on techniques from the fields of data mining and genetic computing to assist the user in the selection of render parameters. Experimental results are presented which show the benefits of this approach.
Explicit probabilistic models for databases and networks
Recent work in data mining and related areas has highlighted the importance of the statistical assessment of data mining results. Crucial to this endeavour is the choice of a non-trivial null model for the data, against which the found patterns can be contrasted. The most influential null models proposed so far are defined in terms of invariants of the null distribution. Such null models can be used by computation-intensive randomization approaches to estimate the statistical significance of data mining results. Here, we introduce a methodology for constructing non-trivial probabilistic models based on the maximum entropy (MaxEnt) principle. We show how MaxEnt models allow for the natural incorporation of prior information. Furthermore, they satisfy a number of desirable properties of previously introduced randomization approaches. Lastly, they have the additional benefit that they can be represented explicitly. We argue that our approach can be used for a variety of data types. However, for concreteness, we demonstrate it here for databases and networks in particular.
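As an illustrative sketch of one concrete MaxEnt instance (an assumption for exposition, not the paper's derivation): for a binary database, constraining only the expected row and column sums yields independent Bernoulli cells with p_ij = sigmoid(r_i + c_j), and the multipliers r, c can be fit by simple gradient ascent on the likelihood. The step size and iteration count below are arbitrary choices.

```python
import numpy as np

def maxent_margins(data, iters=2000, lr=0.1):
    """Fit the MaxEnt model whose expected row/column sums match the data."""
    n, m = data.shape
    row_sums, col_sums = data.sum(axis=1), data.sum(axis=0)
    r, c = np.zeros(n), np.zeros(m)
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-(r[:, None] + c[None, :])))   # expected cell values
        r += lr * (row_sums - p.sum(axis=1)) / m                # match row margins
        c += lr * (col_sums - p.sum(axis=0)) / n                # match column margins
    return p

rng = np.random.default_rng(0)
data = (rng.random((20, 15)) < 0.3).astype(int)
p = maxent_margins(data)
print("max row-margin error:", np.abs(p.sum(axis=1) - data.sum(axis=1)).max())
```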
Multiple Hypothesis Testing in Pattern Discovery
Hanhijärvi, Sami, Puolamäki, Kai, Garriga, Gemma C.
The problem of multiple hypothesis testing arises when more than one hypothesis is tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, assessing the significance of all frequent itemsets of a single dataset simultaneously entails a host of hypotheses, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I errors). Our contribution in this paper is to extend the multiple hypothesis testing framework so that it can be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test.
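For reference, here is a sketch of a classical baseline rather than the paper's randomization-based method: Holm's step-down procedure, which controls the FWER in the strong sense for an arbitrary collection of p-values.

```python
def holm(p_values, alpha=0.05):
    """Return a list of booleans: True where the hypothesis is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # ascending p-values
    reject = [False] * m
    for k, i in enumerate(order):
        if p_values[i] <= alpha / (m - k):   # step-down threshold
            reject[i] = True
        else:
            break                            # stop at the first non-rejection
    return reject

print(holm([0.001, 0.04, 0.03, 0.20]))       # [True, False, False, False]
```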
Node discovery in a networked organization
In this paper, I present a method to solve a node discovery problem in a networked organization. Covert nodes are nodes that are not directly observable. They affect social interactions, but do not appear in the surveillance logs that record the participants of the social interactions. Discovering the covert nodes is defined as identifying the suspicious logs in which the covert nodes would appear if they became overt. A mathematical model is developed for the maximum likelihood estimation of the network behind the social interactions and for the identification of the suspicious logs. Precision, recall, and F measure characteristics are demonstrated with a dataset generated from a real organization and with computationally synthesized datasets. The performance approaches the theoretical limit for covert nodes in networks of any topology and size when the ratio of the number of observations to the number of possible communication patterns is large.
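To fix terminology for the reported evaluation (the likelihood model itself is not sketched here), precision, recall, and F measure over a set of logs flagged as suspicious can be computed as follows; the log identifiers are hypothetical.

```python
def precision_recall_f(flagged, truth):
    """Compare flagged suspicious logs against the ground-truth covert-node logs."""
    flagged, truth = set(flagged), set(truth)
    tp = len(flagged & truth)
    precision = tp / len(flagged) if flagged else 0.0
    recall = tp / len(truth) if truth else 0.0
    f = (2 * precision * recall / (precision + recall)) if tp else 0.0
    return precision, recall, f

# hypothetical log identifiers
print(precision_recall_f(flagged=[3, 7, 11, 19], truth=[3, 11, 20]))
```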