AITopics

doi: 10.1088/1742-5468/2009/11/P11003

0907.4643

Country: Europe > Poland (0.14)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Immunology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.94)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.93)
Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Galluccio, Laurent, Michel, Olivier J. J., Comon, Pierre, Slezak, Eric, Hero, Alfred O.

Initialization Free Graph Based Clustering

arXiv.org Machine LearningSep-24-2009

This paper proposes an original approach to cluster multi-component data sets, including an estimation of the number of clusters. From the construction of a minimal spanning tree with Prim's algorithm, and the assumption that the vertices are approximately distributed according to a Poisson distribution, the number of clusters is estimated by thresholding the Prim's trajectory. The corresponding cluster centroids are then computed in order to initialize the generalized Lloyd's algorithm, also known as $K$-means, which allows to circumvent initialization problems. Some results are derived for evaluating the false positive rate of our cluster detection algorithm, with the help of approximations relevant in Euclidean spaces. Metrics used for measuring similarity between multi-dimensional data points are based on symmetrical divergences. The use of these informational divergences together with the proposed method leads to better results, compared to other clustering methods for the problem of astrophysical data processing. Some applications of this method in the multi/hyper-spectral imagery domain to a satellite view of Paris and to an image of the Mars planet are also presented. In order to demonstrate the usefulness of divergences in our problem, the method with informational divergence as similarity measure is compared with the same method using classical metrics. In the astrophysics application, we also compare the method with the spectral clustering algorithms.

algorithm, artificial intelligence, machine learning, (17 more...)

0909.4395

Country:

Europe > France (0.68)
North America > United States > California (0.46)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

AAAI ConferencesSep-1-2009

Ontology-Based Link Prediction in the LiveJournal Social Network

Caragea, Doina (Kansas State University) | Bahirwani, Vikas (Kansas State University) | Aljandal, Waleed (Kansas State University) | Hsu, William H. (Kansas State University)

LiveJournal is a social network journal service with focus on user interactions. As for many other online social networks, predicting potential friendships in the LiveJournal network is a problem of great practical interest. Previous work has shown that graph features extracted from the graph associated with the network are good predictors for friendship links. However, contrary to the intuition, user data (e.g., interests shared by two users) does not always improve the predictions obtained with graph features alone. This could be due to the fact that features constructed from a large number of user declared interests cannot capture the implicit semantic of the interests. To test this hypothesis, we use a clustering approach to build an interest ontology, and explore the ability of the ontology to improve the performance of learning algorithms at predicting friendship links, when interest-based features are used alone or in combination with graph-based features. The results show that ontology-based features can help improve the performance of several machine learning classiﬁers (in particular, random forest classiﬁers) at the task of predicting links in the LiveJournal social network.

algorithm, classifier, ontology, (14 more...)

Eighth Symposium on Abstraction, Reformulation, and Approximation

Country:

Oceania > New Zealand > North Island > Waikato (0.04)
North America > United States > Kansas > Riley County > Manhattan (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (1.00)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Wasserman, Larry, Roeder, Kathryn

High-dimensional variable selection

arXiv.org Machine LearningAug-20-2009

This paper explores the following question: what kind of statistical guarantees can be given when doing variable selection in high-dimensional models? In particular, we look at the error rates and power of some multi-stage regression methods. In the first stage we fit a set of candidate models. In the second stage we select one model by cross-validation. In the third stage we use hypothesis testing to eliminate some variables. We refer to the first two stages as "screening" and the last stage as "cleaning." We consider three screening methods: the lasso, marginal regression, and forward stepwise regression. Our method gives consistent variable selection under certain conditions.

artificial intelligence, assumption, machine learning, (17 more...)

doi: 10.1214/08-AOS646

0704.1139

Country: North America > United States (0.28)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.86)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)

Hoffmann, Brice, Zaslavskiy, Mikhail, Vert, Jean-Philippe, Stoven, Véronique

A new protein binding pocket similarity measure based on comparison of 3D atom clouds: application to ligand prediction

arXiv.org Machine LearningJul-9-2009

Motivation: Prediction of ligands for proteins of known 3D structure is important to understand structure-function relationship, predict molecular function, or design new drugs. Results: We explore a new approach for ligand prediction in which binding pockets are represented by atom clouds. Each target pocket is compared to an ensemble of pockets of known ligands. Pockets are aligned in 3D space with further use of convolution kernels between clouds of points. Performance of the new method for ligand prediction is compared to those of other available measures and to docking programs. We discuss two criteria to compare the quality of similarity measures: area under ROC curve (AUC) and classification based scores. We show that the latter is better suited to evaluate the methods with respect to ligand prediction. Our results on existing and new benchmarks indicate that the new method outperforms other approaches, including docking. Availability: The new method is available at http://cbio.ensmp.fr/paris/ Contact: mikhail.zaslavskiy@mines-paristech.fr

artificial intelligence, ligand, machine learning, (18 more...)

0907.1531

Genre: Research Report > New Finding (0.49)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.38)

Hanhijärvi, Sami, Puolamäki, Kai, Garriga, Gemma C.

Multiple Hypothesis Testing in Pattern Discovery

arXiv.org Machine LearningJun-29-2009

The problem of multiple hypothesis testing arises when there are more than one hypothesis to be tested simultaneously for statistical significance. This is a very common situation in many data mining applications. For instance, assessing simultaneously the significance of all frequent itemsets of a single dataset entails a host of hypothesis, one for each itemset. A multiple hypothesis testing method is needed to control the number of false positives (Type I error). Our contribution in this paper is to extend the multiple hypothesis framework to be used with a generic data mining algorithm. We provide a method that provably controls the family-wise error rate (FWER, the probability of at least one false positive) in the strong sense. We evaluate the performance of our solution on both real and generated data. The results show that our method controls the FWER while maintaining the power of the test.

data mining, machine learning, pattern recognition, (17 more...)

0906.5263

Genre: Research Report > Experimental Study (0.56)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (1.00)
(2 more...)

Seo, Young-Woo (Robotics Institute, Carnegie Mellon University) | Ratliff, Nathan (Robotics Institute, Carnegie Mellon University) | Urmson, Chris (Robotics Institute, Carnegie Mellon University)

Self-Supervised Aerial Images Analysis for Extracting Parking lot Structure

Road network information simplifies autonomous driving by providing strong priors about environments. It informs a robotic vehicle with where it can drive, models of what can be expected, and contextual cues that influence driving behaviors. Currently, however, road network information is manually generated using a combination of GPS survey and aerial imagery. These manual techniques are labor intensive and error prone. To full exploit the benefits of digital imagery, these processes should be automated. As a step toward this goal, we present an algorithm that extracts the structure of parking lot visible from a given aerial image. To minimize human intervention in the use of aerial imagery, we devise a self-supervised learning algorithm that automatically generates a set of parking spot templates to learn the appearance of a parking lot and estimates the structure of the parking lot from the learned model. The data set extracted from a single image alone is too small to sufficiently learn an accurate parking spot model. However, strong priors trained using large data sets collected across multiple images dramatically improvce performance. Our self-supervised approach outperforms the prior alone by adapting the distribution of examples toward that found in the current image. A thorough empirical analysis compares leading state-of-the-art learning techniques on this problem.

hypothesis, parking lot, parking spot, (16 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.73)
Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (0.69)
(2 more...)

Abnormal Activity Recognition based on HDP-HMM Models

Hu, Derek Hao (Hong Kong University of Science and Technology) | Zhang, Xian-Xing (Nanjing University) | Yin, Jie (CSIRO ICT Centre) | Zheng, Vincent Wenchen (Hong Kong University of Science and Technology) | Yang, Qiang (Hong Kong University of Science and Technology)

Detecting abnormal activities from sensor readings is an important research problem in activity recognition. A number of different algorithms have been proposed in the past to tackle this problem. Many of the previous state-based approaches suffer from the problem of failing to decide the appropriate number of states, which are difficult to find through a trial and-error approach, in real-world applications. In this paper, we propose an accurate and flexible framework for abnormal activity recognition from sensor readings that involves less human tuning of model parameters. Our approach first applies a Hierarchical Dirichlet Process Hidden Markov Model (HDP-HMM), which supports an infinite number of states, to automatically find an appropriate number of states. We incorporate a Fisher Kernel into the One-Class Support Vector Machine (OCSVM) model to filter out the activities that are likely to be normal. Finally, we derive an abnormal activity model from the normal activity models to reduce false positive rate in an unsupervised manner. Our main contribution is that our proposed HDP-HMM models can decide the appropriate number of states automatically, and that by incorporating a Fisher Kernel into the OCSVM model, we can combine the advantages from generative model and discriminative model. We demonstrate the effectiveness of our approach by using several real-world datasets to test our algorithm’s performance.

educe, execution, state-context pair, (14 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country: Asia > China > Guangdong Province > Guangzhou (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.53)

Durme, Benjamin Van (University of Rochester) | Lall, Ashwin (Georgia Institute of Technology)

Probabilistic Counting with Randomized Storage

Previous work by Talbot and Osborne [2007] explored the use of randomized storage mechanisms in language modeling. These structures trade a small amount of error for significant space savings, enabling the use of larger language models on relatively modest hardware. Going beyond space efficient count storage, here we present the Talbot Osborne Morris Bloom (TOMB) Counter, an extended model for performing space efficient counting over streams of finite length. Theoretical and experimental results are given, showing the promise of approximate counting over large vocabularies in the context of limited space.

bloom counter, bloom filter, frequency, (12 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Georgia > Fulton County > Atlanta (0.04)
North America > United States > California (0.04)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Sonntag, Daniel (German Research Center for AI (DFKI))

Introspection and Adaptable Model Integration for Dialogue-based Question Answering

Dialogue-based Question Answering (QA) is a highly complex task that brings together a QA system including various natural language processing components (i.e., components for question classification, information extraction, and retrieval) with dialogue systems for effective and natural communication. The dialogue-based access is difficult to establish when the QA system in use is complex and combines many different answer services with different quality and access characteristics. For example, some questions are processed by opendomain QA services with a broad coverage. Others should be processed by using a domain-specific instance ontology for more reliable answers. Different answer services may change their characteristics over time and the dialogue reaction models have to be updated according to that. To solve this problem, we developed introspective methods to integrate adaptable models of the answer services. We evaluated the impact of the learned models on the dialogue performance, i.e., whether the adaptable models can be used for a more convenient dialogue formulation process. We show significant effectiveness improvements in the resulting dialogues when using the machine learning (ML) models. Examples are provided in the context of the generation of system-initiative feedback to user questions and answers, as provided by heterogeneous information services.

dialogue, dialogue system, information, (14 more...)

Twenty-First International Joint Conference on Artificial Intelligence

Country:

South America > Chile (0.04)
South America > Brazil (0.04)
North America > United States > New York (0.04)
(6 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.96)