AITopics | Country

Collaborating Authors

Country

Project Halo Update—Progress Toward Digital Aristotle

AI MagazineOct-10-2010

In the winter, 2004 issue of AI Magazine, we reported Vulcan Inc.'s first step toward creating a question-answering system called "Digital Aristotle." The goal of that first step was to assess the state of the art in applied Knowledge Representation and Reasoning (KRR) by asking AI experts to represent 70 pages from the advanced placement (AP) chemistry syllabus and to deliver knowledge-based systems capable of answering questions from that syllabus. This paper reports the next step toward realizing a Digital Aristotle: we present the design and evaluation results for a system called AURA, which enables domain experts in physics, chemistry, and biology to author a knowledge base and that then allows a different set of users to ask novel questions against that knowledge base. These results represent a substantial advance over what we reported in 2004, both in the breadth of covered subjects and in the provision of sophisticated technologies in knowledge representation and reasoning, natural language processing, and question answering to domain experts and novice users.

educational setting, knowledge, neural network, (24 more...)

AI Magazine

Country:

North America > United States > Texas (0.14)
North America > United States > Colorado (0.14)
North America > United States > California (0.14)
North America > United States > Massachusetts (0.14)

Genre:

Instructional Material > Course Syllabus & Notes (0.93)
Research Report > Experimental Study (0.68)

Industry:

Education > Educational Setting > Higher Education (0.93)
Education > Curriculum (0.69)

Technology:

Information Technology > Knowledge Management > Knowledge Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (1.00)
Information Technology > Artificial Intelligence > Natural Language > Question Answering (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Adapting Open Information Extraction to Domain-Specific Relations

AI MagazineOct-10-2010

Information extraction (IE) can identify a set of relations from free text to support question answering (QA). Until recently, IE systems were domain-specific and needed a combination of manual engineering and supervised learning to adapt to each target domain. A new paradigm, Open IE operates on large text corpora without any manual tagging of relations, and indeed without any pre-specified relations. Due to its open-domain and open-relation nature, Open IE is purely textual and is unable to relate the surface forms to an ontology, if known in advance. We explore the steps needed to adapt Open IE to a domain-specific ontology and demonstrate our approach of mapping domain-independent tuples to an ontology using domains from DARPA’s Machine Reading Project. Our system achieves precision over 0.90 from as few as 8 training examples for an NFL-scoring domain.

inductive learning, relation, us government, (21 more...)

AI Magazine

Country:

North America > United States > California (0.15)
North America > United States > New York (0.14)

Industry:

Leisure & Entertainment > Sports > Football (1.00)
Government > Military (0.69)
Government > Regional Government > North America Government > United States Government (0.55)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.76)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.70)

Add feedback

Infinite Hierarchical MMSB Model for Nested Communities/Groups in Social Networks

Ho, Qirong, Parikh, Ankur P., Song, Le, Xing, Eric P.

arXiv.org Machine LearningOct-9-2010

Actors in realistic social networks play not one but a number of diverse roles depending on whom they interact with, and a large number of such role-specific interactions collectively determine social communities and their organizations. Methods for analyzing social networks should capture these multi-faceted role-specific interactions, and, more interestingly, discover the latent organization or hierarchy of social communities. We propose a hierarchical Mixed Membership Stochastic Blockmodel to model the generation of hierarchies in social communities, selective membership of actors to subsets of these communities, and the resultant networks due to within- and cross-community interactions. Furthermore, to automatically discover these latent structures from social networks, we develop a Gibbs sampling algorithm for our model. We conduct extensive validation of our model using synthetic networks, and demonstrate the utility of our model in real-world datasets such as predator-prey networks and citation networks.

artificial intelligence, hierarchy, social media, (19 more...)

arXiv.org Machine Learning

1010.1868

Country: North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.15)

Genre: Research Report (0.40)

Industry: Information Technology > Services (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.93)

Add feedback

Feature selection in omics prediction problems using cat scores and false nondiscovery rate control

Ahdesmäki, Miika, Strimmer, Korbinian

arXiv.org Machine LearningOct-8-2010

We revisit the problem of feature selection in linear discriminant analysis (LDA), that is, when features are correlated. First, we introduce a pooled centroids formulation of the multiclass LDA predictor function, in which the relative weights of Mahalanobis-transformed predictors are given by correlation-adjusted $t$-scores (cat scores). Second, for feature selection we propose thresholding cat scores by controlling false nondiscovery rates (FNDR). Third, training of the classifier is based on James--Stein shrinkage estimates of correlations and variances, where regularization parameters are chosen analytically without resampling. Overall, this results in an effective and computationally inexpensive framework for high-dimensional prediction with natural feature selection. The proposed shrinkage discriminant procedures are implemented in the R package ``sda'' available from the R repository CRAN.

feature selection, health & medicine, oncology, (19 more...)

arXiv.org Machine Learning

doi: 10.1214/09-AOAS277

0903.2003

Country:

Europe > Germany (0.15)
Europe > Finland (0.14)

Genre: Research Report > Experimental Study (0.47)

Industry:

Health & Medicine > Therapeutic Area > Oncology (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (0.96)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Add feedback

Algorithmic and Statistical Perspectives on Large-Scale Data Analysis

Mahoney, Michael W.

arXiv.org Machine LearningOct-8-2010

In recent years, ideas from statistics and scientific computing have begun to interact in increasingly sophisticated and fruitful ways with ideas from computer science and the theory of algorithms to aid in the development of improved worst-case algorithms that are useful for large-scale scientific and Internet data analysis problems. In this chapter, I will describe two recent examples---one having to do with selecting good columns or features from a (DNA Single Nucleotide Polymorphism) data matrix, and the other having to do with selecting good clusters or communities from a data graph (representing a social or information network)---that drew on ideas from both areas and that may serve as a model for exploiting complementary algorithmic and statistical perspectives in order to solve applied large-scale data analysis problems.

algorithm, health & medicine, information management, (21 more...)

arXiv.org Machine Learning

1010.1609

Country: North America > United States > California > Santa Clara County (0.14)

Genre: Research Report (0.82)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Information Management > Search (0.93)
(4 more...)

Add feedback

Mixed-Membership Stochastic Block-Models for Transactional Networks

Shafiei, Mahdi, Chipman, Hugh

arXiv.org Artificial IntelligenceOct-7-2010

Transactional network data can be thought of as a list of one-to-many communications(e.g., email) between nodes in a social network. Most social network models convert this type of data into binary relations between pairs of nodes. We develop a latent mixed membership model capable of modeling richer forms of transactional network data, including relations between more than two nodes. The model can cluster nodes and predict transactions. The block-model nature of the model implies that groups can be characterized in very general ways. This flexible notion of group structure enables discovery of rich structure in transactional networks. Estimation and inference are accomplished via a variational EM algorithm. Simulations indicate that the learning algorithm can recover the correct generative model. Interesting structure is discovered in the Enron email dataset and another dataset extracted from the Reddit website. Analysis of the Reddit data is facilitated by a novel performance measure for comparing two soft clusterings. The new model is superior at discovering mixed membership in groups and in predicting transactions.

artificial intelligence, node, social media, (20 more...)

arXiv.org Artificial Intelligence

1010.1437

Country: North America > United States > California > San Francisco County > San Francisco (0.14)

Industry: Information Technology (0.96)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.94)

Add feedback

BART: Bayesian additive regression trees

Chipman, Hugh A., George, Edward I., McCulloch, Robert E.

arXiv.org Machine LearningOct-7-2010

We develop a Bayesian "sum-of-trees" model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART's many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.

artificial intelligence, bart, health & medicine, (18 more...)

arXiv.org Machine Learning

doi: 10.1214/09-AOAS285

0806.3286

Country: North America > United States > Texas > Travis County > Austin (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area (0.67)
Health & Medicine > Pharmaceuticals & Biotechnology (0.48)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback

A bagging SVM to learn from positive and unlabeled examples

Mordelet, Fantine, Vert, Jean-Philippe

arXiv.org Machine LearningOct-5-2010

We consider the problem of learning a binary classifier from a training set of positive and unlabeled examples, both in the inductive and in the transductive setting. This problem, often referred to as \emph{PU learning}, differs from the standard supervised classification problem by the lack of negative examples in the training set. It corresponds to an ubiquitous situation in many applications such as information retrieval or gene ranking, when we have identified a set of data of interest sharing a particular property, and we wish to automatically retrieve additional data sharing the same property among a large and easily available pool of unlabeled data. We propose a conceptually simple method, akin to bagging, to approach both inductive and transductive PU learning problems, by converting them into series of supervised binary classification problems discriminating the known positive examples from random subsamples of the unlabeled set. We empirically demonstrate the relevance of the method on simulated and real data, where it performs at least as well as existing methods while being faster.

classifier, health & medicine, inductive learning, (20 more...)

arXiv.org Machine Learning

1010.0772

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Tennessee (0.14)

Genre: Research Report > New Finding (0.47)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.90)
(2 more...)

Add feedback

Trek separation for Gaussian graphical models

Sullivant, Seth, Talaska, Kelli, Draisma, Jan

arXiv.org Machine LearningOct-4-2010

Gaussian graphical models are semi-algebraic subsets of the cone of positive definite covariance matrices. Submatrices with low rank correspond to generalizations of conditional independence constraints on collections of random variables. We give a precise graph-theoretic characterization of when submatrices of the covariance matrix have small rank for a general class of mixed graphs that includes directed acyclic and undirected graphs as special cases. Our new trek separation criterion generalizes the familiar $d$-separation criterion. Proofs are based on the trek rule, the resulting matrix factorizations and classical theorems of algebraic combinatorics on the expansions of determinants of path polynomials.

artificial intelligence, graph, machine learning, (18 more...)

arXiv.org Machine Learning

doi: 10.1214/09-AOS760

0812.1938

Country: North America > United States > Michigan > Washtenaw County > Ann Arbor (0.14)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.85)

Add feedback

Steepest Ascent Hill Climbing For A Mathematical Problem

Abraham, Siby, Kiss, Imre, Sanyal, Sugata, Sanglikar, Mukund

arXiv.org Artificial IntelligenceOct-2-2010

The paper proposes artificial intelligence technique called hill climbing to find numerical solutions of Diophantine Equations. Such equations are important as they have many applications in fields like public key cryptography, integer factorization, algebraic curves, projective curves and data dependency in super computers. Importantly, it has been proved that there is no general method to find solutions of such equations. This paper is an attempt to find numerical solutions of Diophantine equations using steepest ascent version of Hill Climbing. The method, which uses tree representation to depict possible solutions of Diophantine equations, adopts a novel methodology to generate successors. The heuristic function used help to make the process of finding solution as a minimization process. The work illustrates the effectiveness of the proposed methodology using a class of Diophantine equations given by a1. x1 p1 + a2. x2 p2 + ...... + an . xn pn = N where ai and N are integers. The experimental results validate that the procedure proposed is successful in finding solutions of Diophantine Equations with sufficiently large powers and large number of variables.

artificial intelligence, evolutionary algorithm, optimization problem, (17 more...)

arXiv.org Artificial Intelligence

1010.0298

Country:

Asia > India (0.49)
Asia > Middle East > UAE (0.14)

Industry: Information Technology > Security & Privacy (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.95)

Add feedback