AITopics

0810.3865

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Africa > South Africa > Gauteng > Johannesburg (0.04)

Genre: Research Report > New Finding (0.47)

Industry:

Health & Medicine > Therapeutic Area > Immunology (0.50)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.32)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.92)

Wong, Wilson, Liu, Wei, Bennamoun, Mohammed

Determining the Unithood of Word Sequences using Mutual Information and Independence Measure

arXiv.org Artificial IntelligenceOct-1-2008

Most works related to unithood were conducted as part of a larger effort for the determination of termhood. Consequently, the number of independent research that study the notion of unithood and produce dedicated techniques for measuring unithood is extremely small. We propose a new approach, independent of any influences of termhood, that provides dedicated measures to gather linguistic evidence from parsed text and statistical evidence from Google search engine for the measurement of unithood. Our evaluations revealed a precision and recall of 98.68% and 91.82% respectively with an accuracy at 95.42% in measuring the unithood of 1005 test cases.

artificial intelligence, machine learning, natural language, (18 more...)

0810.0156

Country:

North America > United States (0.14)
Oceania > Australia > Western Australia (0.04)
Europe > Spain > Galicia > Madrid (0.04)
(7 more...)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.91)
Information Technology > Information Management > Search (0.88)

arXiv.org Machine LearningSep-19-2008

Finding rare objects and building pure samples: Probabilistic quasar classification from low resolution Gaia spectra

Bailer-Jones, C. A. L., Smith, K. W., Tiede, C., Sordo, R., Vallenari, A.

We develop and demonstrate a probabilistic method for classifying rare objects in surveys with the particular goal of building very pure samples. It works by modifying the output probabilities from a classifier so as to accommodate our expectation (priors) concerning the relative frequencies of different classes of objects. We demonstrate our method using the Discrete Source Classifier, a supervised classifier currently based on Support Vector Machines, which we are developing in preparation for the Gaia data analysis. DSC classifies objects using their very low resolution optical spectra. We look in detail at the problem of quasar classification, because identification of a pure quasar sample is necessary to define the Gaia astrometric reference frame. By varying a posterior probability threshold in DSC we can trade off sample completeness and contamination. We show, using our simulated data, that it is possible to achieve a pure sample of quasars (upper limit on contamination of 1 in 40,000) with a completeness of 65% at magnitudes of G=18.5, and 50% at G=20.0, even when quasars have a frequency of only 1 in every 2000 objects. The star sample completeness is simultaneously 99% with a contamination of 0.7%. Including parallax and proper motion in the classifier barely changes the results. We further show that not accounting for class priors in the target population leads to serious misclassifications and poor predictions for sample completeness and contamination. (Truncated)

artificial intelligence, machine learning, quasar, (17 more...)

arXiv.org Machine Learning

doi: 10.1111/j.1365-2966.2008.13983.x

0809.3373

Country: Europe (0.46)

Genre: Research Report (0.64)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Esmeir, S., Markovitch, S.

Anytime Induction of Low-cost, Low-error Classifiers: a Sampling-based Approach

Journal of Artificial Intelligence ResearchSep-17-2008

Machine learning techniques are gaining prevalence in the production of a wide range of classifiers for complex real-world applications with nonuniform testing and misclassification costs. The increasing complexity of these applications poses a real challenge to resource management during learning and classification. In this work we introduce ACT (anytime cost-sensitive tree learner), a novel framework for operating in such complex environments. ACT is an anytime algorithm that allows learning time to be increased in return for lower classification costs. It builds a tree top-down and exploits additional time resources to obtain better estimations for the utility of the different candidate splits. Using sampling techniques, ACT approximates the cost of the subtree under each candidate split and favors the one with a minimal cost. As a stochastic algorithm, ACT is expected to be able to escape local minima, into which greedy methods may be trapped. Experiments with a variety of datasets were conducted to compare ACT to the state-of-the-art cost-sensitive tree learners. The results show that for the majority of domains ACT produces significantly less costly trees. ACT also exhibits good anytime behavior with diminishing returns.

algorithm, dataset, misclassification cost, (13 more...)

doi: 10.1613/jair.2602

10567

Country:

North America > United States > Washington > King County > Seattle (0.14)
South America > Paraguay > Asunción > Asunción (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
(19 more...)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)

Journal of Artificial Intelligence ResearchJul-1-2008

SATzilla: Portfolio-based Algorithm Selection for SAT

Xu, L., Hutter, F., Hoos, H. H., Leyton-Brown, K.

It has been widely observed that there is no single "dominant" SAT solver; instead, different solvers perform best on different instances. Rather than following the traditional approach of choosing the best solver for a given class of instances, we advocate making this decision online on a per-instance basis. Building on previous work, we describe SATzilla, an automated approach for constructing per-instance algorithm portfolios for SAT that use so-called empirical hardness models to choose among their constituent solvers. This approach takes as input a distribution of problem instances and a set of component solvers, and constructs a portfolio optimizing a given objective function (such as mean runtime, percent of instances solved, or score in a competition). The excellent performance of SATzilla was independently verified in the 2007 SAT Competition, where our SATzilla07 solvers won three gold, one silver and one bronze medal. In this article, we go well beyond SATzilla07 by making the portfolio construction scalable and completely automated, and improving it by integrating local search solvers as candidate solvers, by predicting performance score instead of runtime, and by using hierarchical hardness models that take into account different types of SAT instances. We demonstrate the effectiveness of these new techniques in extensive experimental results on data sets including instances from the most recent SAT competition.

algorithm, satzilla07, solver, (12 more...)

doi: 10.1613/jair.2490

10556

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > Canada > British Columbia (0.04)

Genre: Research Report > New Finding (0.67)

Industry: Leisure & Entertainment > Sports (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Gansell, Amy Rebecca, Tamaru, Irene K., Jakulin, Aleks, Wiggins, Chris H.

Predicting Regional Classification of Levantine Ivory Sculptures: A Machine Learning Approach

arXiv.org Machine LearningJun-27-2008

Art historians and archaeologists have long grappled with the regional classification of ancient Near Eastern ivory carvings. Based on the visual similarity of sculptures, individuals within these fields have proposed object assemblages linked to hypothesized regional production centers. Using quantitative rather than visual methods, we here approach this classification task by exploiting computational methods from machine learning currently used with success in a variety of statistical problems in science and engineering. We first construct a prediction function using 66 categorical features as inputs and regional style as output. The model assigns regional style group (RSG), with 98 percent prediction accuracy. We then rank these features by their mutual information with RSG, quantifying single-feature predictive power. Using the highest- ranking features in combination with nomographic visualization, we have found previously unknown relationships that may aid in the regional classification of these ivories and their interpretation in art historical context.

artificial intelligence, machine learning, north syrian, (15 more...)

arXiv.org Machine Learning

0806.4642

Country:

North America > United States > New York > New York County > New York City (0.15)
North America > United States > Illinois > Cook County > Chicago (0.05)
Asia > Middle East > Iraq > Baghdad Governorate > Baghdad (0.05)
(12 more...)

Genre:

Research Report > Experimental Study (0.46)
Research Report > New Finding (0.46)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.47)

Michelson, M., Knoblock, C. A.

Creating Relational Data from Unstructured and Ungrammatical Data Sources

Journal of Artificial Intelligence ResearchMar-28-2008

In order for agents to act on behalf of users, they will have to retrieve and integrate vast amounts of textual data on the World Wide Web. However, much of the useful data on the Web is neither grammatical nor formally structured, making querying difficult. Examples of these types of data sources are online classifieds like Craigslist and auction item listings like eBay. We call this unstructured, ungrammatical data "posts." The unstructured nature of posts makes query and integration difficult because the attributes are embedded within the text. Also, these attributes do not conform to standardized values, which prevents queries based on a common attribute value. The schema is unknown and the values may vary dramatically making accurate search difficult. Creating relational data for easy querying requires that we define a schema for the embedded attributes and extract values from the posts while standardizing these values. Traditional information extraction (IE) is inadequate to perform this task because it relies on clues from the data, such as structure or natural language, neither of which are found in posts. Furthermore, traditional information extraction does not incorporate data cleaning, which is necessary to accurately query and integrate the source. The two-step approach described in this paper creates relational data sets from unstructured and ungrammatical text by addressing both issues. To do this, we require a set of known entities called a "reference set." The first step aligns each post to each member of each reference set. This allows our algorithm to define a schema over the post and include standard values for the attributes defined by this schema. The second step performs information extraction for the attributes, including attributes not easily represented by reference sets, such as a price. In this manner we create a relational structure over previously unstructured data, supporting deep and accurate queries over the data as well as standard values for integration. Our experimental results show that our technique matches the posts to the reference set accurately and efficiently and outperforms state-of-the-art extraction systems on the extraction task from posts.

algorithm, extraction, phoebus, (15 more...)

doi: 10.1613/jair.2409

10541

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Government > Regional Government > North America Government > United States Government (1.00)
Education > Educational Setting > Online (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Databases (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
(5 more...)

arXiv.org Artificial IntelligenceMar-25-2008

Reinforcement Learning by Value Gradients

Fairbank, Michael

The concept of the value-gradient is introduced and developed in the context of reinforcement learning, for deterministic episodic control problems that use a function approximator and have a continuous state space. It is shown that by learning the valuegradients, instead of just the values themselves, exploration or stochastic behaviour is no longer needed to find locally optimal trajectories. This is the main motivation for using value-gradients, and it is argued that learning the value-gradients is the actual objective of any value-function learning algorithm for control problems. It is also argued that learning value-gradients is significantly more efficient than learning just the values, and this argument is supported in experiments by efficiency gains of several orders of magnitude, in several problem domains. Once value-gradients are introduced into learning, several analyses become possible. For example, a surprising equivalence between a value-gradient learning algorithm and a policy-gradient learning algorithm is proven, and this provides a robust convergence proof for control problems using a value function with a general function approximator. Also, the issue of whether to include'residual gradient' terms into the weight update equations is addressed. Finally, an analysis is made of actor-critic architectures, which finds strong similarities to back-propagation through time, and gives simplifications and convergence proofs to certain actor-critic architectures, but while making those actor-critic architectures redundant. Unfortunately, by proving equivalence to policy-gradient learning, finding new divergence examples even in the absence of bootstrapping, and proving the redundancy of residual-gradients and actor-critic architectures in some circumstances, this paper does somewhat discredit the usefulness of using a value-function.

machine learning, reinforcement learning, trajectory, (17 more...)

0803.3539

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(5 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)

arXiv.org Artificial IntelligenceMar-11-2008

Dempster-Shafer for Anomaly Detection

Chen, Qi, Aickelin, Uwe

Intrusion Detection Systems (IDSs) play a pivotal role within network security [1]. IDSs are one of many tools used to detect attacks and intruders of computer systems. It is important to note that the purpose of IDSs is not to prevent the entry of intruders to a system, but to notify the administrator of any observed intruders. IDS techniques can be categorised as either misuse detectors or anomaly detectors. Misuse detection systems, such as Snort [2], rely on intrusion signatures to detect an attack.

artificial intelligence, data mining, machine learning, (20 more...)

0803.1568

Country:

North America > United States > Wisconsin (0.04)
Europe > United Kingdom > England > Nottinghamshire > Nottingham (0.04)
North America > United States > California > Alameda County > Berkeley (0.04)
Europe > Norway (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Anomaly Detection (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.69)

Journal of Artificial Intelligence ResearchFeb-29-2008

Gesture Salience as a Hidden Variable for Coreference Resolution and Keyframe Extraction

Eisenstein, J., Barzilay, R., Davis, R.

Gesture is a non-verbal modality that can contribute crucial information to the understanding of natural language. But not all gestures are informative, and non-communicative hand motions may confuse natural language processing (NLP) and impede learning. People have little difficulty ignoring irrelevant hand movements and focusing on meaningful gestures, suggesting that an automatic system could also be trained to perform this task. However, the informativeness of a gesture is context-dependent and labeling enough data to cover all cases would be expensive. We present conditional modality fusion, a conditional hidden-variable model that learns to predict which gestures are salient for coreference resolution, the task of determining whether two noun phrases refer to the same semantic entity. Moreover, our approach uses only coreference annotations, and not annotations of gesture salience itself. We show that gesture features improve performance on coreference resolution, and that by attending only to gestures that are salient, our method achieves further significant gains. In addition, we show that the model of gesture salience learned in the context of coreference accords with human intuition, by demonstrating that gestures judged to be salient by our model can be used successfully to create multimedia keyframe summaries of video. These summaries are similar to those created by human raters, and significantly outperform summaries produced by baselines from the literature.

coreference resolution, noun phrase, proceedings, (12 more...)

doi: 10.1613/jair.2450

10536

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
Asia > Middle East > Jordan (0.04)
(4 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry: Education (0.45)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
(5 more...)