AITopics | Statistical Learning

Statistical Learning

Complex Question Answering: Unsupervised Learning Approaches and Experiments

Journal of Artificial Intelligence Research, May-25-2009

Complex questions that require inferencing and synthesizing information from multiple documents can be seen as a kind of topic-oriented, informative multi-document summarization where the goal is to produce a single text as a compressed version of a set of documents with a minimum loss of relevant information. In this paper, we experiment with one empirical method and two unsupervised statistical machine learning techniques: K-means and Expectation Maximization (EM), for computing relative importance of the sentences. We compare the results of these approaches. Our experiments show that the empirical approach outperforms the other two techniques and EM performs better than K-means. However, the performance of these approaches depends entirely on the feature set used and the weighting of these features. In order to measure the importance and relevance to the user query we extract different kinds of features (i.e. lexical, lexical semantic, cosine similarity, basic element, tree kernel based syntactic and shallow-semantic) for each of the document sentences. We use a local search technique to learn the weights of the features. To the best of our knowledge, no study has used tree kernel functions to encode syntactic/semantic information for more complex tasks such as computing the relatedness between the query sentences and the document sentences in order to generate query-focused summaries (or answers to complex questions). For each of our methods of generating summaries (i.e. empirical, K-means and EM) we show the effects of syntactic and shallow-semantic features over the bag-of-words (BOW) features.
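As a rough illustration of scoring sentences against a query and then clustering the scores, the sketch below uses a single bag-of-words cosine feature and a one-dimensional two-cluster k-means. It is a minimal sketch under stated assumptions, not the paper's method: the function names, the whitespace tokenizer, the single feature, and the min/max centroid initialization are all hypothetical choices made for the example.

```python
import math
from collections import Counter

def bow(text):
    """Bag-of-words vector from lowercase whitespace tokens (assumed tokenizer)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def two_means_1d(values, iters=20):
    """1-D k-means with k=2; label 1 marks the higher-scoring cluster.

    Centroids start at the minimum and maximum score (an assumption).
    """
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        ones = [v for v, lab in zip(values, labels) if lab == 1]
        zeros = [v for v, lab in zip(values, labels) if lab == 0]
        new_lo = sum(zeros) / len(zeros) if zeros else lo
        new_hi = sum(ones) / len(ones) if ones else hi
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return labels

query = "how do earthquakes cause tsunamis"
sentences = [
    "Earthquakes under the sea floor can cause tsunamis",
    "The museum opens at nine",
    "Tsunamis travel fast after large earthquakes",
    "Tickets cost five dollars",
]
scores = [cosine(bow(query), bow(s)) for s in sentences]
relevant = two_means_1d(scores)
```

A fuller system would combine several features per sentence with learned weights; the one-feature version only shows where clustering fits in the pipeline.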

algorithm, information, similarity, (17 more...)

doi: 10.1613/jair.2784

AI Access Foundation

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.28)
Asia > Bangladesh (0.14)
Europe > Holy See (0.04)
(19 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.46)

Mining Default Rules from Statistical Data

Kern-Isberner, Gabriele (Technische Universität Dortmund) | Thimm, Matthias (Technische Universität Dortmund) | Finthammer, Marc (FernUniversität in Hagen) | Fisseler, Jens (FernUniversität in Hagen)

AAAI Conferences, May-21-2009

In this paper, we are interested in the qualitative knowledge that underlies some given probabilistic information. To represent such qualitative structures, we use ordinal conditional functions (OCFs, or ranking functions) as a qualitative abstraction of probability functions. The basic idea for transforming probabilities into ordinal rankings is to find well-behaved clusterings of the negative logarithms of the probabilities. We show how popular clustering tools can be used for this, and propose measures for evaluating the clustering results in this context. From the ranking functions so obtained, we extract conditionals that may serve as a base for inductive default reasoning.
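The idea of clustering negative log-probabilities into ordinal ranks can be sketched as follows. This is a minimal sketch, assuming a simple gap-threshold grouping as a stand-in for the clustering tools the abstract mentions; the function name, the `gap` parameter, the use of consecutive integers as ranks, and the example worlds are all hypothetical.

```python
import math

def ranks_from_probabilities(probs, gap=1.0):
    """Assign each world an integer rank by grouping its -log(p) value.

    Worlds are sorted by -log(p); a new rank starts whenever the jump to
    the next value exceeds `gap` (assumed threshold). Zero-probability
    worlds get rank infinity, and the most probable group gets rank 0.
    """
    neglog = {w: -math.log(p) for w, p in probs.items() if p > 0}
    ranks = {w: math.inf for w, p in probs.items() if p == 0}
    rank, prev = 0, None
    for world in sorted(neglog, key=neglog.get):
        if prev is not None and neglog[world] - prev > gap:
            rank += 1
        ranks[world] = rank
        prev = neglog[world]
    return ranks

# Hypothetical distribution over four possible worlds.
probs = {"a": 0.5, "b": 0.4, "c": 0.09, "d": 0.01}
ranking = ranks_from_probabilities(probs)
```

Here "a" and "b" have similar probabilities and land in the same rank, while "c" and "d" are separated by large jumps in -log(p) and receive higher ranks.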

conditional function, probability, representation, (15 more...)

Twenty-Second International FLAIRS Conference

Country:

Europe > Germany (0.04)
North America > United States > New York (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Expert Systems (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.47)

Organizing Knowledge as an Ontology of the Domain of Resilient Computing by Means of Natural Language Processing - An Experience Report -

AAAI Conferences, May-21-2009

Scientists typically need to take a large volume of information into account in order to deal with recurring tasks such as inspecting proceedings, finding related work, or reviewing papers. Our work aims at filling the gap between text documents and structured representations of their content in the domain of resilient computing by combining computational linguistics and ontological methods. The results of our research include a thesaurus of the domain, automatic clustering of the domain documents, a domain ontology, and a tool for constructing ontologies with the aid of domain thesauri.

mapping, ontology, thesaurus, (13 more...)

Twenty-Second International FLAIRS Conference

Country:

Europe > Romania (0.04)
Europe > Lithuania > Kaunas County > Kaunas (0.04)
Europe > Germany > Saarland (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.49)

VipBoost: A More Accurate Boosting Algorithm

Su, Xiaoyuan (Florida Atlantic University) | Khoshgoftaar, Taghi M | Greiner, Russell

AAAI Conferences, May-21-2009

Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in the missing values. It then applies AdaBoost (using some base learner) to each of the imputed learning sets, producing multiple classifiers. The prediction for a new test case is the most frequent classification among these classifiers. Our empirical results show that VipBoost produces very effective classifiers that significantly improve accuracy for unstable base learners and some stable learners, especially when the initial dataset is incomplete.
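The remove, impute, train, and vote steps described above can be sketched in a few lines. This is a minimal sketch, not the authors' implementation: mean imputation is an assumed imputer, and a nearest-centroid classifier stands in for AdaBoost so the example needs only the standard library. All function names and default parameters are hypothetical.

```python
import random
from collections import Counter

def knock_out(rows, rate, rng):
    """Copy rows, replacing a random fraction of observed values with None."""
    return [[None if v is not None and rng.random() < rate else v for v in row]
            for row in rows]

def impute_mean(rows):
    """Fill None with the column mean of observed values (assumed imputer)."""
    means = []
    for col in zip(*rows):
        seen = [v for v in col if v is not None]
        means.append(sum(seen) / len(seen) if seen else 0.0)
    return [[means[j] if v is None else v for j, v in enumerate(row)]
            for row in rows]

def train_centroids(rows, labels):
    """Stand-in base learner (NOT AdaBoost): nearest class centroid."""
    groups = {}
    for row, y in zip(rows, labels):
        groups.setdefault(y, []).append(row)
    cents = {y: [sum(c) / len(c) for c in zip(*rs)] for y, rs in groups.items()}

    def predict(x):
        return min(cents,
                   key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cents[y])))
    return predict

def vip_ensemble(rows, labels, n_sets=5, rate=0.1, seed=0, train=train_centroids):
    """Train one model per imputed copy of the data; predict by majority vote."""
    rng = random.Random(seed)
    models = [train(impute_mean(knock_out(rows, rate, rng)), labels)
              for _ in range(n_sets)]

    def predict(x):
        return Counter(m(x) for m in models).most_common(1)[0][0]
    return predict

rows = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2], [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
labels = ["a", "a", "a", "b", "b", "b"]
classify = vip_ensemble(rows, labels)
```

The `train` argument is the plug-in point: passing a function that fits a boosted classifier there would bring the sketch closer to the procedure the abstract describes.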

classification accuracy, classifier, dataset, (13 more...)

Twenty-Second International FLAIRS Conference

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Florida > Palm Beach County > Boca Raton (0.05)
North America > United States > New York (0.04)
(2 more...)

Genre: Research Report > New Finding (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.31)

Multivariate Time Series Classification with Temporal Abstractions

Batal, Iyad (University of Pittsburgh) | Sacchi, Lucia (University of Pavia) | Bellazzi, Riccardo (University of Pavia) | Hauskrecht, Milos (University of Pittsburgh)

AAAI Conferences, May-21-2009

The increase in the number of complex temporal datasets collected today has prompted the development of methods that extend classical machine learning and data mining methods to time-series data. This work focuses on methods for multivariate time-series classification. Time-series classification is a challenging problem, mostly because the number of temporal features that describe the data and are potentially useful for classification is enormous. We study and develop a temporal abstraction framework for generating multivariate time-series features suitable for classification tasks. We propose the STF-Mine algorithm, which automatically mines discriminative temporal abstraction patterns from the time-series data and uses them to learn a classification model. Our experimental evaluations, carried out on both synthetic and real-world medical data, demonstrate the benefit of our approach in learning accurate classifiers for time-series datasets.
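To give a feel for what a temporal abstraction can look like, the sketch below converts raw samples into labelled intervals using a simple state abstraction (low, normal, high). The abstract does not specify the abstractions the framework uses, so the thresholds, state names, and function name here are assumptions for illustration only; this is not the STF-Mine algorithm.

```python
def state_abstraction(series, low, high):
    """Turn (time, value) samples into (state, start, end) intervals.

    Each value is mapped to "low", "normal", or "high" using the assumed
    thresholds, and consecutive samples with the same state are merged
    into one interval.
    """
    intervals = []
    for t, v in series:
        state = "low" if v < low else "high" if v > high else "normal"
        if intervals and intervals[-1][0] == state:
            # Extend the current interval to the new time point.
            intervals[-1] = (state, intervals[-1][1], t)
        else:
            intervals.append((state, t, t))
    return intervals

# Hypothetical lab measurements sampled at times 0..4.
samples = [(0, 3.9), (1, 4.1), (2, 5.6), (3, 5.8), (4, 4.5)]
abstracted = state_abstraction(samples, low=4.0, high=5.5)
```

Interval sequences like these, computed per variable, are the kind of symbolic representation from which temporal patterns could then be mined as classification features.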

abstraction, relation, temporal pattern, (14 more...)

Twenty-Second International FLAIRS Conference

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > New York (0.04)
Europe > Italy (0.04)

Genre: Research Report > Experimental Study (0.46)

Industry: Health & Medicine (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Pattern Recognition (0.94)
(2 more...)

Extracting Meaning from Cell Phone Improvement Ideas

Turner, Jenine (Athenahealth) | Lencevicius, Raimondas (Qwobl) | Adler, Mark (Nokia Research Center)

AAAI Conferences, May-21-2009

Numerous companies nowadays gather product improvement ideas. Reviewing all of the resulting thousands of ideas without tools would require a great deal of time and resources. Automatic tools can help these reviewers in a number of ways. The questions we address here are categorization, finding common ideas, and finding idea trends over time. We explore techniques to answer these questions using ...

There are two additional modifications we use to adjust our feature set that provide improvements over the original feature counts. The first is based upon our assumption that words in the title are more important than words in the other text fields. We simply weight unigrams and bigrams that appear in the title ten times as heavily as those that appear in the rest of the text.
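The title weighting described here can be sketched with plain counters. This is a minimal sketch: the factor of ten comes from the text, while the whitespace tokenizer, the function names, and the example idea are assumptions made for illustration.

```python
from collections import Counter

TITLE_WEIGHT = 10  # from the text: title n-grams count ten times as heavily

def ngrams(text):
    """Unigrams and bigrams from lowercase whitespace tokens (assumed tokenizer)."""
    tokens = text.lower().split()
    bigrams = [" ".join(pair) for pair in zip(tokens, tokens[1:])]
    return tokens + bigrams

def weighted_features(title, body):
    """Count n-grams, weighting title occurrences by TITLE_WEIGHT."""
    feats = Counter()
    for gram in ngrams(title):
        feats[gram] += TITLE_WEIGHT
    for gram in ngrams(body):
        feats[gram] += 1
    return feats

# Hypothetical improvement idea with a title and a free-text body.
feats = weighted_features("longer battery life", "the battery drains fast")
```

In this example "battery" appears once in the title and once in the body, so its weighted count is 11, while the body-only bigram "drains fast" keeps a count of 1.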

category, classification, probability, (12 more...)

Twenty-Second International FLAIRS Conference

Country: North America > United States > Massachusetts (0.05)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.74)

Hidden Markov Random Fields Based LSI Text Semi-supervised Clustering

Min, Kerui (Fudan University) | Liu, Gang (Fudan University) | Chen, Xin (Nanjing University) | Lu, Shengqi (Fudan University)

AAAI Conferences, May-21-2009

Semi-supervised learning is an active research field. Previous results have shown that incorporating background information into the original unsupervised clustering problem can achieve higher accuracy. In this paper, we explore the cooperation between the pairwise constraints given by the user and the semantic information in natural language. In addition, we reduce the time complexity to make the algorithm feasible for large quantities of data. Experiments on corpora of different scales show the robustness and effectiveness of the proposed algorithm, which achieves an F-measure 20% higher than previous algorithms.

algorithm, constraint, hidden markov random field, (11 more...)

Twenty-Second International FLAIRS Conference

Country:

Asia > China > Shanghai > Shanghai (0.06)
Asia > China > Jiangsu Province > Nanjing (0.05)

Genre: Research Report > New Finding (0.35)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.71)

Hierarchical Soft Clustering and Automatic Text Summarization for Accessing the Web on Mobile Devices for Visually Impaired People

Dias, Gaël Harry (University of Beira Interior) | Pais, Sebastião (University of Beira Interior) | Cunha, Fernando (University of Beira Interior) | Costa, Hugo (University of Beira Interior) | Machado, David (University of Beira Interior) | Barbosa, Tiago (University of Beira Interior) | Martins, Bruno (University of Beira Interior)

AAAI Conferences, May-21-2009

In this paper, we propose a universal solution to web search and web browsing on handheld devices for visually impaired people. For this purpose, we propose (1) to automatically cluster web page results and (2) to summarize all the information in web pages so that speech-to-speech interaction is used efficiently to access information.

algorithm, information, snippet, (13 more...)

Twenty-Second International FLAIRS Conference

Country:

North America > United States > Florida > Monroe County > Key West (0.04)
Europe > United Kingdom > England > South Yorkshire > Sheffield (0.04)
Europe > Portugal (0.04)
(3 more...)

Industry: Health & Medicine (0.71)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science (0.95)
(2 more...)

A Large Margin Approach to Anaphora Resolution for Neuroscience Knowledge Discovery

Ozyurt, I. Burak (UCSD)

AAAI Conferences, May-21-2009

A discriminative large-margin classifier-based approach to anaphora resolution for neuroscience abstracts is presented. The system employs both syntactic and semantic features. A support vector machine based word sense disambiguation method, which combines evidence from three methods that use WordNet and Wikipedia, is also introduced and used for semantic features. The support vector machine anaphora resolution classifier with probabilistic outputs achieved an almost four-fold improvement in accuracy over the baseline method.

anaphora resolution, classifier, resolution, (14 more...)

Twenty-Second International FLAIRS Conference

Country:

North America > United States > New York (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.71)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.68)

Document Clustering and Visualization with Latent Dirichlet Allocation and Self-Organizing Maps

Millar, Jeremy R. (Air Force Institute of Technology) | Peterson, Gilbert L. (Air Force Institute of Technology) | Mendenhall, Michael J. (Air Force Institute of Technology)

AAAI Conferences, May-21-2009

Clustering and visualization of large text document collections aid in browsing, navigation, and information retrieval. We present a document clustering and visualization method based on Latent Dirichlet Allocation and self-organizing maps (LDA-SOM). LDA-SOM clusters documents based on topical content and renders clusters in an intuitive two-dimensional format. Document topics are inferred using a probabilistic topic model. Then, due to the topology-preserving properties of self-organizing maps, document clusters with similar topic distributions are placed near one another in the visualization. This gives the user an intuitive means of browsing from one cluster to another based on topics held in common. The effectiveness of LDA-SOM is evaluated on the 20 Newsgroups and NIPS data sets.

document collection, topic distribution, vector, (15 more...)

Twenty-Second International FLAIRS Conference

Country:

Asia > Middle East > Jordan (0.06)
North America > United States > New York (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
(3 more...)

Industry: Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.85)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.69)
