AITopics

Twenty-Fourth AAAI Conference on Artificial Intelligence

Country:

Asia > Middle East > Iraq (0.15)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Pennsylvania (0.04)
(2 more...)

Industry: Government > Regional Government (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.72)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.71)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (0.46)

Michelson, M., Knoblock, C. A.

Constructing Reference Sets from Unstructured, Ungrammatical Text

Journal of Artificial Intelligence Research, May-28-2010

Vast amounts of text on the Web are unstructured and ungrammatical, such as classified ads, auction listings, forum postings, etc. We call such text posts. Despite their inconsistent structure and lack of grammar, posts are full of useful information. This paper presents work on semi-automatically building tables of relational information, called reference sets, by analyzing such posts directly. Reference sets can be applied to a number of tasks such as ontology maintenance and information extraction. Our reference-set construction method starts with just a small amount of background knowledge, and constructs tuples representing the entities in the posts to form a reference set. We also describe an extension to this approach for the special case where even this small amount of background knowledge is impossible to discover and use. To evaluate the utility of the machine-constructed reference sets, we compare them to manually constructed reference sets in the context of reference-set-based information extraction. Our results show the reference sets constructed by our method outperform manually constructed reference sets. We also compare the reference-set-based extraction approach using the machine-constructed reference set to supervised extraction approaches using generic features. These results demonstrate that using machine-constructed reference sets outperforms the supervised methods, even though the supervised methods require training data.

entity tree, extraction, seed-based method, (14 more...)

doi: 10.1613/jair.2937

AI Access Foundation

10652

Country:

North America > United States > California > San Francisco County > San Francisco (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > Los Angeles County > El Segundo (0.04)

Genre: Research Report > New Finding (1.00)

Industry:

Automobiles & Trucks > Manufacturer (1.00)
Transportation > Passenger (0.93)
Transportation > Ground > Road (0.93)
Government > Regional Government > North America Government > United States Government (0.67)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(3 more...)

From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series

O'Connor, Brendan (Carnegie Mellon University) | Balasubramanyan, Ramnath (Carnegie Mellon University) | Routledge, Bryan R. (Carnegie Mellon University) | Smith, Noah A. (Carnegie Mellon University)

We connect measures of public opinion measured from polls with sentiment measured from text. We analyze several surveys on consumer confidence and political opinion over the 2008 to 2009 period, and find they correlate to sentiment word frequencies in contemporaneous Twitter messages. While our results vary across datasets, in several cases the correlations are as high as 80%, and capture important large-scale trends. The results highlight the potential of text streams as a substitute and supplement for traditional polling. [...] consumer confidence and political opinion, and can also predict future movements in the polls. We find that temporal smoothing is a critically important issue to support a successful model.
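As a rough illustration of the kind of analysis this abstract describes, the sketch below computes a daily sentiment ratio (positive over negative word counts), smooths it with a trailing moving average, and correlates the result with a poll series. The function names, the choice of a trailing window, and the Pearson coefficient are assumptions made for illustration; the abstract does not specify these details.

```python
from math import sqrt


def sentiment_ratio(pos_counts, neg_counts):
    """Daily ratio of positive to negative word counts.

    Assumes every negative count is nonzero (hypothetical input format).
    """
    return [p / n for p, n in zip(pos_counts, neg_counts)]


def moving_average(series, window):
    """Trailing moving average; early days average over what is available."""
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - window + 1): i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def pearson(xs, ys):
    """Pearson correlation of two equal-length series with nonzero variance."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Toy numbers, invented purely to exercise the functions.
    pos = [120, 90, 150, 110, 160, 130, 170, 140]
    neg = [100, 100, 100, 100, 100, 100, 100, 100]
    poll = [50.0, 49.5, 51.0, 50.5, 52.0, 51.5, 53.0, 52.5]
    smoothed = moving_average(sentiment_ratio(pos, neg), window=3)
    print(pearson(smoothed, poll))
```

Varying `window` in a sketch like this is one way to see why smoothing matters: a noisy daily ratio and its smoothed version can correlate quite differently with the same poll series.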

consumer confidence, sentiment, sentiment ratio, (14 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
North America > United States > Michigan (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Research Report > New Finding (0.48)

Industry:

Information Technology > Services (0.68)
Government > Voting & Elections (0.47)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

Raaijmakers, Stephan (TNO ICT, Delft, The Netherlands) | Kraaij, Wessel (TNO ICT, Delft, The Netherlands)

Classifier Calibration for Multi-Domain Sentiment Classification

Textual sentiment classifiers classify texts into a fixed number of affective classes, such as positive, negative or neutral sentiment, or subjective versus objective information. It has been observed that sentiment classifiers suffer from a lack of generalization capability: a classifier trained on a certain domain generally performs worse on data from another domain. This phenomenon has been attributed to domain-specific affective vocabulary. In this paper, we propose a voting-based thresholding approach, which calibrates a number of existing single-domain classifiers with respect to sentiment data from a new domain. The approach presupposes only a small amount of annotated data from the new domain. We evaluate three criteria for estimating thresholds, and discuss the ramifications of these criteria for the trade-off between classifier performance and manual annotation effort.
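One simple reading of "voting-based thresholding" can be sketched as follows: each existing single-domain classifier emits a real-valued score, a cut-off per classifier is chosen on a small labeled sample from the new domain, and the calibrated classifiers then vote. The accuracy criterion, the majority rule, and all names here are assumptions for illustration; the abstract mentions three threshold criteria without describing them.

```python
def best_threshold(scores, labels):
    """Pick the cut-off on one classifier's scores that maximizes accuracy
    on a small labeled sample from the new domain (assumed criterion)."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        correct = sum((s >= t) == y for s, y in zip(scores, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def vote(score_row, thresholds):
    """Strict majority vote: True (positive) if more than half of the
    calibrated classifiers score at or above their own threshold."""
    votes = sum(s >= t for s, t in zip(score_row, thresholds))
    return votes * 2 > len(thresholds)


if __name__ == "__main__":
    # Toy scores from one hypothetical classifier on four labeled texts.
    t = best_threshold([0.1, 0.4, 0.6, 0.9], [False, False, True, True])
    print(t)
    print(vote([0.7, 0.2, 0.9], [t, 0.5, 0.5]))
```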

artificial intelligence, classifier, natural language, (16 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > Ohio > Franklin County > Columbus (0.04)
North America > United States > New York > New York County > New York City (0.04)
Europe > Netherlands > South Holland > Delft (0.04)

Genre: Research Report (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Generating Domain-Specific Clues Using News Corpus for Sentiment Classification

Kim, Youngho (University of Massachusetts Amherst) | Choi, Yoonjung (KAIST) | Myaeng, Sung-Hyon (KAIST)

This paper addresses the problem of automatically generating domain-specific sentiment clues. The main idea is to bootstrap from a small seed set and generate new clues by using dependencies and collocation information between sentiment clues and sentence-level topics that would be a primary subject of sentiment expression (e.g., event, company, and person). The experiments show that the aggregated clues are effective for sentiment classification.

artificial intelligence, natural language, text classification, (3 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.89)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.89)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.60)

The Wisdom of Bookies? Sentiment Analysis Versus the NFL Point Spread

Hong, Yancheng (Hong Kong University of Science & Technology) | Skiena, Steven (Stony Brook University)

The American Football betting market provides a particularly attractive domain to study the nexus between public sentiment and the wisdom of crowds. In this paper, we present the first substantial study of the relationship between the NFL betting line and public opinion expressed in blogs and microblogs (Twitter). We perform a large-scale study of four distinct text streams: LiveJournal blogs, RSS blog feeds captured by Spinn3r, Twitter, and traditional news media. Our results show interesting disparities between the first and second halves of each season. We present evidence showing the usefulness of sentiment for NFL betting. We demonstrate that a strategy betting on roughly 30 games per year identified the winner roughly 60% of the time from 2006 to 2009, well beyond what is needed to overcome the bookie's typical commission (53%).
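The break-even figure can be checked with a short calculation. The sketch below assumes the common point-spread convention of risking 110 units to win 100; this convention is an assumption, not something the abstract states. Under it, the break-even win rate is 110/210, about 52.4%, which is consistent with the roughly 53% quoted above.

```python
def break_even_rate(stake, payout):
    """Win rate at which expected profit is zero when risking `stake`
    to win `payout` on each bet."""
    return stake / (stake + payout)


def expected_profit(win_rate, stake, payout):
    """Expected profit per bet at a given win rate."""
    return win_rate * payout - (1 - win_rate) * stake


if __name__ == "__main__":
    # Assumed convention: risk 110 to win 100.
    print(round(break_even_rate(110, 100), 4))
    print(round(expected_profit(0.60, 110, 100), 2))
```

At a 60% win rate under this assumed convention, each bet is worth about 16 units in expectation per 110 risked.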

artificial intelligence, natural language, social media, (17 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Country:

North America > United States > New York > Suffolk County > Stony Brook (0.05)
North America > United States > Nevada (0.05)
Asia > China > Jiangsu Province > Yancheng (0.05)

Genre: Research Report > New Finding (0.68)

Industry: Leisure & Entertainment > Sports > Football (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.66)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.52)

Socio-Legal Analysis of Criminal Sentences: A Preliminary Study

Giura, Giuseppe (University of Catania) | Giuffrida, Giovanni (University of Catania) | Pennisi, Carlo (University of Catania) | Zarba, Calogero (Neodata Intelligence)

This paper discusses research based on the analysis of criminal sentences from trials concerning organized crime activity in Sicily, pronounced from 2000 through 2006. The collection of large datasets of criminal sentences in Italy is severely constrained for various reasons, such as the difficulty of data collection at the courthouses, the unavailability of data in digital format, and the classification criteria used in the public archives. Thus, in general, judicial statistics suffer from a lack of reliability and informativeness. The objective of this research is to analyze the text of criminal sentences in a revisable and verifiable way, so that information is extracted on the trial leading to the sentence, the socio-economic environment in which the relevant events occurred, and the differences between the various districts conducting the trials. The purpose is to develop a tool for automated analysis of the text of the sentences that is generalizable to other areas of jurisprudence and, outside of jurisprudence, to other temporal and geographical contexts. The 726 criminal sentences that have been converted into text files were pronounced at all judicial levels in the four Sicilian districts for mafia-related crimes. This research is relevant because, for the first time in Italy, we aim to empirically describe the juridical response to the phenomenon of organized crime by using a large and extendable database of criminal sentences that can be analyzed with data mining techniques, rather than deriving general conclusions from a small, focused set of sentences.

criminal sentence, data mining, natural language, (17 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Country:

Europe > Italy > Sicily (0.25)
North America > United States > New York (0.04)

Industry:

Law > Criminal Law (0.66)
Law Enforcement & Public Safety > Crime Prevention & Enforcement (0.55)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.52)

“How Incredibly Awesome!” — Click Here to Read More

Ahn, Hyung-il (Massachusetts Institute of Technology) | Geyer, Werner (IBM) | Dugan, Casey (IBM) | Millen, David R. (IBM)

We investigate the impact of a discussion snippet's overall sentiment on a user's willingness to read more of a discussion. Using sentiment analysis, we constructed positive, neutral, and negative discussion snippets using the discussion topic and a sample comment from discussions taking place around content on an enterprise social networking site. We computed personalized snippet recommendations for a subset of users and conducted a survey to test how these recommendations were perceived. Our experimental results show that snippets with high sentiments are better discussion "teasers."

artificial intelligence, machine learning, natural language, (20 more...)

Fourth International AAAI Conference on Weblogs and Social Media

Country: North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)

Genre:

Research Report > New Finding (0.89)
Research Report > Experimental Study (0.69)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.36)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.36)

Sandler, Ted, Blitzer, John, Talukdar, Partha P., Ungar, Lyle H.

Regularized Learning with Networks of Features

Neural Information Processing Systems, Dec-31-2009

For many supervised learning problems, we possess prior knowledge about which features yield similar information about the target variable. In predicting the topic of a document, we might know that two words are synonyms, or when performing image recognition, we know which pixels are adjacent. Such synonymous or neighboring features are near-duplicates and should therefore be expected to have similar weights in a good model. Here we present a framework for regularized learning in settings where one has prior knowledge about which features are expected to have similar and dissimilar weights. This prior knowledge is encoded as a graph whose vertices represent features and whose edges represent similarities and dissimilarities between them. During learning, each feature's weight is penalized by the amount it differs from the average weight of its neighbors. For text classification, regularization using graphs of word co-occurrences outperforms manifold learning and compares favorably to other recently proposed semi-supervised learning methods. For sentiment analysis, feature graphs constructed from declarative human knowledge, as well as from auxiliary task learning, significantly improve prediction accuracy.
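The penalty described in this abstract, where each feature's weight is penalized by how far it lies from the average weight of its neighbors, can be sketched in a few lines. The squared-difference form, the adjacency-dictionary representation, and the function name are assumptions for illustration; the sketch covers similarity edges only and leaves out the dissimilarity edges the abstract also mentions.

```python
def network_penalty(weights, neighbors):
    """Sum over features of the squared gap between a feature's weight
    and the mean weight of its graph neighbors.

    weights:   list of floats, one per feature
    neighbors: dict mapping a feature index to a list of neighbor indices
               (hypothetical representation of the feature graph)
    """
    total = 0.0
    for i, w in enumerate(weights):
        nbrs = neighbors.get(i, [])
        if not nbrs:
            continue  # isolated features contribute no penalty
        avg = sum(weights[j] for j in nbrs) / len(nbrs)
        total += (w - avg) ** 2
    return total


if __name__ == "__main__":
    # Features 0 and 1 agree; feature 2 deviates from both of its neighbors.
    graph = {0: [1], 1: [0], 2: [0, 1]}
    print(network_penalty([1.0, 1.0, 3.0], graph))
```

In a full model this term would be added, with a tunable coefficient, to the usual training loss, so that neighboring features are pulled toward similar weights.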

artificial intelligence, machine learning, natural language, (18 more...)

Industry: Education (0.55)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.67)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.67)

Pouliquen, Bruno, Steinberger, Ralf, Ignat, Camelia

Automatic annotation of multilingual text collections with a conceptual thesaurus

arXiv.org Artificial Intelligence, Dec-1-2009

Automatic annotation of documents with controlled vocabulary terms (descriptors) from a conceptual thesaurus is not only useful for document indexing and retrieval. Mapping texts onto the same thesaurus also makes it possible to establish links between similar documents. This is also a substantial requirement of the Semantic Web. This paper presents an almost language-independent system that maps documents written in different languages onto the same multilingual conceptual thesaurus, EUROVOC. Conceptual thesauri differ from Natural Language Thesauri in that they consist of relatively small controlled lists of words or phrases with a rather abstract meaning. To automatically identify which thesaurus descriptors best describe the contents of a document, we developed a statistical, associative system that is trained on texts that have previously been indexed manually. In addition to describing the large number of empirically optimised parameters of the fully functional application, we present the performance of the software according to a human evaluation by professional indexers.

data mining, machine learning, natural language, (20 more...)

cs/0609059

Country:

North America > United States (0.68)
Europe > Romania > București - Ilfov Development Region > Municipality of Bucharest > Bucharest (0.41)

Genre: Research Report (0.64)

Industry:

Health & Medicine (0.46)
Government (0.46)

Technology:

Information Technology > Communications > Web > Semantic Web (0.61)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.47)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.46)
(3 more...)