AITopics | Information Extraction

Information Extraction

Deploying nEmesis: Preventing Foodborne Illness by Data Mining Social Media

Sadilek, Adam (University of Rochester) | Kautz, Henry (University of Rochester) | DiPrete, Lauren (Southern Nevada Health District, Las Vegas, Nevada) | Labus, Brian (Southern Nevada Health District, Las Vegas, Nevada) | Portman, Eric (University of Rochester) | Teitel, Jack (University of Rochester) | Silenzio, Vincent (University of Rochester)

AAAI Conferences | Feb-10-2016

Foodborne illness afflicts 48 million people annually in the U.S. alone. Over 128,000 are hospitalized and 3,000 die from the infection. While preventable with proper food safety practices, the traditional restaurant inspection process has limited impact given the predictability and low frequency of inspections, and the dynamic nature of the kitchen environment. Despite this reality, the inspection process has remained largely unchanged for decades. We apply machine learning to Twitter data and develop a system that automatically detects venues likely to pose a public health hazard. Health professionals subsequently inspect individual flagged venues in a double-blind experiment spanning the entire Las Vegas metropolitan area over three months. By contrast, previous research in this domain has been limited to indirect correlative validation using only aggregate statistics. We show that the adaptive inspection process is 63% more effective at identifying problematic venues than the current state of the art. The live deployment shows that if every inspection in Las Vegas became adaptive, we could prevent over 9,000 cases of foodborne illness and 557 hospitalizations annually. Additionally, adaptive inspections result in unexpected benefits, including the identification of venues lacking permits, contagious kitchen staff, and fewer customer complaints filed with the Las Vegas health department.

machine learning, natural language, tweet, (20 more...)

Twenty-Eighth IAAI Conference

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.66)
North America > United States > New York > Monroe County > Rochester (0.04)

Genre:

Research Report > Strength High (1.00)
Research Report > Experimental Study (1.00)

Industry:

Information Technology > Services (1.00)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (1.00)
Health & Medicine > Therapeutic Area > Immunology (1.00)
(5 more...)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.68)

Distributional Correspondence Indexing for Cross-Lingual and Cross-Domain Sentiment Classification.

Moreo Fernández, Alejandro, Esuli, Andrea, Sebastiani, Fabrizio

Journal of Artificial Intelligence Research | Jan-20-2016

Domain Adaptation (DA) techniques aim at enabling machine learning methods to learn effective classifiers for a "target" domain when the only available training data belongs to a different "source" domain. In this paper we present the Distributional Correspondence Indexing (DCI) method for domain adaptation in sentiment classification. DCI derives term representations in a vector space common to both domains where each dimension reflects its distributional correspondence to a pivot, i.e., to a highly predictive term that behaves similarly across domains. Term correspondence is quantified by means of a distributional correspondence function (DCF). We propose a number of efficient DCFs that are motivated by the distributional hypothesis, i.e., the hypothesis according to which terms with similar meaning tend to have similar distributions in text. Experiments show that DCI obtains better performance than current state-of-the-art techniques for cross-lingual and cross-domain sentiment classification. DCI also brings about a significantly reduced computational cost, and requires a smaller amount of human intervention. As a final contribution, we discuss a more challenging formulation of the domain adaptation problem, in which both the cross-domain and cross-lingual dimensions are tackled simultaneously.
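
The idea of representing each term by its correspondence to a set of pivots can be illustrated with a minimal sketch. It assumes a cosine-based correspondence function over binary document-occurrence vectors; the abstract does not say which DCFs the authors propose, so cosine is only an illustrative stand-in. The toy reviews, the pivot list, and all function names are hypothetical.

```python
import numpy as np

def occurrence_matrix(docs, vocab):
    """Binary document-by-term matrix: entry (d, t) is 1 if term t occurs in document d."""
    index = {term: j for j, term in enumerate(vocab)}
    matrix = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for token in set(doc.split()):
            matrix[i, index[token]] = 1.0
    return matrix

def cosine_dcf(term_col, pivot_col):
    """Illustrative correspondence function (an assumption, not the paper's definition)."""
    denom = np.linalg.norm(term_col) * np.linalg.norm(pivot_col)
    return float(term_col @ pivot_col / denom) if denom > 0 else 0.0

def term_profiles(docs, pivots):
    """Map each term of one domain to a vector with one dimension per pivot.

    Pivots are assumed to occur in the documents of every domain.
    """
    vocab = sorted({token for doc in docs for token in doc.split()})
    index = {term: j for j, term in enumerate(vocab)}
    matrix = occurrence_matrix(docs, vocab)
    return {
        term: np.array([cosine_dcf(matrix[:, index[term]], matrix[:, index[p]])
                        for p in pivots])
        for term in vocab
    }

# Hypothetical unlabeled documents from a source and a target domain.
books = ["good gripping plot", "bad boring plot", "good gripping story"]
electronics = ["good sturdy case", "bad flimsy case", "good sturdy cable"]
pivots = ["good", "bad"]  # terms assumed to behave alike in both domains

book_profiles = term_profiles(books, pivots)
electronics_profiles = term_profiles(electronics, pivots)
```

In this toy data, the domain-specific terms "gripping" and "sturdy" never share a document, yet they receive matching profiles because each co-occurs with the same pivot in its own domain. A classifier trained on source-domain profiles could then, in principle, be applied to target-domain profiles.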

adaptation, dataset, proceedings, (15 more...)

doi: 10.1613/jair.4762

AI Access Foundation

10977

Country:

North America > United States > Nevada > Clark County > Las Vegas (0.04)
Asia > South Korea (0.04)
Asia > Singapore (0.04)
(13 more...)

Genre:

Research Report > Experimental Study (0.93)
Research Report > Promising Solution (0.66)

Industry:

Media (0.67)
Leisure & Entertainment (0.45)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

How Translation Alters Sentiment

Mohammad, Saif M., Salameh, Mohammad, Kiritchenko, Svetlana

Journal of Artificial Intelligence Research | Jan-20-2016

Sentiment analysis research has predominantly been on English texts. Thus there exist many sentiment resources for English, but fewer for other languages. Approaches to improve sentiment analysis in a resource-poor focus language include: (a) translate the focus language text into a resource-rich language such as English, and apply a powerful English sentiment analysis system to the text, and (b) translate resources such as sentiment-labeled corpora and sentiment lexicons from English into the focus language, and use them as additional resources in the focus-language sentiment analysis system. In this paper we systematically examine both options. We use Arabic social media posts as a stand-in for the focus language text. We show that sentiment analysis of English translations of Arabic texts produces results competitive with Arabic sentiment analysis. We show that Arabic sentiment analysis systems benefit from the use of automatically translated English sentiment lexicons. We also conduct manual annotation studies to examine why the sentiment of a translation is different from the sentiment of the source word or text. This is especially relevant for building better automatic translation systems. In the process, we create a state-of-the-art Arabic sentiment analysis system, a new dialectal Arabic sentiment lexicon, and the first Arabic-English parallel corpus that is independently annotated for sentiment by Arabic and English speakers.

lexicon, sentiment, translation, (11 more...)

doi: 10.1613/jair.4787

AI Access Foundation

10976

Country:

North America > United States > Oregon > Multnomah County > Portland (0.14)
North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > Canada > Alberta (0.14)
(15 more...)

Genre:

Research Report > New Finding (0.93)
Overview (0.93)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)

Information Extraction Under Privacy Constraints

Asoodeh, Shahab, Diaz, Mario, Alajaji, Fady, Linder, Tamás

arXiv.org Machine Learning | Jan-17-2016

A privacy-constrained information extraction problem is considered where for a pair of correlated discrete random variables $(X,Y)$ governed by a given joint distribution, an agent observes $Y$ and wants to convey to a potentially public user as much information about $Y$ as possible without compromising the amount of information revealed about $X$. To this end, the so-called rate-privacy function is introduced to quantify the maximal amount of information (measured in terms of mutual information) that can be extracted from $Y$ under a privacy constraint between $X$ and the extracted information, where privacy is measured using either mutual information or maximal correlation. Properties of the rate-privacy function are analyzed and information-theoretic and estimation-theoretic interpretations of it are presented for both the mutual information and maximal correlation privacy measures. It is also shown that the rate-privacy function admits a closed-form expression for a large family of joint distributions of $(X,Y)$. Finally, the rate-privacy function under the mutual information privacy measure is considered for the case where $(X,Y)$ has a joint probability density function by studying the problem where the extracted information is a uniform quantization of $Y$ corrupted by additive Gaussian noise. The asymptotic behavior of the rate-privacy function is studied as the quantization resolution grows without bound and it is observed that not all of the properties of the rate-privacy function carry over from the discrete to the continuous case.
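
For the mutual information privacy measure, the quantity described above can be written compactly as follows. The symbols $Z$ (the extracted information), $\varepsilon$ (the privacy budget), and $g_{\varepsilon}$ are notation assumed here for illustration, since the abstract does not fix them. Because the agent observes only $Y$, the sketch also assumes that $Z$ is generated from $Y$ alone, so that $X \to Y \to Z$ forms a Markov chain.

```latex
g_{\varepsilon}(X;Y) \;=\;
  \sup_{\substack{P_{Z \mid Y} \,:\; X \to Y \to Z \\ I(X;Z) \,\le\, \varepsilon}}
  I(Y;Z)
```

Under this reading, $\varepsilon = 0$ corresponds to perfect privacy, where $Z$ must be independent of $X$, and for discrete $Y$ the value can never exceed $H(Y)$ because $I(Y;Z) \le H(Y)$.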

data mining, information, natural language, (19 more...)

1511.02381

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Security & Privacy (1.00)
Information Technology > Data Science > Data Mining > Text Mining (0.60)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.60)

Introduction to the Special Issue on Cross-Language Algorithms and Applications

Costa-jussà, Marta R., Bangalore, Srinivas, Lambert, Patrik, Màrquez, Lluís, Montiel-Ponsoda, Elena

Journal of Artificial Intelligence Research | Jan-12-2016

With the increasingly global nature of our everyday interactions, the need for multilingual technologies to support efficient and effective information access and communication cannot be overemphasized. Computational modeling of language has been the focus of Natural Language Processing, a subdiscipline of Artificial Intelligence. One of the current challenges for this discipline is to design methodologies and algorithms that are cross-language in order to create multilingual technologies rapidly. The goal of this JAIR special issue on Cross-Language Algorithms and Applications (CLAA) is to present leading research in this area, with emphasis on developing unifying themes that could lead to the development of the science of multi- and cross-lingualism. In this introduction, we provide the reader with the motivation for this special issue and summarize the contributions of the papers that have been included. The selected papers cover a broad range of cross-lingual technologies including machine translation, domain and language adaptation for sentiment analysis, cross-language lexical resources, dependency parsing, information retrieval and knowledge representation. We anticipate that this special issue will serve as an invaluable resource for researchers interested in topics of cross-lingual natural language processing.

machine translation, proceedings, translation, (12 more...)

doi: 10.1613/jair.5022

AI Access Foundation

10973

Country:

Asia > India > Karnataka > Bengaluru (0.05)
Europe > Czechia > Prague (0.04)
Europe > Sweden > Uppsala County > Uppsala (0.04)
(18 more...)

Genre:

Overview (0.87)
Collection > Journal > Special Issue (0.77)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(2 more...)

Machine Learning Sentiment Prediction based on Hybrid Document Representation

Stalidis, Panagiotis, Giatsoglou, Maria, Diamantaras, Konstantinos, Sarigiannidis, George, Chatzisavvas, Konstantinos Ch.

arXiv.org Machine Learning | Nov-29-2015

Automated sentiment analysis and opinion mining is a complex process concerning the extraction of useful subjective information from text. The explosion of user-generated content on the Web, especially the fact that millions of users, on a daily basis, express their opinions on products and services on blogs, wikis, social networks, message boards, etc., renders the reliable, automated extraction of sentiments and opinions from unstructured text crucial for several commercial applications. In this paper, we present a novel hybrid vectorization approach for textual resources that combines a weighted variant of the popular Word2Vec representation (based on Term Frequency-Inverse Document Frequency) with a Bag-of-Words representation and a vector of lexicon-based sentiment values. The proposed text representation approach is assessed through the application of several machine learning classification algorithms on a dataset that is used extensively in the literature for sentiment detection. The classification accuracy derived through the proposed hybrid vectorization approach is higher than when its individual components are used for text representation, and comparable with state-of-the-art sentiment detection methodologies.
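
The three-part representation described above can be sketched as a simple concatenation. This is a minimal illustration rather than the authors' implementation: the embeddings are hand-written toy vectors standing in for trained Word2Vec vectors, the IDF formula is one common variant chosen as an assumption, and the lexicon features (summed positive and summed negative polarity) and all names are hypothetical.

```python
import math
from collections import Counter

import numpy as np

def idf_weights(corpus):
    """Inverse document frequency per token (log(N / df) + 1, an assumed variant)."""
    n_docs = len(corpus)
    doc_freq = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log(n_docs / df) + 1.0 for tok, df in doc_freq.items()}

def hybrid_vector(doc, embeddings, idf, vocab, lexicon):
    """Concatenate a TF-IDF-weighted embedding average, BoW counts and lexicon features."""
    counts = Counter(doc)
    dim = len(next(iter(embeddings.values())))

    # Part 1: average of word embeddings, each weighted by its TF-IDF value.
    weighted = np.zeros(dim)
    total = 0.0
    for tok, tf in counts.items():
        if tok in embeddings:
            weight = tf * idf.get(tok, 1.0)
            weighted += weight * np.asarray(embeddings[tok], dtype=float)
            total += weight
    if total > 0:
        weighted /= total

    # Part 2: Bag-of-Words counts over a fixed vocabulary.
    bow = np.array([counts[term] for term in vocab], dtype=float)

    # Part 3: lexicon-based sentiment values (positive mass, negative mass).
    scores = [lexicon.get(tok, 0.0) for tok in doc]
    lex = np.array([sum(s for s in scores if s > 0),
                    sum(s for s in scores if s < 0)], dtype=float)

    return np.concatenate([weighted, bow, lex])

# Toy inputs, all hypothetical.
corpus = [["great", "movie"], ["awful", "movie"]]
vocab = ["great", "awful", "movie"]
embeddings = {"great": [1.0, 0.0], "awful": [-1.0, 0.0], "movie": [0.0, 1.0]}
lexicon = {"great": 1.0, "awful": -1.0}

idf = idf_weights(corpus)
vec = hybrid_vector(corpus[0], embeddings, idf, vocab, lexicon)
```

The resulting vector has one block per component (here 2 + 3 + 2 dimensions), so any standard classifier can consume it, and each block can be dropped in turn to compare against the individual representations.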

artificial intelligence, machine learning, natural language, (21 more...)

1511.09107

Country:

North America (0.28)
Europe > Greece (0.14)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.54)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (1.00)
(5 more...)

Information retrieval in folktales using natural language processing

Groza, Adrian, Corde, Lidia

arXiv.org Artificial Intelligence | Nov-10-2015

Recognising literary characters in various narrative texts is challenging from both the literary and the technical perspective. From the literary viewpoint, the meaning of the term "character" leaves room for various interpretations. From the technical perspective, literary texts contain a lot of data about the emotions, social life or inner life of the characters, while they are very thin on technical, straightforward messages. Inferring the character type from literary texts might pose problems even for human readers [4]. Interactions between literary characters contain rich social networks.

artificial intelligence, information retrieval, natural language, (18 more...)

1511.03012

Country: Europe > Romania (0.15)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Ontologies (0.52)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.51)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.47)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (0.46)

Domain Scoping for Subject Matter Experts

Khabiri, Elham (IBM) | Riemer, Matthew (IBM) | Heath III, Fenno F. (IBM) | Hull, Richard (IBM)

AAAI Conferences | Nov-1-2015

Exploring web data, and social media data in particular, is an essential task for many subject matter experts who want to discover content around their subject of interest. It is important to provide them with a tool to define their scope of vocabulary, i.e., what to search for, and to suggest commonly used terms alongside serendipitous ones, allowing them to define the scope of their exploration. This paper presents methods for constructing "domain models", which are families of keywords and extractors that enable a focus on social media documents relevant to a project using multiple channels of information extraction.

data mining, machine learning, natural language, (18 more...)

2015 AAAI Fall Symposium Series

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(5 more...)

Industry:

Education (1.00)
Information Technology (0.69)
Health & Medicine > Therapeutic Area > Infections and Infectious Diseases (0.47)
(3 more...)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)

Opinion mining from twitter data using evolutionary multinomial mixture models

Hasnat, Md. Abul, Velcin, Julien, Bonnevay, Stéphane, Jacques, Julien

arXiv.org Machine Learning | Sep-24-2015

The image of an entity can be defined as a structured and dynamic representation which can be extracted from the opinions of a group of users or a population. Automatic extraction of such an image is important in political science and sociology, e.g., when an extended inquiry from large-scale data is required. We study the images of two politically significant entities of France. These images are constructed by analyzing opinions collected from the well-known social media platform Twitter. Our goal is to build a system which can be used to automatically extract the image of entities over time. In this paper, we propose a novel evolutionary clustering method based on the parametric link among Multinomial mixture models. First we propose the formulation of a generalized model that establishes parametric links among the Multinomial distributions. Afterward, we follow a model-based clustering approach to explore different parametric sub-models and select the best model. For the experiments, we first use synthetic temporal data. Next, we apply the method to analyze the annotated social media data. Results show that the proposed method is better than the state of the art on the common evaluation metrics. Additionally, our method can provide an interpretation of the temporal evolution of the clusters.

artificial intelligence, machine learning, natural language, (19 more...)

1509.07344

Country: Europe > France (0.66)

Genre: Research Report > New Finding (0.66)

Industry:

Government (0.46)
Information Technology > Services (0.41)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (1.00)

Twitter Sentiment Analysis: Lexicon Method, Machine Learning Method and Their Combination

Kolchyna, Olga, Souza, Tharsis T. P., Treleaven, Philip, Aste, Tomaso

arXiv.org Machine Learning | Sep-18-2015

This paper covers two approaches to sentiment analysis: i) the lexicon-based method; ii) the machine learning method. We describe several techniques to implement these approaches and discuss how they can be adopted for sentiment classification of Twitter messages. We present a comparative study of different lexicon combinations and show that enhancing sentiment lexicons with emoticons, abbreviations and social-media slang expressions increases the accuracy of lexicon-based classification for Twitter. We discuss the importance of feature generation and feature selection processes for machine learning sentiment classification. To quantify the performance of the main sentiment analysis methods over Twitter we run these algorithms on a benchmark Twitter dataset from the SemEval-2013 competition, task 2-B. The results show that the machine learning method based on SVM and Naive Bayes classifiers outperforms the lexicon method. We present a new ensemble method that uses a lexicon-based sentiment score as an input feature for the machine learning approach. The combined method proved to produce more precise classifications. We also show that employing a cost-sensitive classifier for highly unbalanced datasets yields an improvement in sentiment classification performance of up to 7%.
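
The two ideas above, enriching a lexicon with social-media tokens and feeding the lexicon score to a learner as an extra feature, can be sketched as follows. This is a minimal illustration under stated assumptions: the lexicon entries, their polarity values, the averaging rule, and the function names are all hypothetical and are not taken from the paper.

```python
# Hypothetical polarity lexicons; values in [-1, 1].
WORD_LEXICON = {"good": 1.0, "love": 1.0, "bad": -1.0, "hate": -1.0}
SOCIAL_LEXICON = {":)": 1.0, ":(": -1.0, "lol": 0.5, "smh": -0.5}
ENHANCED_LEXICON = {**WORD_LEXICON, **SOCIAL_LEXICON}

def lexicon_score(tweet, lexicon):
    """Average polarity of the tokens found in the lexicon (0.0 if none match)."""
    tokens = tweet.lower().split()
    hits = [lexicon[tok] for tok in tokens if tok in lexicon]
    return sum(hits) / len(hits) if hits else 0.0

def label_from_score(score, threshold=0.0):
    """Lexicon-only classification: the sign of the score decides the label."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def features_with_lexicon(tweet, vocab, lexicon):
    """Bag-of-words counts with the lexicon score appended as one more feature."""
    tokens = tweet.lower().split()
    bow = [float(tokens.count(term)) for term in vocab]
    return bow + [lexicon_score(tweet, lexicon)]

tweet = "Stuck in traffic :("
plain = label_from_score(lexicon_score(tweet, WORD_LEXICON))
enhanced = label_from_score(lexicon_score(tweet, ENHANCED_LEXICON))
row = features_with_lexicon("love this phone :)", ["phone", "battery"], ENHANCED_LEXICON)
```

With the word-only lexicon the example tweet has no matching tokens and falls back to "neutral", whereas the emoticon entry lets the enhanced lexicon label it "negative". The `row` list is the kind of input that could be passed to an SVM or Naive Bayes classifier in the combined setup.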

artificial intelligence, machine learning, natural language, (20 more...)

1507.00955

Country:

Europe (1.00)
Asia (0.92)
North America > United States > Massachusetts > Middlesex County (0.28)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Services (1.00)
Banking & Finance (1.00)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (1.00)
(3 more...)
