Information Extraction


Free Trial Signup - Gather Twitter Data DiscoverText

#artificialintelligence

Use this information to train machine-learning classifiers to recognize relevant text and social media data. Jump into data using an interactive word CloudExplorer or build a mini topic dictionary using "defined" search.


The Great Hack: the film that goes behind the scenes of the Facebook data scandal

#artificialintelligence

Cambridge Analytica may have become the byword for a scandal, but it's not entirely clear that anyone knows exactly what that scandal is. It's more like toxic word association: "Facebook", "data", "harvested", "weaponised", "Trump" and, in this country, most controversially, "Brexit". It was a media firestorm that's yet to be extinguished, a year on from whistleblower Christopher Wylie's revelations in the Observer and the New York Times about how the company acquired the personal data of tens of millions of Facebook users in order to target them in political campaigns. This week sees the release of The Great Hack, a Netflix documentary that is the first feature-length attempt to gather all the strands of the affair into some sort of narrative, though it is one contested even by those appearing in the film. "This is not about one company," Julian Wheatland, the ex-chief operating officer of Cambridge Analytica, claims at one point. "This technology is going on unabated and will continue to go on unabated. […] There was always going to be a Cambridge Analytica. It just sucks to me that it's Cambridge Analytica."


Text Analytics: the convergence of Big Data and Artificial Intelligence

#artificialintelligence

The analysis of the text content of emails, blogs, tweets, forums and other forms of textual communication constitutes what we call text analytics. Text analytics applies to most industries: it can help analyze millions of emails, analyze customers' comments and questions in forums, or perform sentiment analysis by measuring positive or negative perceptions of a company, brand, or product. Text analytics, also called text mining, is a subcategory of Natural Language Processing (NLP), one of the founding branches of Artificial Intelligence, dating back to the 1950s, when interest in understanding text first developed. Text analytics is now often considered the next step in Big Data analysis. It has a number of subdivisions: Information Extraction, Named Entity Recognition, Semantic Web annotated domain representation, and many more.
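
The sentiment-analysis use case mentioned above can be illustrated with the simplest possible scorer, a lexicon lookup. This is a minimal sketch, not how any particular text analytics product works: the six-word `LEXICON` and its weights are hypothetical, and the functions `sentiment_score`, `label` and `brand_perception` are illustrative names. Real systems use large curated lexicons or trained classifiers and handle negation ("not good"), which this sketch ignores.

```python
import re
from typing import Dict, List

# Hypothetical toy lexicon: word -> polarity weight (positive > 0, negative < 0).
LEXICON: Dict[str, int] = {
    "great": 2, "good": 1, "love": 2,
    "bad": -1, "terrible": -2, "slow": -1,
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(text: str) -> int:
    """Sum the lexicon weights of the tokens; unknown words count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def label(text: str) -> str:
    """Map the summed score to a coarse polarity label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def brand_perception(comments: List[str]) -> Dict[str, float]:
    """Share of comments that are positive, negative or neutral."""
    counts = {"positive": 0, "negative": 0, "neutral": 0}
    for comment in comments:
        counts[label(comment)] += 1
    return {k: v / len(comments) for k, v in counts.items()}
```

For example, `label("I love this brand, great service")` is positive because "love" and "great" together score +4, while a comment with no lexicon words is neutral.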


How Bots Can Tell When the C-Suite Is Lying

#artificialintelligence

CEOs and CFOs are decidedly more nervous when fielding questions about China during earnings calls this year. What's more, they are more likely to be deceptive with their answers. "Deception associated with questions on China has skyrocketed this quarter, up about 50% from last quarter and more than double a year ago," according to a study by text analytics provider Amenity Analytics. Amenity Analytics is one of a handful of companies applying natural language processing (NLP), sentiment analysis and machine learning to the financial sector, evaluating earnings calls and other public meetings to unearth information of value to investors. It is also a rare technology that offers a clear path to ROI.


Multi-modal Sentiment Analysis using Deep Canonical Correlation Analysis

arXiv.org Machine Learning

This paper learns multi-modal embeddings from the text, audio, and video views (modes) of the data in order to improve downstream sentiment classification. The experimental framework also allows investigation of the relative contributions of the individual views to the final multi-modal embedding. Individual features derived from the three views are combined into a multi-modal embedding using Deep Canonical Correlation Analysis (DCCA) in two ways: (i) One-Step DCCA and (ii) Two-Step DCCA. Text embeddings are learned using BERT, the current state of the art in text encoders. We posit that this highly optimized algorithm dominates the contributions of the other views, though each view does contribute to the final result. Classification tasks are carried out on two benchmark datasets and on a new Debate Emotion dataset, and together these demonstrate that One-Step DCCA outperforms the current state of the art in learning multi-modal embeddings.
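
To make the combination step more concrete, the sketch below computes the first canonical correlation between two views, which is the quantity CCA maximizes: projections a and b of views X and Y such that corr(Xa, Yb) is as large as possible. It is plain linear CCA in pure Python, assuming small dense feature matrices; Deep CCA instead passes each view through its own neural network and maximizes the correlation of the networks' outputs, which this sketch does not do. The function names and the small `ridge` term added for numerical stability are assumptions for illustration, not details from the paper.

```python
import math
from typing import List

Matrix = List[List[float]]


def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse(a: Matrix) -> Matrix:
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def center(rows: Matrix) -> Matrix:
    """Subtract each column's mean."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]


def covariance(a: Matrix, b: Matrix, ridge: float = 0.0) -> Matrix:
    """Sample cross-covariance of centred views; `ridge` is added to the diagonal
    (only used when a and b are the same view, so the result is square)."""
    n = len(a)
    c = [[v / (n - 1) for v in row] for row in matmul(transpose(a), b)]
    if ridge:
        for i in range(len(c)):
            c[i][i] += ridge
    return c


def first_canonical_correlation(x: Matrix, y: Matrix, ridge: float = 1e-6,
                                iters: int = 500) -> float:
    """Largest canonical correlation between views x (n x dx) and y (n x dy).

    rho^2 is the top eigenvalue of Cxx^-1 Cxy Cyy^-1 Cyx, found by power iteration.
    """
    xc, yc = center(x), center(y)
    cxx = covariance(xc, xc, ridge)
    cyy = covariance(yc, yc, ridge)
    cxy = covariance(xc, yc)
    m = matmul(matmul(inverse(cxx), cxy), matmul(inverse(cyy), transpose(cxy)))
    d = len(m)
    v = [1.0 / math.sqrt(d)] * d  # unit-length starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(m[i][j] * v[j] for j in range(d)) for i in range(d)]
        lam = math.sqrt(sum(t * t for t in w))  # ||Mv|| with ||v|| = 1
        if lam == 0.0:
            return 0.0
        v = [t / lam for t in w]
    return math.sqrt(lam)
```

If one column of Y is an exact linear combination of the columns of X, the first canonical correlation is (up to the ridge term) 1; for independent views it stays close to 0.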


Qwant Research @DEFT 2019: Document matching and information retrieval using clinical cases

arXiv.org Machine Learning

Task 2 addresses semantic similarity between clinical cases and discussions. For this task, we propose an approach based on language models and evaluate the impact of different preprocessing and matching techniques on the results. For Task 3, we developed an information extraction system that yields very encouraging accuracy. We experimented with two different approaches, one based exclusively on neural networks, the other based on linguistic analysis.
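
As a point of reference for the matching step in Task 2, a classic lexical baseline scores each discussion against a clinical case by TF-IDF cosine similarity. This is a swapped-in technique for illustration only; the authors' approach is based on language models, and the function names and example texts below are hypothetical. For simplicity the IDF weights are computed over the case plus its candidate discussions rather than over a larger corpus.

```python
import math
import re
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs: List[str]) -> List[Vector]:
    """Sparse TF-IDF vectors, with idf = log(N / df) + 1 so shared terms keep some weight."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def best_match(case: str, discussions: List[str]) -> int:
    """Index of the discussion most similar to the clinical case."""
    vecs = tfidf_vectors([case] + discussions)
    scores = [cosine(vecs[0], v) for v in vecs[1:]]
    return max(range(len(scores)), key=scores.__getitem__)
```

A purely lexical baseline like this misses paraphrases (for example "pyrexia" versus "fever"), which is one motivation for language-model-based matching.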


Wayfair Walkout, Facebook Data Value, and More News

#artificialintelligence

Tech employees are taking a stand against migrant detention centers; a proposal asking tech companies to disclose the value of your data; and a live reading of the Mueller report. Here's the news you need to know, in two minutes or less. This afternoon, 550 employees at the Boston-based ecommerce company Wayfair staged a walkout opposing the sale of company furniture to migrant detention centers. Last week, Wayfair workers discovered an order for $200,000 worth of beds and other furniture reportedly placed by government contractor BCFS for a new detention center in Carrizo Springs, Texas.


Constructing Information-Lossless Biological Knowledge Graphs from Conditional Statements

arXiv.org Artificial Intelligence

Conditions are essential to the statements in biological literature. Without their precisely specified conditions (e.g., environment, equipment), the facts (e.g., observations) in the statements may no longer be valid. A biological statement contains one or more facts and/or conditions, whose subject and object can each be either a concept or a concept's attribute. Existing information extraction methods consider neither the role of conditions in biological statements nor the role of attributes in the subject or object. In this work, we design a new tag schema and propose a deep sequence tagging framework that structures conditional statements from biological text into fact and condition tuples. Experiments demonstrate that our method yields an information-lossless structure of the literature.
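
The idea of tagging a statement token by token and then reading fact and condition tuples off the tags can be sketched as follows. The tag names (`B-`/`I-` prefixes combined with `FACT`/`COND` and `SUBJ`/`PRED`/`OBJ`) are a hypothetical BIO-style scheme invented for illustration, not the paper's actual tag schema, and the sketch assumes at most one fact and one condition per statement. The tags would come from a trained sequence tagger, which is not shown; the example sentence is illustrative only.

```python
from typing import Dict, List, Optional, Tuple

Span = Tuple[str, str]  # (label such as "FACT-SUBJ", covered text)


def decode_spans(tokens: List[str], tags: List[str]) -> List[Span]:
    """Collect contiguous BIO spans; an I- tag with a new label starts a new span."""
    spans: List[Span] = []
    cur_label: Optional[str] = None
    cur_toks: List[str] = []
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != cur_label):
            if cur_label:
                spans.append((cur_label, " ".join(cur_toks)))
            cur_label, cur_toks = tag[2:], [tok]
        elif tag.startswith("I-"):
            cur_toks.append(tok)
        else:  # "O": outside any span
            if cur_label:
                spans.append((cur_label, " ".join(cur_toks)))
            cur_label, cur_toks = None, []
    if cur_label:
        spans.append((cur_label, " ".join(cur_toks)))
    return spans


def to_tuples(spans: List[Span]) -> Dict[str, Tuple[Optional[str], ...]]:
    """Group spans into (subject, predicate, object) tuples per kind; missing roles are None."""
    slots: Dict[str, Dict[str, str]] = {"FACT": {}, "COND": {}}
    for label, text in spans:
        kind, role = label.split("-")
        slots[kind][role] = text
    return {kind: tuple(s.get(r) for r in ("SUBJ", "PRED", "OBJ"))
            for kind, s in slots.items() if s}
```

With tokens `IL-6 increases STAT3 activation in hepatocytes` tagged `B-FACT-SUBJ B-FACT-PRED B-FACT-OBJ I-FACT-OBJ B-COND-PRED B-COND-OBJ`, the fact tuple is `("IL-6", "increases", "STAT3 activation")` and the condition tuple is `(None, "in", "hepatocytes")`, so the condition is kept alongside the fact instead of being discarded.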


Open Datasets for Machine Learning Lionbridge AI

#artificialintelligence

Datasets are an integral part of machine learning. Without high-quality training datasets, machine learning algorithms would have no way of knowing how to conduct sentiment analysis, categorize products or understand foreign languages. This spreadsheet contains the ultimate list of open datasets for machine learning. Organized by industry and use case, this database contains a diverse range of 300 datasets to train machine learning models.


Event extraction based on open information extraction and ontology

arXiv.org Artificial Intelligence

The work presented in this master's thesis consists of extracting a set of events from texts written in natural language. To this end, we build on the basic notions of information extraction and open information extraction (OIE). First, we applied an OIE system to relation extraction, to highlight the importance of OIE in event extraction, and we used an ontology for event modeling. We evaluated our approach with test metrics. The resulting two-level event extraction approach performed well but required substantial expert intervention to build the classifiers, which is time-consuming. We therefore proposed an approach that reduces expert intervention in relation extraction, entity recognition and reasoning, which are automated and based on adaptation and correspondence techniques. Finally, to demonstrate the relevance of the extracted results, we conducted a set of experiments using different test metrics as well as a comparative study.
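
The pipeline described above (OIE triples, then ontology-based event modeling) can be sketched as follows. This is a minimal illustration that assumes the OIE system has already produced `(arg1, relation, arg2)` triples. The `EventType` class, the two-entry `ONTOLOGY`, its trigger words and its role names are hypothetical stand-ins, not the thesis's ontology, and trigger matching here is a plain word lookup rather than the adaptation and correspondence techniques the thesis describes.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (arg1, relation, arg2) as produced by an OIE system


@dataclass(frozen=True)
class EventType:
    """Hypothetical ontology entry: an event class, its trigger words, and the
    roles filled by the triple's first and second arguments."""
    name: str
    triggers: FrozenSet[str]
    roles: Tuple[str, str]


# Hypothetical two-class ontology used only for illustration.
ONTOLOGY: List[EventType] = [
    EventType("Acquisition", frozenset({"acquired", "bought", "purchased"}), ("buyer", "target")),
    EventType("Attack", frozenset({"attacked", "bombed", "raided"}), ("attacker", "target")),
]


def classify(relation: str) -> Optional[EventType]:
    """Return the first event class whose trigger words occur in the relation phrase."""
    words = relation.lower().split()
    for event_type in ONTOLOGY:
        if any(w in event_type.triggers for w in words):
            return event_type
    return None


def extract_events(triples: List[Triple]) -> List[Dict[str, str]]:
    """Map OIE triples onto ontology event frames; unmatched triples are dropped."""
    events = []
    for arg1, relation, arg2 in triples:
        event_type = classify(relation)
        if event_type is None:
            continue
        first_role, second_role = event_type.roles
        events.append({"type": event_type.name, first_role: arg1, second_role: arg2})
    return events
```

Because the relation phrase is split into words, a triple such as `("Google", "has acquired", "YouTube")` still maps to the Acquisition frame, while a non-event triple such as `("The sky", "is", "blue")` is dropped.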