 Information Extraction


Open Vocabulary Electroencephalography-To-Text Decoding and Zero-shot Sentiment Classification

arXiv.org Artificial Intelligence

State-of-the-art brain-to-text systems have achieved great success in decoding language directly from brain signals using neural networks. However, current approaches are limited to small closed vocabularies which are far from enough for natural communication. In addition, most of the high-performing approaches require data from invasive devices (e.g., ECoG). In this paper, we extend the problem to open vocabulary Electroencephalography (EEG)-To-Text Sequence-To-Sequence decoding and zero-shot sentence sentiment classification on natural reading tasks. We hypothesize that the human brain functions as a special text encoder and propose a novel framework leveraging pre-trained language models (e.g., BART). Our model achieves a 40.1% BLEU-1 score on EEG-To-Text decoding and a 55.6% F1 score on zero-shot EEG-based ternary sentiment classification, which significantly outperforms supervised baselines. Furthermore, we show that our proposed model can handle data from various subjects and sources, showing great potential for a high-performance open vocabulary brain-to-text system once sufficient data is available.
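For readers unfamiliar with the BLEU-1 metric quoted in the abstract above, the following is a minimal sketch of a sentence-level, single-reference BLEU-1: clipped unigram precision multiplied by the brevity penalty. The function name `bleu1` and the whitespace tokenization are assumptions made for illustration; this is not the authors' evaluation code, and published scores are normally computed at corpus level with an established toolkit.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against one reference (illustrative sketch)."""
    cand = candidate.split()  # assumption: plain whitespace tokenization
    ref = reference.split()
    if not cand:
        return 0.0
    cand_counts = Counter(cand)
    ref_counts = Counter(ref)
    # Clip each candidate unigram count by its count in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)
    # Brevity penalty: penalize candidates shorter than the reference.
    if len(cand) >= len(ref):
        penalty = 1.0
    else:
        penalty = math.exp(1 - len(ref) / len(cand))
    return penalty * precision


if __name__ == "__main__":
    print(bleu1("the cat sat", "the cat sat on the mat"))
```

Clipping keeps a decoder from inflating its score by repeating a frequent word, and the brevity penalty keeps it from scoring well by emitting only a few safe tokens.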


STEM: Unsupervised STructural EMbedding for Stance Detection

arXiv.org Artificial Intelligence

Stance detection is an important task, supporting many downstream tasks such as discourse parsing and modeling the propagation of fake news, rumors, and science denial. In this paper, we propose a novel framework for stance detection. Our framework is unsupervised and domain-independent. Given a claim and a multi-participant discussion, we construct the interaction network, from which we derive a topological embedding for each speaker. These speaker embeddings enjoy the following property: speakers with the same stance tend to be represented by similar vectors, while antipodal vectors represent speakers with opposing stances. These embeddings are then used to divide the speakers into stance-partitions. We evaluate our method on three different datasets from different platforms. Our method outperforms or is comparable with supervised models while providing confidence levels for its output. Furthermore, we demonstrate how the structural embeddings relate to the valence expressed by the speakers. Finally, we discuss some limitations inherent to the framework.
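The idea of deriving a stance partition from the interaction network alone can be illustrated with a small stand-in technique. The sketch below is not the STEM algorithm: it assumes, as a simplification, that a reply between two speakers signals disagreement, and it uses a plain spectral bipartition (power iteration toward the adjacency eigenvector with the smallest eigenvalue) in place of the paper's embedding. The function name `stance_partition` and the `(speaker, speaker, weight)` edge format are hypothetical.

```python
import math


def stance_partition(edges, iterations=200):
    """Split speakers into two camps (+1 / -1) from weighted reply edges.

    Assumption: an edge means the two speakers tend to disagree, so a good
    split puts as much edge weight as possible between the two camps.
    """
    speakers = sorted({name for a, b, _ in edges for name in (a, b)})
    index = {name: i for i, name in enumerate(speakers)}
    n = len(speakers)
    adj = [[0.0] * n for _ in range(n)]
    for a, b, weight in edges:
        adj[index[a]][index[b]] += weight
        adj[index[b]][index[a]] += weight
    # The largest row sum bounds every eigenvalue of adj in magnitude, so
    # (shift * I - adj) has non-negative eigenvalues and power iteration on it
    # converges toward the eigenvector of adj with the smallest eigenvalue.
    shift = max((sum(row) for row in adj), default=0.0)
    vec = [1.0 + 0.1 * i for i in range(n)]  # deterministic, non-uniform start
    for _ in range(iterations):
        nxt = [
            shift * vec[i] - sum(adj[i][j] * vec[j] for j in range(n))
            for i in range(n)
        ]
        norm = math.sqrt(sum(x * x for x in nxt)) or 1.0
        vec = [x / norm for x in nxt]
    # The sign of each coordinate is a one-dimensional "antipodal" embedding.
    return {name: (1 if vec[index[name]] >= 0 else -1) for name in speakers}


if __name__ == "__main__":
    replies = [
        ("alice", "bob", 1.0),
        ("alice", "dave", 1.0),
        ("carol", "bob", 1.0),
        ("carol", "dave", 1.0),
    ]
    print(stance_partition(replies))
```

In this toy discussion alice and carol only ever reply to bob and dave, so the sign pattern places them in one camp and bob and dave in the other. Which camp is labeled +1 is arbitrary.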


Continual Learning with Knowledge Transfer for Sentiment Classification

arXiv.org Artificial Intelligence

This paper studies continual learning (CL) for sentiment classification (SC). In this setting, the CL system learns a sequence of SC tasks incrementally in a neural network, where each task builds a classifier to classify the sentiment of reviews of a particular product category or domain. Two natural questions are: Can the system transfer the knowledge learned in the past from the previous tasks to the new task to help it learn a better model for the new task? And, can old models for previous tasks be improved in the process as well? This paper proposes a novel technique called KAN to achieve these objectives. KAN can markedly improve the SC accuracy of both the new task and the old tasks via forward and backward knowledge transfer. The effectiveness of KAN is demonstrated through extensive experiments.


Jose Almeida on LinkedIn: Data Governance Postmortem

#artificialintelligence

How important is it to handle data like any other business asset? Data governance includes the people, processes, and technology used to manage the data asset, aligned with a clear data strategy that is focused on establishing the conditions for data to support business. Becoming data-driven begins with establishing a strong data foundation that will increase the quality and efficiency of corporate decision processes, positively affecting business operations, strategy, and performance. Bottom line, business success depends on the execution and implementation of those decisions, and they are only as good as the data that supports them. The true measure of success is the quality of the organization's decision processes; the organizations best able to make the best insight-driven decisions faster will gain the competitive edge.


GenIE: Generative Information Extraction

arXiv.org Machine Learning

Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema. Most existing works are pipelines prone to error accumulation, and all approaches are only applicable to unrealistically small numbers of entities and relations. We introduce GenIE (generative information extraction), the first end-to-end autoregressive formulation of closed information extraction. GenIE naturally exploits the language knowledge from the pre-trained transformer by autoregressively generating relations and entities in textual form. Thanks to a new bi-level constrained generation strategy, only triplets consistent with the predefined knowledge base schema are produced. Our experiments show that GenIE is state-of-the-art on closed information extraction, generalizes from fewer training data points than baselines, and scales to a previously unmanageable number of entities and relations. With this work, closed information extraction becomes practical in realistic scenarios, providing new opportunities for downstream tasks. Finally, this work paves the way towards a unified end-to-end approach to the core tasks of information extraction. Code and models available at https://github.com/epfl-dlab/GenIE.
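The constrained-generation idea in the abstract above, where only outputs consistent with a predefined schema can be produced, is commonly illustrated with a prefix trie over the allowed names. The sketch below is a simplified single-level, greedy version written under stated assumptions: the `score` callback stands in for a language model's next-token score, whitespace tokens stand in for subword tokens, and `build_trie`, `constrained_decode`, and `END` are hypothetical names. It is not GenIE's bi-level strategy or its code.

```python
END = "<end>"  # hypothetical end-of-name marker


def build_trie(names):
    """Build a nested-dict prefix trie over whitespace-tokenized names."""
    root = {}
    for name in names:
        node = root
        for token in name.split():
            node = node.setdefault(token, {})
        node[END] = {}  # a complete name may stop here
    return root


def constrained_decode(trie, score):
    """Greedily pick the best-scoring token among those the trie allows.

    `score(prefix_tokens, token)` is an assumed stand-in for a model score.
    """
    if not trie:
        raise ValueError("the trie must contain at least one name")
    node, output = trie, []
    while True:
        # Only continuations that keep the output inside the trie are legal.
        best = max(node, key=lambda token: score(output, token))
        if best == END:
            return " ".join(output)
        output.append(best)
        node = node[best]


if __name__ == "__main__":
    trie = build_trie(["New York City", "New York", "New Delhi"])
    toy_scores = {"Paris": 9.0, "New": 1.0, "Delhi": 3.0, "York": 2.0, END: 0.0}
    print(constrained_decode(trie, lambda prefix, tok: toy_scores.get(tok, -1.0)))
```

Because the candidate set at every step is read from the trie, a token such as "Paris" can never be emitted here, however highly the scorer rates it. Every decoded string is guaranteed to be one of the predefined names.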


Building on Huang et al. GlossBERT for Word Sense Disambiguation

arXiv.org Artificial Intelligence

We propose to take on the problem of Word Sense Disambiguation (WSD). In language, words of the same form can take different meanings depending on context. While humans easily infer the meaning or gloss of such words by their context, machines stumble on this task. As such, we intend to replicate and expand upon the results of Huang et al.'s GlossBERT, a model which they design to disambiguate these words (Huang et al., 2019). Specifically, we propose the following augmentations: data-set tweaking (alpha hyper-parameter), ensemble methods, and replacement of BERT with BART and ALBERT. The following GitHub repository contains all code used in this report, which extends the code made available by Huang et al.


Rome's Libraries Readers' Comments Analysis with Deep Learning

#artificialintelligence

This post describes, along with Python code, an analysis of the readers' comments open dataset from Rome's libraries made publicly available by "Istituzione Biblioteche di Roma". The analysis leverages topic modeling techniques to find recurring topics among readers' comments, and thus determine, by inference, the themes of the borrowed books and the interests of the readers. Moreover, sentiment analysis is performed to determine whether customers' comments are positive or negative. Finally, readers' data (age and occupation) are used to achieve customer segmentation via clustering techniques. This provides insights on the topics of borrowed books, the readers' sentiment and different reader clusters.


Jose Almeida on LinkedIn: Data-driven business means business-driven data

#artificialintelligence

There's greater accountability that is expected of data controllers or data processors, and this heralds the arrival of a compliance burden on entities. Join us to discuss: · The case for regulation of personal data usage · Expectations of the regulator · How do you comply with the regulations? Come, let's talk about #privacy. Pathways International and Jose Almeida will host a webinar on personal data protection, titled: Data Protection Act - Roadmap to Compliance.


Sentiment Analysis

#artificialintelligence

Sentiment analysis is a methodology for analysing text data and classifying the sentiment contained within it. It is a useful technique for every customer-facing industry (retail, finance, telco, utilities, etc.) which needs to understand how consumers are thinking about them and their products, features and services. Sentiment analysis is a key feature in understanding and predicting churn, developing more accurate customer segmentations and creating recommender systems which have a good take-up of product and service offerings. Today, organisations have access to vast amounts of digital data from multiple platforms, including social media, review platforms, chatbots and influencer marketing campaigns, as well as internal CRM and Enterprise Marketing Systems. This heterogeneous data environment means that multiple types of sentiment model may be needed to truly understand customers, with different models used for understanding emotions, opinions, future intent or what aspects of a product or service are liked or disliked.
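As a concrete illustration of classifying the sentiment of a piece of text, the following is a minimal lexicon-based sketch. The word lists, the one-word negation rule, and the function name `classify_sentiment` are assumptions chosen for illustration; production systems typically rely on trained models and much larger, domain-specific resources.

```python
import re

# Hypothetical toy lexicons; a real lexicon would be far larger.
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}
NEGATORS = {"not", "never", "no"}


def classify_sentiment(text: str) -> str:
    """Label text as positive, negative, or neutral by counting lexicon hits."""
    tokens = re.findall(r"[a-z']+", text.lower())
    score = 0
    for i, token in enumerate(tokens):
        polarity = (token in POSITIVE) - (token in NEGATIVE)
        # Flip polarity when the previous word is a simple negator.
        if polarity and i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for review in [
        "The service was great and the staff were helpful",
        "Not good, the app is slow",
        "I ordered a phone",
    ]:
        print(review, "->", classify_sentiment(review))
```

A single polarity score like this captures only overall opinion. Detecting emotions, future intent, or which aspect of a product is liked would each need a different model, as the passage notes.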


Nate Silver savages media study claiming harsher treatment of Biden compared to Trump: 'Complete crap'

FOX News

In media news today, CNN and Chris Cuomo issue scathing statements against each other, the former anchor announces he's leaving his SiriusXM radio show, and a New York Times op-ed gets mocked for fearing free library is contributing to gentrification. Pollster Nate Silver on Monday savaged the analytics behind a recent Washington Post column claiming President Biden was being treated just as badly, or worse, by the media than former President Trump. In the piece published last week, liberal columnist Dana Milbank complained about Biden's media coverage being overly tough and implored journalists to do "soul-searching" and "think about what it is we're delivering to people." In a series of tweets, Silver argued the piece's "sentiment analysis" measuring the positivity and negativity of particular articles written about Trump and Biden was "complete crap," and gave examples to show how the data could be skewed more positively or negatively than it should have been. "To this good thread explaining why the 'sentiment analysis' cited in the [Dana Milbank] WaPo article this weekend is complete crap--the analysis was used to make the claim that the press is just negative toward Biden as Trump--I'll also add a couple of comments based on their data," Silver wrote.