Information Extraction
autoNLP: NLP Feature Recommendations for Text Analytics Applications
When designing machine learning based text analytics applications, NLP data scientists often determine manually which NLP features to use, based on their knowledge of and experience with related problems. This increases the effort spent on feature engineering and makes automated reuse of features across semantically related applications inherently difficult. In this paper, we argue for standardization in feature specification by outlining the structure of a language for specifying NLP features, and we present an approach for reusing them across applications to increase the likelihood of identifying optimal features.
CERN physics lab drops Facebook over data concerns
GENEVA – Europe's physics lab CERN on Wednesday said it had stopped using a Facebook team-chat application because of concerns about handing over data to the U.S. tech giant. CERN said it wound up its Facebook Workplace account on Jan. 31 after the U.S. firm gave it the choice of either paying to use the service or sharing data. "Losing control of our data was unacceptable," CERN said in a blog on Jan. 28, confirmed to AFP by spokeswoman Anais Rassat on Wednesday. CERN said it started using Workplace when it was offered the service for free in 2016. It said some 1,000 members of the CERN community had created accounts and there were around 150 active users each week.
Python NLP Tutorial: Information Extraction and Knowledge Graphs
In a previous article, we discussed Natural Language Processing and the various tools available for quickly getting our hands dirty in this field. This post is about trying spaCy, one of the most wonderful tools we have for NLP tasks in Python. Today's objective is to get acquainted with spaCy and NLP. We will write some code to build a small knowledge graph that contains structured information extracted from unstructured text. The entire code for the project can be found at the end of this article.
Search technologies drive text analytics : Solr vs. Elasticsearch
As enterprises produce ever larger quantities of data, there is a growing need for better enterprise search solutions. Lucene, Solr and Elasticsearch have become available over the last 10 years, and these solutions help with the challenges of finding content in more ways than you might realize. Whether your company needs a solution for sentiment analysis, text analytics or advanced faceted search, Solr and Elasticsearch can meet multiple requirements. Enterprise Content: To understand how important text mining/analytics and search technologies are for current enterprise-level businesses, you only need to look at the volume of data created across the multitude of content creation platforms. Most businesses employ many different internal and external software solutions for everything from accounting to social media marketing, as well as industry-specific tools such as AutoCAD for digital drawings and engineering.
Combating the coronavirus with Twitter, data mining, and machine learning
The coronavirus illness (nCoV) is now an international public health emergency, bigger than the SARS outbreak of 2003. Unlike SARS, this time around scientists have better genome sequencing, machine learning, and predictive analysis tools to understand and monitor the outbreak. During the SARS outbreak, it took five months for scientists to sequence the virus's genome. However, the first 2019-nCoV case was reported in December, and scientists had the genome sequenced by January 10, only a month later. Researchers have been using mapping tools to track the spread of disease for several years.
Adversarial Training for Aspect-Based Sentiment Analysis with BERT
Karimi, Akbar, Rossi, Leonardo, Prati, Andrea, Full, Katharina
Aspect-Based Sentiment Analysis (ABSA) deals with the extraction of sentiments and their targets. Collecting labeled data for this task in order to help neural networks generalize better can be laborious and time-consuming. As an alternative, similar data to the real-world examples can be produced artificially through an adversarial process which is carried out in the embedding space. Although these examples are not real sentences, they have been shown to act as a regularization method which can make neural networks more robust. In this work, we apply adversarial training, which was put forward by Goodfellow et al. (2014), to the post-trained BERT (BERT-PT) language model proposed by Xu et al. (2019) on the two major tasks of Aspect Extraction and Aspect Sentiment Classification in sentiment analysis. After improving the results of post-trained BERT by an ablation study, we propose a novel architecture called BERT Adversarial Training (BAT) to utilize adversarial training in ABSA. The proposed model outperforms post-trained BERT in both tasks. To the best of our knowledge, this is the first study on the application of adversarial training in ABSA.
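The idea of producing adversarial examples in the embedding space can be illustrated with a minimal sketch. It is not the BAT implementation: it assumes a toy linear classifier in place of BERT, a perturbation of fixed L2 norm along the loss gradient, and hypothetical names and values (`adversarial_embedding`, `epsilon`, the example vectors) chosen only for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def loss_and_grad(embedding, weights, label):
    """Binary cross-entropy of a toy linear classifier, plus its
    gradient with respect to the input embedding."""
    z = sum(w * e for w, e in zip(weights, embedding))
    p = sigmoid(z)
    loss = -(label * math.log(p) + (1 - label) * math.log(1 - p))
    # dL/dz = p - label and dz/d(embedding) = weights
    grad = [(p - label) * w for w in weights]
    return loss, grad


def adversarial_embedding(embedding, weights, label, epsilon=0.5):
    """Shift the embedding by a step of L2 norm epsilon in the
    direction that increases the loss (assumed perturbation scheme)."""
    _, grad = loss_and_grad(embedding, weights, label)
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(embedding)
    return [e + epsilon * g / norm for e, g in zip(embedding, grad)]


# Hypothetical values, for illustration only.
embedding = [0.2, -0.1, 0.4]
weights = [1.0, -2.0, 0.5]
label = 1

clean_loss, _ = loss_and_grad(embedding, weights, label)
adv = adversarial_embedding(embedding, weights, label)
adv_loss, _ = loss_and_grad(adv, weights, label)

# Adversarial training would minimize a combination of both terms.
combined_loss = clean_loss + adv_loss
```

The perturbed vector does not correspond to any real sentence, which matches the abstract's point that such examples act as a regularizer and are not extra labeled data.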
Twitter data could have been a source of Kremlin intelligence during the 2014 Ukraine conflict
Kremlin analysts could have used Twitter as a source of military intelligence to inform their actions in the 2014 Russia–Ukraine conflict, a study has found. University of California experts showed that location-tagged tweets by Ukraine residents could have been used to map out sentiments towards Russia in real time. The map they made of pro-Kremlin regions turned out to bear a striking resemblance to the actual areas to which Russia dispatched its special forces. Specifically, this included Crimea and regions in the far east of Ukraine, where the incoming forces would have been most likely to be seen as liberators. In contrast, the data could also reveal those areas where dispatching forces would have led to greater resistance and corresponding casualties and costs.
How AI Can Boost Your Social Media Marketing
Social media marketing is part of nearly every business's marketing strategy; here is an analysis of how AI helps to boost social media marketing campaigns. The growth of data science and the implementation of artificial intelligence in marketing campaigns are helping social media marketers save a great deal of time and effort. Social media marketing is an unavoidable part of every business marketing strategy because this has become the social media era. According to Social Media Today, roughly 45% of the world's population uses social media, spending an average of 2 hours and 23 minutes per day on it. Social media is not only used to promote products and services or share news; it is also an effective and powerful tool for reaching new clients, increasing brand awareness and providing customer support.
Unsupervised Sentiment Analysis for Code-mixed Data
Yadav, Siddharth, Chakraborty, Tanmoy
Code-mixing is the practice of alternating between two or more languages. Mostly observed in multilingual societies, its occurrence is increasing, and so is its importance. A major part of sentiment analysis research has been monolingual, and most of these approaches perform poorly on code-mixed text. In this work, we introduce methods that use different kinds of multilingual and cross-lingual embeddings to efficiently transfer knowledge from monolingual text to code-mixed text for sentiment analysis. Our methods can handle code-mixed text through zero-shot learning. Our methods beat the state of the art on English-Spanish code-mixed sentiment analysis by an absolute 3% F1-score. We are able to achieve 0.58 F1-score (without parallel corpus) and 0.62 F1-score (with parallel corpus) on the same benchmark in a zero-shot way, as compared to 0.68 F1-score in supervised settings. Our code is publicly available.
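The zero-shot transfer described above can be sketched roughly as follows. This is not the authors' method: it assumes a shared English-Spanish embedding space already exists, substitutes a simple nearest-centroid classifier for the actual models, and uses made-up two-dimensional vectors and hypothetical helper names (`sentence_vector`, `train_centroids`, `predict`) purely for illustration.

```python
# Made-up 2-d vectors standing in for a shared English-Spanish
# embedding space; real cross-lingual embeddings are much larger.
SHARED_EMBEDDINGS = {
    "good": (0.9, 0.1), "bueno": (0.85, 0.15),
    "love": (0.8, 0.2), "amo": (0.82, 0.18),
    "bad": (-0.9, 0.1), "malo": (-0.88, 0.12),
    "hate": (-0.8, 0.2), "odio": (-0.79, 0.21),
    "movie": (0.0, 0.9), "pelicula": (0.02, 0.88),
}


def sentence_vector(tokens):
    """Average the vectors of known tokens; unknown tokens are skipped."""
    vecs = [SHARED_EMBEDDINGS[t] for t in tokens if t in SHARED_EMBEDDINGS]
    if not vecs:
        return (0.0, 0.0)
    n = len(vecs)
    return tuple(sum(v[i] for v in vecs) / n for i in range(2))


def train_centroids(examples):
    """Compute one mean sentence vector per label."""
    sums, counts = {}, {}
    for tokens, label in examples:
        vec = sentence_vector(tokens)
        acc = sums.setdefault(label, [0.0, 0.0])
        acc[0] += vec[0]
        acc[1] += vec[1]
        counts[label] = counts.get(label, 0) + 1
    return {
        label: (acc[0] / counts[label], acc[1] / counts[label])
        for label, acc in sums.items()
    }


def predict(tokens, centroids):
    """Return the label whose centroid is closest to the sentence vector."""
    vec = sentence_vector(tokens)

    def sq_dist(c):
        return (vec[0] - c[0]) ** 2 + (vec[1] - c[1]) ** 2

    return min(centroids, key=lambda label: sq_dist(centroids[label]))


# Training uses monolingual (English) labeled text only.
english_train = [
    (["good", "movie"], "positive"),
    (["love", "movie"], "positive"),
    (["bad", "movie"], "negative"),
    (["hate", "movie"], "negative"),
]
centroids = train_centroids(english_train)

# A code-mixed sentence is classified without any code-mixed training data.
code_mixed = ["the", "pelicula", "was", "bueno"]
prediction = predict(code_mixed, centroids)
```

Because Spanish and English words with similar meaning sit close together in the shared space, the classifier trained on English alone can still place the code-mixed sentence near the right centroid.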
AutoMATES: Automated Model Assembly from Text, Equations, and Software
Pyarelal, Adarsh, Valenzuela-Escarcega, Marco A., Sharp, Rebecca, Hein, Paul D., Stephens, Jon, Bhandari, Pratik, Lim, HeuiChan, Debray, Saumya, Morrison, Clayton T.
There exist today state-of-the-art computational models that can provide highly accurate predictions about complex phenomena such as crop growth and weather patterns. However, certain phenomena, such as food insecurity, involve a host of factors that cannot be modeled by any single one of these models, but which instead require the integration of multiple models. To truly integrate these computational models, it is necessary to 'lift' them to a common representation that is (i) agnostic to the software implementation, (ii) semantically rich enough to represent the implicit domain knowledge in the models, and (iii) connected to the domain literature. The AutoMATES project aims to build technology to construct and curate semantically rich representations of scientific models by integrating three different sources of information: natural language descriptions of models in publications and other technical documentation, the equations contained in these documents, and the software that implements these models. An example of a model being represented in these three forms (text, equations, and software) is shown in Figure 1. This model is a differential equation describing the biophysical variable, leaf area index (LAI). The network on the right half of the figure is an aspirational representation of the model as a Bayesian network. Although this example is handcrafted, our end goal is to be able to automatically assemble models with this level of semantic richness.