lingpipe
Text Compression for Sentiment Analysis via Evolutionary Algorithms
Dufourq, Emmanuel; Bassett, Bruce A.
Can textual data be compressed intelligently without losing accuracy in evaluating sentiment? In this study, we propose a novel evolutionary compression algorithm, PARSEC (PARts-of-Speech for sEntiment Compression), which uses Parts-of-Speech tags to compress text in a way that sacrifices minimal classification accuracy when used in conjunction with sentiment analysis algorithms. An analysis of PARSEC with eight commercial and non-commercial sentiment analysis algorithms on twelve English sentiment data sets reveals that accurate compression is possible: with LingPipe, the most accurate of the sentiment algorithms, PARSEC loses 0%, 1.3%, and 3.3% in sentiment classification accuracy for 20%, 50%, and 75% data compression, respectively. Other sentiment analysis algorithms are more severely affected by compression. We conclude that significant compression of text data is possible for sentiment analysis, depending on the accuracy demands of the specific application and on the sentiment analysis algorithm used.
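To make the idea of POS-based compression concrete, here is a minimal sketch that assumes the text has already been part-of-speech tagged. The Penn Treebank-style tags, the example sentence, and the hand-picked removal set are illustrative assumptions, not taken from the paper. PARSEC itself uses an evolutionary algorithm to decide what to drop; the fixed tag set below only stands in for that search. Compression is measured here as the fraction of tokens removed, which is also an assumption.

```python
from typing import List, Set, Tuple

# A pre-tagged sentence: (token, part-of-speech tag) pairs.
# The Penn Treebank-style tags are an illustrative assumption.
TaggedText = List[Tuple[str, str]]


def compress(tagged: TaggedText, drop_tags: Set[str]) -> List[str]:
    """Keep only the tokens whose POS tag is not in drop_tags."""
    return [tok for tok, tag in tagged if tag not in drop_tags]


def compression_ratio(original: TaggedText, kept: List[str]) -> float:
    """Fraction of tokens removed (0.0 means no compression)."""
    if not original:
        return 0.0
    return 1.0 - len(kept) / len(original)


if __name__ == "__main__":
    sentence = [
        ("The", "DT"), ("movie", "NN"), ("was", "VBD"),
        ("surprisingly", "RB"), ("good", "JJ"), ("and", "CC"),
        ("the", "DT"), ("acting", "NN"), ("was", "VBD"), ("great", "JJ"),
    ]
    # Hypothetical removal set: determiners and coordinating conjunctions,
    # which usually carry little sentiment on their own.
    kept = compress(sentence, {"DT", "CC"})
    print(" ".join(kept))
    print(f"compression: {compression_ratio(sentence, kept):.0%}")
```

In a PARSEC-like setup, the choice of which tags to drop would be searched for and scored by downstream sentiment accuracy rather than fixed by hand as it is here.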
Natural Language Processing with Java and LingPipe Cookbook, by Breck Baldwin and Krishna Dayanidhi (ISBN 9781783284672) - Amazon.com
LingPipe is a Natural Language Processing (NLP) library released under a dual license (a commercial license and the open-source AGPL), and it is the basis for an NLP consulting company (Alias-I) founded by one of the authors, Breck Baldwin. In fact, the preface states that some of the recipes in this book come from Breck's private repository. This is the first book devoted exclusively to LingPipe. LingPipe provides comprehensive Javadocs and tutorials on its website, but that material is fairly dense (NLP is hard!); the book is an easier, gentler way to understand the library. Another reason LingPipe's API is so dense (even compared to other Java NLP libraries) is that it is written for performance, making heavy use of encapsulation to wrap common tasks and of the visitor pattern to consume data in streaming mode. The book does a good job of explaining the latter pattern in some depth and of deconstructing the code examples so that the former becomes more obvious.
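The visitor-style streaming the review mentions can be illustrated in a few lines. This is a Python analogue of the general pattern, not LingPipe's Java API: every name here (`LabeledText`, `Handler`, `LineCorpus`, `visit_train`, `LabelCounter`) is hypothetical. The corpus pushes each training item to a handler callback as soon as it is parsed instead of returning a list, so the consumer never needs the whole data set in memory.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable


@dataclass
class LabeledText:
    """One training item: a piece of text and its category label."""
    text: str
    label: str


# A handler is any callback invoked once per item (hypothetical alias).
Handler = Callable[[LabeledText], None]


class LineCorpus:
    """Hypothetical corpus that streams 'label<TAB>text' lines to a handler."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = lines

    def visit_train(self, handler: Handler) -> None:
        # Each line is parsed and pushed to the handler immediately;
        # the corpus never builds a list of all items.
        for line in self._lines:
            label, _, text = line.rstrip("\n").partition("\t")
            handler(LabeledText(text=text, label=label))


class LabelCounter:
    """Example visitor: counts items per label as they stream past."""

    def __init__(self) -> None:
        self.counts: Dict[str, int] = {}

    def __call__(self, item: LabeledText) -> None:
        self.counts[item.label] = self.counts.get(item.label, 0) + 1


if __name__ == "__main__":
    corpus = LineCorpus(["pos\tGreat book", "neg\tToo dense", "pos\tVery clear"])
    counter = LabelCounter()
    corpus.visit_train(counter)
    print(counter.counts)  # {'pos': 2, 'neg': 1}
```

Because `LineCorpus` accepts any iterable, an open file object can be passed in directly and its lines are read lazily. Swapping in a different visitor (for example, one that updates a classifier) requires no change to the corpus.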
Machine Learning With Kafka Streams - DZone Big Data
The last two posts on Kafka Streams (Kafka Processor API, KStreams DSL) introduced Kafka Streams and described how to get started using the API. This post demonstrates a use case that, prior to the development of Kafka Streams, would have required a separate cluster running another framework. We are going to take a live stream of data from Twitter and perform language analysis to identify tweets in English, French, and Spanish. The library we are going to use for this is LingPipe, a toolkit for processing text using computational linguistics.
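Language identification of this kind is typically done with character n-gram language models, one per language. As a rough, self-contained illustration of that idea (not LingPipe's implementation and not the post's code), the sketch below trains one character-trigram model per language with add-one smoothing and labels a text with the language whose model assigns it the highest log-probability. The class name `CharNGramModel`, the function `identify`, and the tiny training strings are hypothetical placeholders; a real system needs far more text per language.

```python
import math
from collections import Counter
from typing import Dict


class CharNGramModel:
    """Character n-gram model with add-one (Laplace) smoothing."""

    def __init__(self, n: int = 3) -> None:
        self.n = n
        self.ngrams: Counter = Counter()
        self.contexts: Counter = Counter()
        self.vocab: set = set()

    def _pad(self, text: str) -> str:
        # Pad so the first characters also get a full-length context.
        return " " * (self.n - 1) + text.lower() + " "

    def train(self, text: str) -> None:
        padded = self._pad(text)
        self.vocab.update(padded)
        for i in range(len(padded) - self.n + 1):
            gram = padded[i : i + self.n]
            self.ngrams[gram] += 1
            self.contexts[gram[:-1]] += 1

    def log_prob(self, text: str) -> float:
        padded = self._pad(text)
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        total = 0.0
        for i in range(len(padded) - self.n + 1):
            gram = padded[i : i + self.n]
            num = self.ngrams[gram] + 1
            den = self.contexts[gram[:-1]] + v
            total += math.log(num / den)
        return total


def identify(models: Dict[str, CharNGramModel], text: str) -> str:
    """Return the language whose model gives the text the highest log-prob."""
    return max(models, key=lambda lang: models[lang].log_prob(text))


if __name__ == "__main__":
    samples = {
        "en": "the weather is nice today and we are going to the park with friends",
        "fr": "il fait beau aujourd'hui et nous allons au parc avec des amis",
        "es": "hace buen tiempo hoy y vamos al parque con los amigos",
    }
    models = {}
    for lang, text in samples.items():
        model = CharNGramModel(n=3)
        model.train(text)
        models[lang] = model
    print(identify(models, "we are going to the beach"))
```

In a streaming pipeline, a classifier like this would be trained once up front and then applied to each incoming tweet independently, which is what makes it easy to run inside a per-record processing step.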