The goal of this study is to build a sentiment analysis model for a data set of 2,000 IMDB movie comments and a Twitter data set of 3,200 entries, using machine learning and vector space techniques, in order to provide preliminary information on whether a text is positive or negative. A vector space was created in the KNIME Analytics Platform, and classification was performed on this vector space with the Decision Tree, Naive Bayes, and Support Vector Machine (SVM) algorithms. The results were then compared across the algorithms. For the IMDB movie comments, the classification results are 94.00%, 73.20%, and 85.50% for Decision Tree, Naive Bayes, and SVM, respectively. For the Twitter data set, the results are 82.76%, 75.44%, and 72.50% for Decision Tree, Naive Bayes, and SVM, respectively. It is seen that the best classification results on both data sets are those calculated by the SVM algorithm.
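As a rough illustration of what "creating a vector space and classifying on it" can look like, the following minimal sketch builds term-count vectors and trains a multinomial Naive Bayes classifier with Laplace smoothing in plain Python. It is not the KNIME workflow used in the study: the function names, the tokenizer, and the four training sentences are all made up for illustration.

```python
import math

def tokenize(text):
    """Lowercase, split on whitespace, and strip punctuation from token edges."""
    tokens = (t.strip(".,!?;:\"'()") for t in text.lower().split())
    return [t for t in tokens if t]

def build_vocabulary(documents):
    """Map every distinct token to one dimension (column index) of the vector space."""
    vocab = sorted({tok for doc in documents for tok in tokenize(doc)})
    return {tok: i for i, tok in enumerate(vocab)}

def vectorize(text, vocab):
    """Return a term-count vector; tokens outside the vocabulary are ignored."""
    vec = [0] * len(vocab)
    for tok in tokenize(text):
        if tok in vocab:
            vec[vocab[tok]] += 1
    return vec

def train_naive_bayes(vectors, labels):
    """Estimate a log prior and Laplace-smoothed log likelihoods for each class."""
    model = {}
    n_terms = len(vectors[0])
    for label in set(labels):
        rows = [v for v, y in zip(vectors, labels) if y == label]
        term_totals = [sum(col) for col in zip(*rows)]
        denom = sum(term_totals) + n_terms  # add-one smoothing
        model[label] = (
            math.log(len(rows) / len(vectors)),
            [math.log((count + 1) / denom) for count in term_totals],
        )
    return model

def predict(model, vector):
    """Pick the class with the highest posterior log score."""
    def score(label):
        log_prior, log_likelihood = model[label]
        return log_prior + sum(c * ll for c, ll in zip(vector, log_likelihood))
    return max(model, key=score)

# Made-up toy data; the study's IMDB and Twitter data sets are not reproduced here.
train_texts = [
    "a wonderful and moving film",
    "great acting, great story",
    "boring plot and terrible acting",
    "a dull, terrible waste of time",
]
train_labels = ["positive", "positive", "negative", "negative"]

vocab = build_vocabulary(train_texts)
model = train_naive_bayes([vectorize(t, vocab) for t in train_texts], train_labels)
prediction_pos = predict(model, vectorize("a great and wonderful story", vocab))
prediction_neg = predict(model, vectorize("terrible and boring", vocab))
```

With this toy data the two unseen sentences are labeled positive and negative, respectively. A Decision Tree or SVM could be trained on the same vectors; only the classifier step would change.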
Most of the time, the raw data that we need for our data science project is not organized in a neat, well-structured, and insightful table. Rather, it is sometimes stored as text in a scanned document. The words in the document must then be extracted one by one to form a text-formatted data cell. This is the task performed by Optical Character Recognition (OCR). As you read the words of this article, be they text or numbers, your eyes process them by recognizing the light and dark patterns that make up characters (letters, numbers, punctuation marks, and so on).
Many businesses are currently expanding their adoption of data science techniques to include machine learning. Marketing analytics is one area where this is happening. Anything can be reduced to numbers, including customer behavior and color perception, and therefore anything can be analyzed, modeled, and predicted. Marketing analytics already involves a wide range of data collection and transformation techniques. Social media and web-driven marketing have given a big push to the digitalization of the field; counting the number of visits, the number of likes, the minutes of viewing, the number of returning customers, and so on is common practice.
Recently, deep learning has become very popular in the field of data science or, more specifically, in the field of artificial intelligence (AI). Deep learning covers a subset of machine learning algorithms, mostly stemming from neural networks. Much has already been written on the subject of neural networks and their training algorithms. Briefly, a neural network is an architecture of interconnected artificial neurons, each neuron performing a basic computation via its activation function. An architecture of interconnected neurons can thus implement a more complex transformation of the input data.
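To make the idea of "simple neurons, complex transformation" concrete, here is a minimal sketch of a forward pass through a tiny 2-2-1 network in plain Python. The sigmoid activation, the function names, and the hand-picked weights are illustrative assumptions, not taken from any particular library or from a trained model. The weights are chosen so that the network approximates XOR, a function that no single neuron of this kind can compute on its own.

```python
import math

def sigmoid(x):
    """Logistic activation function, squashing any real input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron(inputs, weights, bias, activation=sigmoid):
    """One artificial neuron: weighted sum of inputs plus bias, then activation."""
    return activation(sum(w * x for w, x in zip(weights, inputs)) + bias)

def layer(inputs, weight_rows, biases, activation=sigmoid):
    """A layer is a list of neurons that all read the same inputs."""
    return [neuron(inputs, w, b, activation) for w, b in zip(weight_rows, biases)]

def forward(inputs, layers):
    """Feed the signal through each layer in turn."""
    signal = inputs
    for weight_rows, biases in layers:
        signal = layer(signal, weight_rows, biases)
    return signal

# Hand-picked (not trained) weights, chosen only for illustration.
xor_network = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer: OR-like and NAND-like neurons
    ([[20.0, 20.0]], [-30.0]),                        # output layer: AND-like neuron
]

outputs = {
    (a, b): round(forward([a, b], xor_network)[0])
    for a in (0, 1)
    for b in (0, 1)
}
```

Each neuron here only computes a weighted sum followed by a sigmoid, yet the three of them together reproduce the XOR truth table after rounding. In practice the weights would be learned by a training algorithm rather than set by hand.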
Topic modeling algorithms traditionally model topics as lists of weighted terms. These topic models can be used effectively to classify texts or to support text mining tasks such as text summarization or fact extraction. The general procedure relies on statistical analysis of term frequencies. The focus of this work is on the implementation of knowledge-based topic modeling services in a KNIME workflow. Based on our previous work, we briefly describe and evaluate the DBpedia-based enrichment approach and outline a comparative evaluation of the enriched topic models. DBpedia Spotlight is used to identify entities in the input text, and information from DBpedia is used to extend these entities. We provide a workflow developed in KNIME that implements this approach, and we compare the results of topic modeling supported by knowledge base information with those of traditional LDA. This topic modeling approach allows semantic interpretation both by algorithms and by humans.
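The notion of a topic as a list of weighted terms, and its use for classifying text by term frequencies, can be sketched in a few lines of plain Python. This is neither LDA nor the DBpedia-based enrichment described above: the two topics, their terms and weights, and the function names are hypothetical, and a real topic model would learn the weights from a corpus.

```python
from collections import Counter

# Hypothetical topics, each a list of (term, weight) pairs.
topics = {
    "sports": [("match", 0.5), ("team", 0.3), ("goal", 0.2)],
    "finance": [("market", 0.5), ("stock", 0.3), ("bank", 0.2)],
}

def term_frequencies(text):
    """Relative frequency of each whitespace-separated token in the text."""
    tokens = text.lower().split()
    if not tokens:
        return {}
    counts = Counter(tokens)
    return {term: n / len(tokens) for term, n in counts.items()}

def topic_scores(text, topics):
    """Score each topic as the weighted sum of its terms' frequencies in the text."""
    tf = term_frequencies(text)
    return {
        name: sum(weight * tf.get(term, 0.0) for term, weight in terms)
        for name, terms in topics.items()
    }

def classify(text, topics):
    """Assign the text to the highest-scoring topic."""
    scores = topic_scores(text, topics)
    return max(scores, key=scores.get)

sample = "the team scored a late goal to win the match"
scores = topic_scores(sample, topics)
label = classify(sample, topics)
```

For the sample sentence, only terms from the hypothetical "sports" topic occur, so that topic receives the highest score. A knowledge-based approach would additionally link terms to entities before scoring, which this sketch does not attempt.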