AITopics

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

@machinelearnbot Dec-12-2016, 20:05:11 GMT

Walmart Competition: Trip Type Classification

The authors took the NYC Data Science Academy 12-week full-time data science bootcamp program from Sep. 23 to Dec. 18, 2015. The post was based on their fourth in-class project (due after the 8th week of the program). Walmart uses trip type classification to segment its shoppers and their store visits in order to improve the shopping experience. Walmart's trip types are created from a combination of existing customer insights and purchase history data. The purpose of the Kaggle competition is to use only the purchase data provided to derive Walmart's classification labels.

department description, machine learning, natural language, (16 more...)

Industry: Retail (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.62)
Information Technology > Artificial Intelligence > Machine Learning > Ensemble Learning (0.38)

@machinelearnbot Nov-2-2016, 08:20:12 GMT

[Project] Document Classification • /r/MachineLearning

I am currently trying to work out a way to accurately classify documents into 3 different categories. The documents are rather lengthy, usually several thousand words, unstructured and made up almost entirely of full sentences. There are some keywords that increase the probability of a document belonging to one particular category, but not all of them are known. Until now I have tried to clean the documents by getting rid of punctuation, common stop words and non-alphabetical strings. Since only a small part of the text is relevant, I was planning to try a tf-idf process to identify significant words within the documents.
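The cleaning and tf-idf steps described in this post could look roughly like the sketch below. It is only an illustration, not the poster's code: the stop-word list, the function names and the three sample documents are all made up for the example, and the weighting used (relative term frequency times log of inverse document frequency) is one common variant among several.

```python
import math
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"the", "a", "an", "of", "and", "is", "in", "to", "was"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens only, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]

def tf_idf(docs):
    """Return one {term: score} dict per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    scores = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens) or 1  # avoid division by zero for empty documents
        scores.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores

def top_terms(score, k=3):
    """Return the k highest-scoring terms of one document."""
    return sorted(score, key=score.get, reverse=True)[:k]

# Hypothetical sample documents.
docs = [
    "The contract was signed and the contract is binding.",
    "The invoice was paid in full.",
    "The contract and the invoice were filed.",
]
scores = tf_idf(docs)
for i, score in enumerate(scores):
    print(i, top_terms(score))
```

Terms that occur in many documents are pushed down by the logarithm, so words confined to a few documents rise to the top, which is the property the post hopes to exploit to find the relevant keywords.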

artificial intelligence, machinelearning, natural language, (2 more...)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

@machinelearnbot Oct-26-2016, 17:21:40 GMT

Text Classification & Sentiment Analysis tutorial / blog

Natural Language Processing (NLP) is a vast area of Computer Science that is concerned with the interaction between Computers and Human Language[1]. Within NLP many tasks are – or can be reformulated as – classification tasks. In classification tasks we are trying to produce a classification function which can give the correlation between a certain 'feature' and a class. This classifier first has to be trained with a training dataset, and then it can be used to actually classify documents. Training means that we have to determine its model parameters.
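To make "training means determining the model parameters" concrete, here is a minimal sketch using a multinomial Naive Bayes classifier. Naive Bayes is chosen here only as a simple illustration, the excerpt above does not say which classifier the tutorial uses, and the class name and the tiny training set are hypothetical. The parameters learned in `fit` are just counts: how often each class occurs and how often each word occurs within each class.

```python
import math
from collections import Counter, defaultdict

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        # Model parameters: class frequencies and per-class word frequencies.
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, tokens):
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            total = sum(self.word_counts[label].values())
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(count / n_docs)
            for w in tokens:
                if w in self.vocab:  # words never seen in training are ignored
                    score += math.log(
                        (self.word_counts[label][w] + 1) / (total + len(self.vocab))
                    )
            if score > best_score:
                best_label, best_score = label, score
        return best_label

# Hypothetical toy training set.
train_docs = [
    "great fun film".split(),
    "loved this great story".split(),
    "boring dull film".split(),
    "awful boring plot".split(),
]
train_labels = ["pos", "pos", "neg", "neg"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("great story".split()))
print(model.predict("boring plot".split()))
```

Working in log space avoids numeric underflow on long documents, and the add-one smoothing keeps a word that is unseen in one class from zeroing out that class entirely.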

machine learning, natural language, text classification, (14 more...)

Country:

North America > United States (0.14)
Europe > Netherlands > South Holland > The Hague (0.05)

Genre: Instructional Material > Course Syllabus & Notes (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.68)
Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.68)
(2 more...)

#artificialintelligence Oct-24-2016, 15:35:48 GMT

Text Analysis 101; A Basic Understanding for Business Users: Document Classification - AYLIEN

The automatic classification of documents is an example of how Machine Learning (ML) and Natural Language Processing (NLP) can be leveraged to enable machines to better understand human language. By classifying text, we are aiming to assign one or more classes or categories to a document or piece of text, making it easier to manage and sort the documents. Manually categorizing and grouping text sources can be extremely laborious and time-consuming, especially for publishers, news sites, blogs or anyone who deals with a lot of content. Broadly speaking, there are two classes of ML techniques: supervised and unsupervised. In supervised methods, a model is created based on previous observations i.e. a training set.

category, machine learning, natural language, (16 more...)

Country: North America > United States > New York (0.05)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

@machinelearnbot Oct-23-2016, 15:00:15 GMT

Classifications in R: Response Modeling/Credit Scoring/Credit Rating using Machine Learning Techniques

This article was written by Ariful Mondal. Ariful is a senior manager, data science and big data analytics consultant at Tata Consultancy Services. This is an attempt to showcase some worked-out examples of Machine Learning (ML) using German Credit Data. Though we have selected the credit scoring problem as a case study in this article, the same process will be applicable to a wide range of classification or regression problems, such as "Response modeling", "Risk Management", "Attrition/Churn management", "Cross-Sell/Up-Sell", "usage Patterns", "Net Present Value", "Life Time Value", "Predictive Maintenance and condition based monitoring", "Warranty", "Reliability", "Failure Prediction", "Image/Video Processing", "Crime", "Medical Experiments", "Hidden pattern recognition". The basic difference between traditional modeling and machine learning is that "in traditional modeling we intend to set up a modeling framework and try to establish relationships while in machine learning we allow the model to learn from the data by understanding the hidden patterns".

data mining, natural language, text classification, (4 more...)

Industry: Banking & Finance > Credit (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Big Data (0.64)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.40)

Moreo Fernández, Alejandro, Esuli, Andrea, Sebastiani, Fabrizio

Lightweight Random Indexing for Polylingual Text Classification

Journal of Artificial Intelligence Research, Oct-13-2016

Multilingual Text Classification (MLTC) is a text classification task in which documents are written each in one among a set L of natural languages, and in which all documents must be classified under the same classification scheme, irrespective of language. There are two main variants of MLTC, namely Cross-Lingual Text Classification (CLTC) and Polylingual Text Classification (PLTC). In PLTC, which is the focus of this paper, we assume (differently from CLTC) that for each language in L there is a representative set of training documents; PLTC consists of improving the accuracy of each of the |L| monolingual classifiers by also leveraging the training documents written in the other (|L| − 1) languages. The obvious solution, consisting of generating a single polylingual classifier from the juxtaposed monolingual vector spaces, is usually infeasible, since the dimensionality of the resulting vector space is roughly |L| times that of a monolingual one, and is thus often unmanageable. As a response, the use of machine translation tools or multilingual dictionaries has been proposed. However, these resources are not always available, or are not always free to use. One machine-translation-free and dictionary-free method that, to the best of our knowledge, has never been applied to PLTC before, is Random Indexing (RI). We analyse RI in terms of space and time efficiency, and propose a particular configuration of it (that we dub Lightweight Random Indexing, LRI). By running experiments on two well-known public benchmarks, Reuters RCV1/RCV2 (a comparable corpus) and JRC-Acquis (a parallel one), we show LRI to outperform (both in terms of effectiveness and efficiency) a number of previously proposed machine-translation-free and dictionary-free PLTC methods that we use as baselines.
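The general idea behind Random Indexing can be sketched as follows. This is a generic illustration under stated assumptions, not the LRI configuration proposed in the paper: the dimensionality, the number of non-zero entries, the seed, the class name and the two sample documents are all hypothetical. Each term, whatever its language, is assigned a sparse random index vector with a few +1/-1 entries, and a document is represented as the frequency-weighted sum of the index vectors of its terms, so the size of the space stays fixed instead of growing roughly |L| times as in the juxtaposed monolingual spaces.

```python
import random
from collections import Counter

def make_index_vector(dim, nonzeros, rng):
    """Sparse random index vector stored as {position: +1 or -1}."""
    positions = rng.sample(range(dim), nonzeros)
    return {p: rng.choice((-1, 1)) for p in positions}

class RandomIndexer:
    """Projects bag-of-words documents into a fixed, reduced vector space."""

    def __init__(self, dim=16, nonzeros=2, seed=0):
        self.dim = dim            # size of the reduced space (hypothetical value)
        self.nonzeros = nonzeros  # non-zero entries per index vector (hypothetical)
        self.rng = random.Random(seed)
        self.index_vectors = {}   # term -> sparse index vector, created on first use

    def project(self, tokens):
        vec = [0.0] * self.dim
        for term, count in Counter(tokens).items():
            if term not in self.index_vectors:
                self.index_vectors[term] = make_index_vector(
                    self.dim, self.nonzeros, self.rng
                )
            # Add the term's index vector, weighted by its frequency.
            for pos, sign in self.index_vectors[term].items():
                vec[pos] += sign * count
        return vec

indexer = RandomIndexer(dim=16, nonzeros=2, seed=0)
# Documents in two languages share the same reduced space.
english_doc = indexer.project("stock market prices rise".split())
italian_doc = indexer.project("prezzi del mercato azionario".split())
print(len(english_doc), len(italian_doc), len(indexer.index_vectors))
```

Because index vectors are generated lazily and cached per term, no vocabulary needs to be known in advance and no translation resource is involved; any standard classifier can then be trained on the projected vectors.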

lightweight random indexing, proceedings, representation, (12 more...)

doi: 10.1613/jair.5194

AI Access Foundation

Country:

Europe > Sweden > Stockholm > Stockholm (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)
Asia > Middle East > Jordan (0.04)
(18 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Supervised Learning > Representation Of Examples (0.55)

#artificialintelligence Oct-8-2016, 14:56:27 GMT

Most popular kaggle competition solutions

Large Scale Hierarchical Text Classification is a document classification challenge to classify a given Wikipedia document into one of 325,056 categories. This very large dataset was created from Wikipedia. The dataset is multi-class, multi-label and hierarchical. It contains around 325,000 categories and 2,400,000 documents. This challenge builds upon a series of successful challenges on large-scale hierarchical text classification. More information on this dataset is available from Demokritos at http://lshtc.iit.demokritos.gr/

artificial intelligence, natural language, text classification, (13 more...)

Genre: Contests & Prizes (0.54)

Technology: Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.78)

#artificialintelligence Jul-19-2016, 01:55:20 GMT

Text Classification in Microsoft's Azure Machine Learning Studio - CrowdFlower

There are lots of great tools out there for building machine learning models and data processing pipelines. Most of these tools, like R, scikit-learn, spark.ml At CrowdFlower, we use many of these resources to varying degrees. However, we also recognize that many people will prefer to approach model building and deployment in a hands-on integrated environment supported by a graphical interface. To this end, we are pleased to showcase an end-to-end model construction process in Microsoft's Azure Machine Learning Studio.

machine learning, natural language, text classification, (13 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.51)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis (0.33)

#artificialintelligence Jul-15-2016, 15:30:47 GMT

[1607.03822v1] Feature Extraction and Automated Classification of Heartbeats by Machine Learning

data mining, machine learning, natural language, (6 more...)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Data Science > Data Mining > Feature Extraction (0.63)
Information Technology > Artificial Intelligence > Natural Language > Text Classification (0.63)