word2vec


A Complete Guide on Feature Extraction Techniques - Analytics Vidhya

#artificialintelligence

This article was published as part of the Data Science Blogathon. In Natural Language Processing, feature extraction is one of the most important steps for understanding the context of the text we are dealing with. After the initial text is cleaned, it must be transformed into features that can be used for modeling. Raw document data is not directly computable, so it has to be converted into numerical data such as a vector space model. This transformation is generally called feature extraction of document data.
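The vector space model the summary mentions can be sketched with a minimal bag-of-words vectorizer (a toy illustration, not the article's own code): each document becomes a vector of term counts over a shared vocabulary.

```python
from collections import Counter

def build_vocab(docs):
    """Collect the sorted set of all tokens across documents, mapped to indices."""
    vocab = sorted({tok for doc in docs for tok in doc.lower().split()})
    return {tok: i for i, tok in enumerate(vocab)}

def bow_vector(doc, vocab):
    """Map one document to a term-frequency vector over the vocabulary."""
    counts = Counter(doc.lower().split())
    return [counts.get(tok, 0) for tok in vocab]

docs = ["the cat sat", "the dog sat on the mat"]
vocab = build_vocab(docs)
vectors = [bow_vector(d, vocab) for d in docs]
# vocabulary (sorted): cat, dog, mat, on, sat, the
# "the cat sat" -> [1, 0, 0, 0, 1, 1]
```

Libraries such as scikit-learn provide production versions of this idea (e.g. `CountVectorizer`), but the principle is the same: text becomes numeric vectors a model can consume.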


MLOps: How to Operationalise E-Commerce Product Recommendation System

#artificialintelligence

One of the most common challenges in an e-commerce business is building a well-performing product recommendation and categorisation model. A product recommender suggests similar products to users, increasing the total time and money each user spends on the platform. A categorisation model is also needed, since some products may be wrongly categorised, especially on platforms where most content is user-generated, as on classified websites. A product categorisation model catches those products and places them back into their correct categories, improving the overall user experience. This article has two main parts.
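At its core, the "recommend similar products" step often reduces to nearest-neighbour search over product feature vectors. Here is a minimal sketch under that assumption; the product names and vectors are hypothetical, not from the article.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def recommend(target_vec, catalog, k=2):
    """Rank catalog items by similarity to the target product's feature vector."""
    ranked = sorted(catalog, key=lambda pid: cosine(target_vec, catalog[pid]), reverse=True)
    return ranked[:k]

# Hypothetical feature vectors (e.g. learned embeddings) for three products.
catalog = {
    "phone_case": [0.9, 0.1, 0.0],
    "charger":    [0.8, 0.3, 0.1],
    "sneaker":    [0.0, 0.1, 0.9],
}
similar = recommend([1.0, 0.2, 0.0], catalog, k=2)
```

A real system would use learned embeddings and an approximate nearest-neighbour index rather than a brute-force sort, but the ranking logic is the same.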


[Learn with TensorFlow and Keras] An Introduction to Time-Series Data Processing (RNN/LSTM, Word2Vec)

#artificialintelligence

Learn natural language processing and time-series data processing using TensorFlow, Keras, and Python 3. This is the only course (as of August 2017) taught in Japanese with video lectures. RNNs/LSTMs are used for machine translation, automatic captioning, stock price prediction, and more.


Meet AI's Multitool: Vector Embeddings - Liwaiwai

#artificialintelligence

Embeddings are one of the most versatile techniques in machine learning, and a critical tool every ML engineer should have in their toolbelt. It’s a shame, then, that so few of us understand what they are and what they’re good for! The problem, perhaps, is that embeddings sound slightly abstract and esoteric: In machine learning, an embedding ...


Word2vec vs BERT

#artificialintelligence

Both word2vec and BERT are popular recent methods in NLP for generating vector representations of words, essentially replacing word-index dictionaries and one-hot encoded vectors as text representations. Neither word indices nor one-hot encodings capture the semantic sense of language, and one-hot encoding becomes computationally infeasible when the vocabulary is large. Word2vec [1] is a neural network approach that learns distributed word vectors such that words used in similar syntactic or semantic contexts lie close to each other in the distributed vector space.
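Word2vec learns that "similar context" signal from (center, context) pairs taken from a sliding window over the text. A small sketch of how the skip-gram variant generates its training pairs (illustrative only, not the library's implementation):

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs as in skip-gram word2vec:
    each word is paired with every neighbour within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

pairs = skipgram_pairs("the quick brown fox".split(), window=1)
# -> [('the','quick'), ('quick','the'), ('quick','brown'),
#     ('brown','quick'), ('brown','fox'), ('fox','brown')]
```

Words that keep appearing with the same neighbours receive similar gradient updates, which is why their vectors end up close together.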



Word2vec with PyTorch: Implementing the Original Paper

#artificialintelligence

Word embeddings are among the most fundamental concepts in deep natural language processing, and word2vec is one of the earliest algorithms used to train them. In this post, I want to go deeper into the first paper on word2vec -- Efficient Estimation of Word Representations in Vector Space (2013), which as of now has 24k citations, and that number is still growing. I am attaching my GitHub project with word2vec training; we will go through it in this post.
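The training objective the paper family popularised (skip-gram with negative sampling) can be sketched in a few dozen lines of plain Python. This is a toy illustration, not the post's PyTorch code: observed (center, context) pairs are pushed together, random negative samples are pushed apart.

```python
import math
import random

def train_sgns(pairs, vocab, dim=8, lr=0.05, epochs=50, neg=2, seed=0):
    """Toy skip-gram with negative sampling. Each word gets a center vector (W)
    and a context vector (C); sigmoid of their dot product is trained toward 1
    for observed pairs and toward 0 for random negatives."""
    rng = random.Random(seed)
    W = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    C = {w: [rng.uniform(-0.5, 0.5) for _ in range(dim)] for w in vocab}
    words = sorted(vocab)
    for _ in range(epochs):
        for center, ctx in pairs:
            # one positive sample plus `neg` uniformly drawn negatives
            samples = [(ctx, 1.0)] + [(rng.choice(words), 0.0) for _ in range(neg)]
            for w, label in samples:
                dot = sum(a * b for a, b in zip(W[center], C[w]))
                score = 1.0 / (1.0 + math.exp(-dot))  # sigmoid
                g = lr * (score - label)              # gradient of log-loss
                for k in range(dim):
                    wc, cw = W[center][k], C[w][k]
                    W[center][k] -= g * cw
                    C[w][k] -= g * wc
    return W

vocab = {"cat", "dog", "pet", "car"}
embeddings = train_sgns([("cat", "pet"), ("dog", "pet")], vocab)
```

A real implementation adds frequency-weighted negative sampling, subsampling of frequent words, and vectorised updates, but the gradient step is essentially this one.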


NLP Article #1 : A word is worth a 1000 pictures

#artificialintelligence

Natural Language Processing is, quite simply, the study and use of machines to intelligently use, and create, natural language. I purposely leave out the word 'understand' for now, as it is a bit of a prickly subject when using probabilistic models. But the conundrum is the following: in terms of bit-rate, verbal communication is awful. A lecturer might utter 300 words per minute, which is the paltry rate of about 70 bytes/second. But switch to a two-way conversation, and that can drop even further to 100 words/minute, or a definitely anaemic 25 bytes/second.
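The back-of-the-envelope arithmetic works out if each spoken word carries roughly 14 bytes of text (an assumption reverse-engineered here from the article's own figures, which it rounds loosely):

```python
def speech_bitrate_bytes_per_sec(words_per_minute, bytes_per_word=14):
    """Rough information rate of speech; ~14 bytes per word is an assumption."""
    return words_per_minute * bytes_per_word / 60

lecture = speech_bitrate_bytes_per_sec(300)       # 70.0 bytes/second
conversation = speech_bitrate_bytes_per_sec(100)  # ~23, which the article rounds to ~25
```

Either way, the point stands: spoken language is orders of magnitude slower than any machine channel.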


Text Mining: Word Vectorization Techniques

#artificialintelligence

Word vectorization is a methodology for mapping words from a vocabulary to vectors of real numbers. These vectors can be used in various NLP ML models for tasks such as text similarity, topic modeling, POS detection, and prediction. Word vectorization is a requirement for NLP ML models: NLP algorithms extract important information from text data, but deep learning models work on numeric data, so the text data must first be converted to numeric form.
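One of the classic vectorization techniques such articles cover is TF-IDF, which weights each term by how frequent it is in a document and how rare it is across documents. A minimal sketch (one common smoothing variant among several; not the article's code):

```python
import math
from collections import Counter

def tfidf(docs):
    """Term frequency x inverse document frequency; returns one
    {term: weight} dict per document. Uses idf = log(N/df) + 1 so that
    terms appearing in every document still get a nonzero weight."""
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(tok for doc in tokenized for tok in set(doc))
    idf = {t: math.log(n / df[t]) + 1.0 for t in df}
    weights = []
    for doc in tokenized:
        tf = Counter(doc)
        weights.append({t: (tf[t] / len(doc)) * idf[t] for t in tf})
    return weights

weights = tfidf(["the cat sat", "the dog ran"])
# "cat" (rare) outweighs "the" (appears in both documents)
```

Distinctive words like "cat" score higher than ubiquitous ones like "the", which is exactly the signal tasks such as text similarity and topic modeling rely on.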


Deep Learning Algorithms - The Complete Guide

#artificialintelligence

Deep Learning is eating the world. The hype began around 2012, when a neural network achieved superhuman performance on image recognition tasks and only a few people could predict what was about to happen. Over the past decade, more and more algorithms have come to life, and more and more companies have started adding them to their daily business. Here, I have tried to cover the most important deep learning algorithms and architectures conceived over the years for use in a variety of applications such as Computer Vision and Natural Language Processing. Some are used more frequently than others, and each one has its own strengths and weaknesses. My main goal is to give you a general idea of the field and help you understand which algorithm you should use in each specific case, because I know it seems chaotic to someone who wants to start from scratch.