

Two minutes NLP -- Quick Intro to Knowledge Base Question Answering

#artificialintelligence

Knowledge base question answering (KBQA) aims to answer natural language questions using a knowledge base (KB) as the knowledge source. A KB is a structured database that contains a collection of facts in the form (subject, relation, object), where each fact can have properties attached to it, called qualifiers. For example, the sentence "Barack Obama got married to Michelle Obama on 3 October 1992 at Trinity United Church" can be represented by the tuple (Barack Obama, Spouse, Michelle Obama), with the qualifiers start time = 3 October 1992 and place of marriage = Trinity United Church. Popular knowledge bases include DBpedia and Wikidata. Early work on KBQA focused on simple question answering, where only a single fact is involved.
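To make the fact-plus-qualifiers structure concrete, here is a minimal Python sketch of a toy in-memory KB holding the example fact, with a lookup that answers a simple (single-fact) question. It assumes the hard part of KBQA, mapping the question text to a subject entity and a relation, has already been done; the names `Fact`, `KB`, `answer_simple`, and `answer_qualifier` are hypothetical illustrations, not the API of DBpedia or Wikidata (which are normally queried with SPARQL).

```python
from dataclasses import dataclass, field


@dataclass
class Fact:
    """One KB fact (subject, relation, object) plus optional qualifiers."""
    subject: str
    relation: str
    obj: str
    qualifiers: dict = field(default_factory=dict)


# A one-fact toy KB built from the example sentence (hypothetical data layout).
KB = [
    Fact(
        "Barack Obama", "Spouse", "Michelle Obama",
        {"start time": "3 October 1992", "place of marriage": "Trinity United Church"},
    ),
]


def answer_simple(kb, subject, relation):
    """Simple QA: return the objects of all facts matching (subject, relation, ?)."""
    return [f.obj for f in kb if f.subject == subject and f.relation == relation]


def answer_qualifier(kb, subject, relation, qualifier):
    """Return the value of a qualifier (e.g. 'start time') on the matching facts."""
    return [
        f.qualifiers[qualifier]
        for f in kb
        if f.subject == subject and f.relation == relation and qualifier in f.qualifiers
    ]


# "Who is Barack Obama's spouse?" is assumed to be pre-parsed into
# the pair ("Barack Obama", "Spouse") by an upstream step not shown here.
print(answer_simple(KB, "Barack Obama", "Spouse"))  # ['Michelle Obama']
print(answer_qualifier(KB, "Barack Obama", "Spouse", "start time"))  # ['3 October 1992']
```

A list scan is enough for one fact; a real KB would index facts by subject and relation instead.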


Two minutes NLP -- Learn TF-IDF with easy examples

#artificialintelligence

TF-IDF (Term Frequency-Inverse Document Frequency) is a way of measuring how relevant a word is to a document within a collection of documents. TF-IDF has many uses, such as information retrieval, text analysis, keyword extraction, and obtaining numeric features from text for machine learning algorithms. It was first designed for document search and information retrieval, where a query is run and the system has to find the most relevant documents. Suppose the query is the text "The bug". The system would give each document a score that grows with the frequencies of the query words in that document, weighting rare words like "bug" more heavily than common words like "the".
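The "The bug" example can be reproduced with a short from-scratch Python sketch. It uses one common TF-IDF variant, assumed here rather than taken from the text: term frequency is a word's count divided by the document length, and inverse document frequency is log(N / df), where N is the number of documents and df is how many contain the word. Tokenization is plain lowercasing plus whitespace splitting, and the three documents are made up for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    """Very naive tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def idf_table(docs):
    """idf(term) = log(N / df), where df counts documents containing the term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))
    return {term: math.log(n / count) for term, count in df.items()}


def tfidf(term, doc, idf):
    """Term frequency (count / doc length) weighted by inverse document frequency."""
    tf = doc.count(term) / len(doc)
    return tf * idf.get(term, 0.0)


def rank(query, texts):
    """Score each document by summing the TF-IDF of the query words it contains."""
    docs = [tokenize(t) for t in texts]
    idf = idf_table(docs)
    scores = [sum(tfidf(q, d, idf) for q in tokenize(query)) for d in docs]
    order = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
    return order, scores


texts = ["the bug is in the parser", "the cat sat on the mat", "the weather is nice"]
order, scores = rank("The bug", texts)
print(order)  # [0, 1, 2]
```

Because "the" occurs in every document, its idf is log(3/3) = 0, so it contributes nothing to any score; only the rare word "bug" separates the first document from the others.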


Two minutes NLP -- The OpenAI WebGPT model that answers questions browsing the web

#artificialintelligence

The OpenAI team presented WebGPT, a GPT-3 model fine-tuned to answer long-form questions using a text-based web-browsing environment that allows the model to search and navigate the web. The prototype copies how humans research answers online: it submits search queries, follows links, and scrolls up and down web pages. It is trained to cite its sources, which makes it easier for humans to give feedback that improves factual accuracy. WebGPT belongs to the field of long-form question answering (LFQA), in which a paragraph-length answer is generated in response to an open-ended question. LFQA systems have the potential to become one of the main ways people learn about the world, but they currently lag behind human performance.


Two minutes NLP -- Doc2Vec in a nutshell

#artificialintelligence

Doc2Vec is an unsupervised algorithm that learns embeddings for variable-length pieces of text, such as sentences, paragraphs, and documents. It was originally presented in the paper Distributed Representations of Sentences and Documents. Let's review Word2Vec first, as it provides the inspiration for the Doc2Vec algorithm. Word2Vec learns word vectors by predicting a word in a sentence from the other words in its context. In this framework, every word is mapped to a unique vector, represented by a column in a matrix W. The concatenation or sum of the context word vectors is then used as features to predict the next word in the sentence. The word vectors are trained using stochastic gradient descent.
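The next-word prediction framework just described can be sketched in plain Python. This is a toy illustration under stated assumptions, not the original implementation: each word is a column of a matrix `W`, the previous `window` context columns are averaged (a scaled version of the sum mentioned above), a softmax layer scores every vocabulary word, and all parameters are updated with stochastic gradient descent on the cross-entropy loss. The function names, toy corpus, and hyperparameters (`dim`, `window`, `lr`, `epochs`) are hypothetical choices; practical implementations replace the full softmax with tricks such as negative sampling or hierarchical softmax.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def train_next_word_model(sentences, dim=8, window=2, lr=0.2, epochs=300, seed=0):
    """Predict each word from the average of the previous `window` word vectors."""
    rng = random.Random(seed)
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    # W is dim x V: column j holds the vector of word vocab[j].
    W = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(dim)]
    # Softmax output layer: one weight row and one bias per vocabulary word.
    U = [[0.0] * dim for _ in range(V)]
    b = [0.0] * V

    for _ in range(epochs):
        for sent in sentences:
            ids = [index[w] for w in sent]
            for t in range(window, len(ids)):
                ctx, target = ids[t - window:t], ids[t]
                # Features: average of the context columns of W.
                h = [sum(W[d][j] for j in ctx) / window for d in range(dim)]
                probs = softmax([sum(U[k][d] * h[d] for d in range(dim)) + b[k] for k in range(V)])
                # Gradient of the cross-entropy loss w.r.t. the scores.
                g = probs[:]
                g[target] -= 1.0
                # Gradient w.r.t. h, computed before U is updated.
                gh = [sum(g[k] * U[k][d] for k in range(V)) for d in range(dim)]
                # SGD step on the output layer.
                for k in range(V):
                    for d in range(dim):
                        U[k][d] -= lr * g[k] * h[d]
                    b[k] -= lr * g[k]
                # SGD step on the context word vectors (columns of W).
                for j in ctx:
                    for d in range(dim):
                        W[d][j] -= lr * gh[d] / window
    return vocab, W, U, b


def predict_next(context, vocab, W, U, b):
    """Return the most probable next word given the context words."""
    index = {w: i for i, w in enumerate(vocab)}
    ctx = [index[w] for w in context]
    dim = len(W)
    h = [sum(W[d][j] for j in ctx) / len(ctx) for d in range(dim)]
    scores = [sum(U[k][d] * h[d] for d in range(dim)) + b[k] for k in range(len(vocab))]
    return vocab[max(range(len(vocab)), key=lambda k: scores[k])]


corpus = ["the cat sat on the mat".split(), "the dog sat on the rug".split()]
vocab, W, U, b = train_next_word_model(corpus)
print(predict_next(["sat", "on"], vocab, W, U, b))
```

After training, the columns of `W` are the learned word vectors. In Doc2Vec's PV-DM variant, a per-document vector is combined with these context word vectors as an extra input; the sketch leaves it out.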