LyCon: Lyrics Reconstruction from the Bag-of-Words Using Large Language Models
This paper addresses a unique challenge in lyric studies: direct use of lyrics is often restricted because internet-sourced lyrics are frequently protected under copyright law, necessitating alternative approaches. Our study introduces a novel method for generating copyright-free lyrics from publicly available Bag-of-Words (BoW) datasets, which contain the vocabulary of lyrics but not the lyrics themselves. Using the metadata associated with BoW datasets together with large language models, we successfully reconstructed lyrics. We compiled LyCon, a publicly available dataset of reconstructed lyrics aligned with metadata from renowned sources including the Million Song Dataset, the Deezer Mood Detection Dataset, and the AllMusic Genre Dataset. We believe that the integration of metadata such as mood annotations or genres enables a variety of academic experiments on lyrics, such as conditional lyric generation.
KUCST at CheckThat 2023: How good can we be with a generic model?
In this paper we present our method for tasks 2 and 3A of the CheckThat 2023 shared task. We use a generic approach, inspired by authorship attribution and profiling, that has previously been applied to a diverse set of tasks. We train a number of machine learning models, and our results show that Gradient Boosting performs best on both tasks. Based on the official ranking provided by the shared task organizers, our model achieves average performance relative to the other teams.
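The paper does not include code, but the generic approach it describes can be sketched with scikit-learn. The data, labels, and feature choices below are illustrative assumptions, not the actual CheckThat 2023 setup; character n-grams are shown only as one common authorship-style feature set.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Placeholder data: 1 = check-worthy claim, 0 = not check-worthy.
# These examples are ours, not the shared-task data.
texts = [
    "climate change is a hoax",
    "the sky is blue today",
    "vaccines contain microchips",
    "I had pasta for lunch",
]
labels = [1, 0, 1, 0]

# Character n-gram TF-IDF features feeding a Gradient Boosting model,
# the classifier the paper reports as performing best.
pipeline = make_pipeline(
    TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
    GradientBoostingClassifier(n_estimators=50, random_state=0),
)
pipeline.fit(texts, labels)
print(pipeline.predict(["the earth is flat"]))
```

The same pipeline shape (vectorizer plus classifier) is what makes the approach "generic": swapping the task only means swapping the training data and labels.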
Bag-of-Words (BoW)
Originally published on Towards AI, the world's leading AI and technology news and media company. In the previous blog, we discussed at length the need to convert text to vectors so that machine learning algorithms can be applied and meaningful insights drawn from text data.
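The conversion the post refers to can be sketched in plain Python. The toy corpus below is ours, for illustration: each document becomes a vector of word counts over a shared vocabulary.

```python
from collections import Counter

corpus = ["the cat sat on the mat", "the dog sat on the log"]

# Build the vocabulary: every distinct word across the corpus, sorted
# so each word gets a stable position in the vector.
vocabulary = sorted({word for doc in corpus for word in doc.split()})

def bag_of_words(doc):
    # Represent a document as its word counts over the vocabulary.
    counts = Counter(doc.split())
    return [counts[word] for word in vocabulary]

vectors = [bag_of_words(doc) for doc in corpus]
print(vocabulary)  # ['cat', 'dog', 'log', 'mat', 'on', 'sat', 'the']
print(vectors)     # [[1, 0, 0, 1, 1, 1, 2], [0, 1, 1, 0, 1, 1, 2]]
```

Note that word order is discarded entirely; only counts survive, which is exactly what the "bag" in Bag-of-Words means.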
Two minutes NLP -- Doc2Vec in a nutshell
Doc2Vec is an unsupervised algorithm that learns embeddings from variable-length pieces of text, such as sentences, paragraphs, and documents. It was originally presented in the paper Distributed Representations of Sentences and Documents. Let's review Word2Vec first, as it provides the inspiration for the Doc2Vec algorithm. Word2Vec learns word vectors by predicting a word in a sentence from the other words in its context. In this framework, every word is mapped to a unique vector, represented by a column in a matrix W. The concatenation or sum of the context vectors is then used as the feature vector for predicting the next word in the sentence. The word vectors are trained using stochastic gradient descent.
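The prediction framework described above can be sketched in a few lines of NumPy. This is a minimal, untrained illustration of the forward pass only, with made-up dimensions: real Word2Vec trains W (and the output weights, here U) with stochastic gradient descent over a large corpus.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["the", "cat", "sat", "on", "mat"]
index = {w: i for i, w in enumerate(vocab)}

dim = 4
# W: each column is the vector for one word, as described in the paper.
W = rng.normal(size=(dim, len(vocab)))
# U: output weights that score each vocabulary word as the prediction.
U = rng.normal(size=(len(vocab), dim))

def predict_next(context_words):
    # Sum the context word vectors to form the feature vector...
    h = sum(W[:, index[w]] for w in context_words)
    # ...then score every vocabulary word and return the highest-scoring one.
    scores = U @ h
    return vocab[int(np.argmax(scores))]

print(predict_next(["the", "cat"]))  # weights are random, so the output is arbitrary
```

Doc2Vec extends this picture by adding one extra vector per document to the context, so the document vector is trained alongside the word vectors.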
Text Classification with NLP: Tf-Idf vs Word2Vec vs BERT
In this article, using NLP and Python, I will explain 3 different strategies for multiclass text classification: the old-fashioned Bag-of-Words (with Tf-Idf), the famous word embeddings (with Word2Vec), and cutting-edge language models (with BERT). NLP (Natural Language Processing) is the field of artificial intelligence that studies the interactions between computers and human languages, in particular how to program computers to process and analyze large amounts of natural language data. NLP is often applied to classifying text data. Text classification is the problem of assigning categories to text data according to its content. There are different techniques to extract information from raw text data and use it to train a classification model.
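The first of the three strategies, Tf-Idf features feeding a linear classifier, fits in a few lines of scikit-learn. The toy multiclass data below is ours, for illustration, not the article's dataset.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy three-class topic data, invented for this sketch.
texts = [
    "the striker scored a goal",
    "parliament passed the bill",
    "the new phone has a fast chip",
    "the team won the match",
    "the senate debated the law",
    "the laptop ships with more memory",
]
labels = ["sports", "politics", "tech", "sports", "politics", "tech"]

# Tf-Idf turns each document into a weighted word-count vector;
# logistic regression then learns one weight vector per class.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["the midfielder missed the goal"]))
```

The Word2Vec and BERT strategies keep the same pipeline shape but replace the Tf-Idf vectorizer with dense embeddings or contextual model outputs.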
Understanding TF-IDF in NLP.
TF-IDF, short for Term Frequency–Inverse Document Frequency, is a numerical statistic intended to reflect how important a word is to a document within a collection or corpus. It is often used as a weighting factor in information retrieval, text mining, and user modeling. The TF-IDF value increases proportionally with the number of times a word appears in a document, and is offset by the number of documents in the corpus that contain the word, which adjusts for the fact that some words are simply frequent in general. TF-IDF is often preferred over plain Bag-of-Words, which records only a raw count (or a 1/0 presence indicator) for each word, because TF-IDF assigns each word an individual weight that reflects its importance relative to the rest of the corpus. As a worked example, consider the word "Good". Term frequency is defined as TF(t) = (number of times term t appears in a document) / (total number of terms in the document). So if "Good" appears once in a sentence containing three terms, its term frequency in that sentence is TF("Good") = 1/3 ≈ 0.333.
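The two definitions can be computed directly in plain Python. The three example documents below are ours, chosen for illustration; note how a word that appears in every document ("good") gets an IDF of zero, so its TF-IDF vanishes even though its term frequency is positive.

```python
import math

# Example documents (ours, for illustration), pre-tokenized into words.
docs = [
    "he is a good boy".split(),
    "she is a good girl".split(),
    "boy and girl are good".split(),
]

def tf(term, doc):
    # Term frequency: occurrences of the term in this document,
    # divided by the total number of terms in the document.
    return doc.count(term) / len(doc)

def idf(term, docs):
    # Inverse document frequency: log of (number of documents /
    # number of documents containing the term).
    containing = sum(1 for d in docs if term in d)
    return math.log(len(docs) / containing)

print(tf("good", docs[0]))                    # 1/5 = 0.2
print(idf("good", docs))                      # log(3/3) = 0.0
print(tf("boy", docs[0]) * idf("boy", docs))  # 0.2 * log(3/2)
```

Library implementations such as scikit-learn's TfidfVectorizer use smoothed variants of this IDF formula, but the intuition is the same.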
Hacking Scikit-Learn's Vectorizers – Towards Data Science
Natural Language Processing is a fascinating field. Since all predictors are extracted from the text, data cleaning, preprocessing, and feature engineering have an even more significant impact on the model's performance. Having worked for a few months on a machine learning project of my own involving NLP, I've learned a thing or two about Scikit-Learn's vectorizers that I would like to share. Hopefully, by the end of this post, you will have some new ideas to use on your next project. As you know, machines, as advanced as they may be, are not capable of understanding words and sentences in the same way humans do.