Bootstrapping Task Spaces for Self-Improvement
Jiang, Minqi, Lupu, Andrei, Bachrach, Yoram
Progress in many task domains emerges from repeated revisions to previous solution attempts. Training agents that can reliably self-improve over such sequences at inference-time is a natural target for reinforcement learning (RL), yet the naive approach assumes a fixed maximum iteration depth, which can be both costly and arbitrary. We present Exploratory Iteration (ExIt), a family of autocurriculum RL methods that directly exploits the recurrent structure of self-improvement tasks to train LLMs to perform multi-step self-improvement at inference-time while only training on the most informative single-step iterations. ExIt grows a task space by selectively sampling the most informative intermediate, partial histories encountered during an episode for continued iteration, treating these starting points as new self-iteration task instances to train a self-improvement policy. ExIt can further pair with explicit exploration mechanisms to sustain greater task diversity. Across several domains, encompassing competition math, multi-turn tool-use, and machine learning engineering, we demonstrate that ExIt strategies, starting from either a single or many task instances, can produce policies exhibiting strong inference-time self-improvement on held-out task instances, and the ability to iterate towards higher performance over a step budget extending beyond the average iteration depth encountered during training.
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.64)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
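The core ExIt loop described in the abstract is: (1) run short chains of single-step self-improvements, (2) score intermediate partial histories by how informative they are, and (3) promote the best ones to first-class task instances for further training. A toy sketch of that bootstrapping loop (the function names, the scalar "task" representation, and the informativeness heuristic are illustrative assumptions, not the paper's implementation):

```python
import random

def exit_bootstrap(initial_tasks, improve_step, informativeness,
                   n_rounds=3, depth=4, top_k=2, seed=0):
    """Toy sketch of Exploratory Iteration (ExIt) task-space bootstrapping:
    run episodes of single-step improvements, then add the most informative
    intermediate states back into the task space as new starting points."""
    rng = random.Random(seed)
    tasks = list(initial_tasks)
    for _ in range(n_rounds):
        intermediates = []
        for task in tasks:
            state = task
            for _ in range(depth):                 # one self-iteration episode
                state = improve_step(state, rng)   # a single improvement step
                intermediates.append(state)
        # promote only the most informative partial histories to new tasks
        intermediates.sort(key=informativeness, reverse=True)
        tasks.extend(intermediates[:top_k])
    return tasks

# Toy instantiation: a "task" is a number, improvement nudges it toward 0,
# and informativeness is distance from the goal (harder = more informative).
grown = exit_bootstrap(
    initial_tasks=[10.0],
    improve_step=lambda x, rng: x - rng.uniform(0.0, 2.0),
    informativeness=lambda x: abs(x),
)
```

With `n_rounds=3` and `top_k=2`, the task space grows from one instance to seven, which mirrors the paper's idea of expanding the set of self-iteration starting points over training.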
Sentiment Polarity Analysis of Bangla Food Reviews Using Machine and Deep Learning Algorithms
Amin, Al, Sarkar, Anik, Islam, Md Mahamodul, Miazee, Asif Ahammad, Islam, Md Robiul, Hoque, Md Mahmudul
The Internet has become an essential tool in the modern world. A significant portion of the population uses online food ordering services to have meals delivered to their residences. Although there are numerous ways to order food, customers are sometimes disappointed with the food they receive. Our goal was to build a model that can determine whether food is of good or poor quality. We compiled a dataset of 1,484 online reviews from prominent food ordering platforms, including Food Panda and HungryNaki. Leveraging the collected data, we rigorously assessed various deep learning and machine learning techniques to determine the most accurate approach for predicting food quality. Of all the algorithms evaluated, logistic regression was the most accurate, achieving 90.91% accuracy. The resulting model offers valuable insights that can guide users in deciding whether or not to order the food.
- Europe > Norway > Eastern Norway > Oslo (0.04)
- North America > United States > Iowa (0.04)
- Asia > Indonesia > Sulawesi > North Sulawesi > Manado (0.04)
- (2 more...)
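The paper's winning configuration, logistic regression on review text, can be sketched with a standard TF-IDF pipeline. The tiny English corpus, the n-gram range, and the labels below are illustrative stand-ins, not the authors' actual Bangla data or feature setup:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus standing in for the paper's 1,484 Bangla reviews.
reviews = ["food was fresh and tasty", "delivery late and food cold",
           "loved the biryani", "stale rice, terrible service"]
labels = [1, 0, 1, 0]  # 1 = positive, 0 = negative polarity

# TF-IDF unigrams/bigrams feeding logistic regression, the paper's best model
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(reviews, labels)
pred = model.predict(["tasty and fresh food"])[0]
```

The same two-stage shape (vectorizer plus linear classifier) covers most of the classical baselines such a benchmark would compare.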
RETVec: Resilient and Efficient Text Vectorizer
Bursztein, Elie, Zhang, Marina, Vallis, Owen, Jia, Xinyu, Kurakin, Alexey
This paper describes RETVec, an efficient, resilient, and multilingual text vectorizer designed for neural-based text processing. RETVec combines a novel character encoding with an optional small embedding model to embed words into a 256-dimensional vector space. The RETVec embedding model is pre-trained using pair-wise metric learning to be robust against typos and character-level adversarial attacks. In this paper, we evaluate and compare RETVec to state-of-the-art vectorizers and word embeddings on popular model architectures and datasets. These comparisons demonstrate that RETVec leads to competitive, multilingual models that are significantly more resilient to typos and adversarial text attacks. RETVec is available under the Apache 2 license at https://github.com/google-research/retvec.
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
- Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)
- (3 more...)
- Information Technology > Security & Privacy (0.68)
- Government > Military (0.49)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
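The "novel character encoding" can be illustrated in spirit: mapping each character to the bits of its Unicode codepoint yields a fixed-size, vocabulary-free word representation in which a single-character typo only flips a few bits. A minimal sketch of that idea (the dimensions and details here are assumptions, not RETVec's actual encoder):

```python
import numpy as np

def encode_word(word, max_chars=16, bits=24):
    """Binary codepoint encoding in the spirit of RETVec: each character
    becomes the bits of its Unicode codepoint, so any UTF-8 word can be
    embedded without a vocabulary, and a typo perturbs the vector only
    slightly (a property the metric-learned embedding model builds on)."""
    out = np.zeros((max_chars, bits), dtype=np.float32)
    for i, ch in enumerate(word[:max_chars]):
        cp = ord(ch)
        out[i] = [(cp >> b) & 1 for b in range(bits)]
    return out

vec = encode_word("café")  # non-ASCII characters encode just as easily
```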
Unmasking Falsehoods in Reviews: An Exploration of NLP Techniques
In the contemporary digital landscape, online reviews have become an indispensable tool for promoting products and services across various businesses. Marketers, advertisers, and online businesses have found incentives to create deceptive positive reviews for their products and negative reviews for their competitors' offerings. As a result, writing deceptive reviews has become a common practice for businesses seeking to promote themselves or undermine their rivals, and detecting such reviews is an intense and ongoing area of research. This paper proposes a machine learning model to identify deceptive reviews, with a particular focus on restaurants. The study examines the performance of numerous experiments conducted on a dataset of restaurant reviews known as the Deceptive Opinion Spam Corpus. To accomplish this, an n-gram model with a capped feature set (max features) is developed to identify deceptive content, particularly fake reviews. A benchmark study explores two different feature extraction techniques, each coupled with five distinct machine learning classification algorithms. The experimental results reveal that the passive aggressive classifier stands out among the various algorithms, achieving the highest accuracy both in text classification and in identifying fake reviews. Moreover, the research applies data augmentation and various deep learning techniques to further enhance the detection of deceptive reviews. The findings shed light on the efficacy of the proposed machine learning approach and offer valuable insights into dealing with deceptive reviews in the realm of online businesses.
- Asia > Middle East > UAE (0.14)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > New York > New York County > New York City (0.04)
- (2 more...)
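The headline result, an n-gram model with capped features feeding a passive aggressive classifier, maps directly onto a short scikit-learn pipeline. The four-review corpus below is an invented stand-in for the Deceptive Opinion Spam Corpus:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Tiny illustrative stand-in for the Deceptive Opinion Spam Corpus.
reviews = ["the room was clean and the staff friendly",
           "absolutely the best hotel my family has ever ever seen amazing",
           "breakfast was average but location convenient",
           "this hotel is perfect perfect in every single way believe me"]
labels = [0, 1, 0, 1]  # 0 = truthful, 1 = deceptive

# n-gram counts with a capped feature budget, then the paper's best classifier
clf = make_pipeline(
    CountVectorizer(ngram_range=(1, 2), max_features=5000),
    PassiveAggressiveClassifier(random_state=0),
)
clf.fit(reviews, labels)
```

Swapping `CountVectorizer` for `TfidfVectorizer` gives the second feature extraction technique such a benchmark would typically compare.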
Building Transformer Models with Attention Crash Course. Build a Neural Machine Translator in 12 Days - MachineLearningMastery.com
Moreover, when you look at the diagram of the transformer model and compare it with your implementation here, you should notice that the diagram shows a softmax layer at the output, but we omitted it. The softmax is indeed added in this lesson. Do you see where it is? In the next lesson, you will train this compiled model, which has about 14 million parameters, as shown in the summary above. Training the transformer depends on everything you created in all previous lessons. Most importantly, the vectorizer and dataset from Lesson 03 must be saved, as they will be reused in this and subsequent lessons. Running this script will take several hours, but once it finishes, you will have the model saved and the loss and accuracy plotted.
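The answer to "where is the softmax?" in setups like this is usually the loss function: compiling with a cross-entropy that accepts raw logits applies softmax internally, which is numerically more stable than an explicit softmax output layer. A small numpy sketch of that equivalence (the logit values are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def xent_from_logits(z, target):
    # cross-entropy computed directly from raw logits (log-sum-exp trick),
    # as a from_logits=True loss does internally
    return np.log(np.exp(z - z.max()).sum()) + z.max() - z[target]

def xent_from_probs(p, target):
    # cross-entropy after an explicit softmax output layer
    return -np.log(p[target])

logits = np.array([2.0, -1.0, 0.5])
a = xent_from_logits(logits, 0)
b = xent_from_probs(softmax(logits), 0)  # identical up to rounding
```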
Interactive Pipeline and Composite Estimators for Your End-to-End ML Model
Interactive Pipeline and Composite Estimators for Your End-to-End ML Model. Machine Learning Modeling, posted by ODSC Community, November 3, 2022. A data science model development pipeline involves various components, including data injection, data preprocessing, feature engineering, feature scaling, and modeling. A data scientist needs to write the learning and inference code for all of these components. For machine learning projects with heterogeneous data, the code structure can become messy and difficult for other team members to interpret. A pipeline is a very handy function that can sequentially assemble all of your model development components.
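For heterogeneous data, the sequential assembly the article describes is typically a scikit-learn Pipeline wrapping a ColumnTransformer, so that scaling, encoding, and modeling travel together through fit and predict. A minimal sketch with a hypothetical two-column dataset:

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical heterogeneous data: one numeric and one categorical column.
df = pd.DataFrame({"age": [22, 35, 58, 44],
                   "city": ["NY", "SF", "NY", "LA"],
                   "bought": [0, 1, 1, 0]})

# Per-column preprocessing: feature scaling for numerics, encoding for categoricals
pre = ColumnTransformer([
    ("num", StandardScaler(), ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["city"]),
])

# One composite estimator: preprocessing and model fit/predict as a unit
pipe = Pipeline([("preprocess", pre), ("model", LogisticRegression())])
pipe.fit(df[["age", "city"]], df["bought"])
probs = pipe.predict_proba(df[["age", "city"]])
```

Because the whole chain is a single estimator, it can be cross-validated, grid-searched, and serialized as one object, which is exactly what keeps the code interpretable for other team members.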
Movie Recommendation Engine with NLP - Analytics Vidhya
So, let us now preprocess our data! Natural Language Processing techniques are our savior when we have to deal with textual data. Our data cannot be fed to any machine learning model until we clean it, and that's where NLP comes into play! Let's clean our text data. Firstly, let us create a new column in our dataframe that will hold all the necessary keywords required for the model.
- Media > Film (0.73)
- Leisure & Entertainment (0.50)
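Creating the keyword column the excerpt describes amounts to concatenating the relevant text fields and cleaning them. A pandas sketch of that step (the column names and the cleaning rule are illustrative assumptions, not the article's exact code):

```python
import re
import pandas as pd

# Hypothetical movie dataframe with the text fields we want to combine.
movies = pd.DataFrame({
    "title": ["Inception", "Up"],
    "genres": ["Action Sci-Fi", "Animation Adventure"],
    "overview": ["A thief steals secrets via dreams.",
                 "An old man flies his house."],
})

def clean(text):
    # lowercase and strip punctuation so tokens match across movies
    return re.sub(r"[^a-z\s-]", "", text.lower()).strip()

# new column holding all the keywords the similarity model will use
movies["keywords"] = (movies["genres"] + " " + movies["overview"]).map(clean)
```

The `keywords` column can then be vectorized (e.g. with a count or TF-IDF vectorizer) and compared with cosine similarity to produce recommendations.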
Predicting the Difficulty of Texts Using Machine Learning and Getting a Visual Representation of…
We see that text data is ubiquitous in nature. There is a lot of text present in different forms such as posts, books, articles, and blogs. What is more interesting is that there is a subset of Artificial Intelligence called Natural Language Processing (NLP) that converts text into a form that can be used for machine learning. I know that sounds like a lot, but getting to know the details and the proper implementation of machine learning algorithms ensures that one learns the important tools in the process. Since newer and better machine learning libraries keep appearing, it makes sense to learn some of the state-of-the-art tools that can be used for predictions. I've recently come across a challenge on Kaggle about predicting the difficulty of text.
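Predicting a continuous difficulty score from text, as in the Kaggle challenge mentioned, is a regression problem. A minimal sketch using TF-IDF features and ridge regression (the mini-corpus and the difficulty scores are invented stand-ins for the challenge data):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Invented texts paired with a difficulty score (higher = harder to read).
texts = ["The cat sat on the mat.",
         "Photosynthesis converts light energy into chemical energy.",
         "Dogs run and play in the park.",
         "Quantum entanglement defies classical locality assumptions."]
difficulty = [0.1, 0.8, 0.2, 0.9]

# TF-IDF turns each text into a numeric vector; Ridge regresses the score
reg = make_pipeline(TfidfVectorizer(), Ridge(alpha=1.0))
reg.fit(texts, difficulty)
preds = reg.predict(texts)
```

Real solutions to such challenges add richer features (sentence length, word frequency statistics, or transformer embeddings), but the vectorize-then-regress shape stays the same.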