AITopics | character encoder

The Strawberry Problem: Emergence of Character-level Understanding in Tokenized Language Models

Cosma, Adrian, Ruseti, Stefan, Radoi, Emilian, Dascalu, Mihai

arXiv.org Artificial Intelligence, Sep-17-2025

Despite their remarkable progress across diverse domains, Large Language Models (LLMs) consistently fail at simple character-level tasks, such as counting letters in words, due to a fundamental limitation: tokenization. In this work, we frame this limitation as a problem of low mutual information and analyze it in terms of concept emergence. Using a suite of 19 synthetic tasks that isolate character-level reasoning in a controlled setting, we show that such capabilities emerge suddenly and only late in training. We find that percolation-based models of concept emergence explain these patterns, suggesting that learning character composition is not fundamentally different from learning commonsense knowledge. To address this bottleneck, we propose a lightweight architectural modification that significantly improves character-level reasoning while preserving the inductive advantages of subword models. Together, our results bridge low-level perceptual gaps in tokenized LMs and provide a principled framework for understanding and mitigating their structural blind spots. We make our code publicly available.

large language model, machine learning, natural language, (15 more...)

arXiv.org Artificial Intelligence

2505.14172

Country: North America > United States (0.28)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
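
The tokenization limitation described in the abstract above can be illustrated with a minimal sketch. It is not the authors' code or task suite: the vocabulary `TOY_VOCAB` and the helpers `tokenize` and `count_letter_from_ids` are hypothetical, and real subword vocabularies are learned from data rather than hand-written. The point is only that a model receives opaque token IDs, so counting letters requires knowing the spelling behind each ID.

```python
# Hypothetical toy vocabulary (an assumption for illustration only).
TOY_VOCAB = {"straw": 0, "berry": 1, "s": 2, "t": 3, "r": 4,
             "a": 5, "w": 6, "b": 7, "e": 8, "y": 9}
ID_TO_TOKEN = {i: tok for tok, i in TOY_VOCAB.items()}


def tokenize(word, vocab=TOY_VOCAB):
    """Greedy longest-match segmentation of a word into token IDs."""
    ids, i = [], 0
    while i < len(word):
        # Try the longest remaining substring first, then shrink.
        for j in range(len(word), i, -1):
            if word[i:j] in vocab:
                ids.append(vocab[word[i:j]])
                i = j
                break
        else:
            raise ValueError(f"cannot tokenize {word[i:]!r}")
    return ids


def count_letter_from_ids(ids, letter):
    """Count a letter by mapping each ID back to its spelling.

    A language model has no such lookup table built in; it must
    learn the character composition of every token from data.
    """
    return sum(ID_TO_TOKEN[i].count(letter) for i in ids)


ids = tokenize("strawberry")
print(ids)                              # two IDs, no visible characters
print(count_letter_from_ids(ids, "r"))  # needs the ID-to-spelling table
```

In this toy setting the word collapses to two IDs, and the three occurrences of "r" are only recoverable through the explicit `ID_TO_TOKEN` table.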

A Method for Training-free Person Image Picture Generation

Chen, Tianyu

arXiv.org Artificial Intelligence, May-16-2023

The current state-of-the-art Diffusion model has demonstrated excellent results in generating images. However, the images are monotonous and are mostly the result of the distribution of images of people in the training set, making it challenging to generate multiple images for a fixed number of individuals. This problem can often only be solved by fine-tuning the training of the model. This means that each individual/animated character image must be trained if it is to be drawn, and the hardware and cost of this training is often beyond the reach of the average user, who accounts for the largest number of people. To solve this problem, the Character Image Feature Encoder model proposed in this paper enables the user to use the process by simply providing a picture of the character to make the image of the character in the generated image match the expectation. In addition, various details can be adjusted during the process using prompts. Unlike traditional Image-to-Image models, the Character Image Feature Encoder extracts only the relevant image features, rather than information about the model's composition or movements. In addition, the Character Image Feature Encoder can be adapted to different models after training. The proposed model can be conveniently incorporated into the Stable Diffusion generation process without modifying the model's ontology or used in combination with Stable Diffusion as a joint model.

artificial intelligence, character encoder, machine learning, (14 more...)

arXiv.org Artificial Intelligence

2305.09817

Country: Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.96)

Improving Diffusion Models for Scene Text Editing with Dual Encoders

Ji, Jiabao, Zhang, Guanhua, Wang, Zhaowen, Hou, Bairu, Zhang, Zhifei, Price, Brian, Chang, Shiyu

arXiv.org Artificial Intelligence, Apr-11-2023

Scene text editing is a challenging task that involves modifying or inserting specified texts in an image while maintaining its natural and realistic appearance. Most previous approaches to this task rely on style-transfer models that crop out text regions and feed them into image transfer models, such as GANs. However, these methods are limited in their ability to change text style and are unable to insert texts into images. Recent advances in diffusion models have shown promise in overcoming these limitations with text-conditional image editing. However, our empirical analysis reveals that state-of-the-art diffusion models struggle with rendering correct text and controlling text style. To address these problems, we propose DIFFSTE to improve pre-trained diffusion models with a dual encoder design, which includes a character encoder for better text legibility and an instruction encoder for better style control. An instruction tuning framework is introduced to train our model to learn the mapping from the text instruction to the corresponding image with either the specified style or the style of the surrounding texts in the background. Such a training method further brings our method the zero-shot generalization ability to the following three scenarios: generating text with unseen font variation, e.g., italic and bold, mixing different fonts to construct a new font, and using more relaxed forms of natural language as the instructions to guide the generation task. We evaluate our approach on five datasets and demonstrate its superior performance in terms of text correctness, image naturalness, and style controllability. Our code is publicly available. https://github.com/UCSB-NLP-Chang/DiffSTE

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

2304.05568

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.94)

ZEN 2.0: Continue Training and Adaption for N-gram Enhanced Text Encoders

Song, Yan, Zhang, Tong, Wang, Yonggang, Lee, Kai-Fu

arXiv.org Artificial Intelligence, May-4-2021

Pre-trained text encoders have drawn sustaining attention in natural language processing (NLP) and shown their capability in obtaining promising results in different tasks. Recent studies illustrated that external self-supervised signals (or knowledge extracted by unsupervised learning, such as n-grams) are beneficial to provide useful semantic evidence for understanding languages such as Chinese, so as to improve the performance on various downstream tasks accordingly. To further enhance the encoders, in this paper, we propose to pre-train n-gram-enhanced encoders with a large volume of data and advanced techniques for training. Moreover, we try to extend the encoder to different languages as well as different domains, where it is confirmed that the same architecture is applicable to these varying circumstances and new state-of-the-art performance is observed from a long list of NLP tasks across languages and domains.

encoder, representation, zen 2, (14 more...)

arXiv.org Artificial Intelligence

2105.01279

Country:

Europe > Spain > Catalonia > Barcelona Province > Barcelona (0.04)
Asia > China > Hong Kong (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Improved Multi-Stage Training of Online Attention-based Encoder-Decoder Models

Garg, Abhinav, Gowda, Dhananjaya, Kumar, Ankur, Kim, Kwangyoun, Kumar, Mehul, Kim, Chanwoo

arXiv.org Machine Learning, Dec-27-2019

Speech Processing Lab, AI Center, Samsung Research, Korea

Abstract: In this paper, we propose a refined multistage multi-task training strategy to improve the performance of online attention-based encoder-decoder (AED) models. A three-stage training based on three levels of architectural granularity namely, character encoder, byte pair encoding (BPE) based encoder, and attention decoder, is proposed. Also, multi-task learning based on two-levels of linguistic granularity namely, character and BPE, is used. We explore different pre-training strategies for the encoders including transfer learning from a bidirectional encoder. Our models achieve a word error rate (WER) of 5.04% and 4.48% on the Librispeech test-clean data for the smaller and bigger models respectively after fusion with long short-term memory (LSTM) based external language model (LM).

Index Terms: Attention based encoder-decoder models, online attention, multistage training, multi-task learning

1. Introduction: Recently, attention-based encoder-decoder (AED) models have gained popularity for developing end-to-end neural network based automatic speech recognition (ASR) systems [1, 2, 3]. One of the primary advantages of AED models is that the language information is tightly coupled into the decoder, obviating the need for an external language model (LM). AED models have been shown to perform better than other end-to-end models, namely, connectionist temporal classification (CTC) and recurrent neural network transducer (RNN-T) models [4].

character encoder, encoder, ulstm layer, (15 more...)

arXiv.org Machine Learning

1912.12384

Country:

North America > Canada > Quebec > Montreal (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > United States (0.04)
(4 more...)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
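
The abstract above reports results as word error rate (WER). As a reminder of what that metric measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions for illustration; this is not the evaluation script used in the paper.

```python
def word_error_rate(reference, hypothesis):
    """WER = word-level Levenshtein distance / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("the cat sat on the mat", "the cat sit on mat"))
```

A WER of 5.04% therefore means roughly five word-level errors per hundred reference words.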

Training on Synthetic Noise Improves Robustness to Natural Noise in Machine Translation

Karpukhin, Vladimir, Levy, Omer, Eisenstein, Jacob, Ghazvininejad, Marjan

arXiv.org Machine Learning, Feb-4-2019

We consider the problem of making machine translation more robust to character-level variation at the source side, such as typos. Existing methods achieve greater coverage by applying subword models such as byte-pair encoding (BPE) and character-level encoders, but these methods are highly sensitive to spelling mistakes. We show how training on a mild amount of random synthetic noise can dramatically improve robustness to these variations, without diminishing performance on clean text. We focus on translation performance on natural noise, as captured by frequent corrections in Wikipedia edit logs, and show that robustness to such noise can be achieved using a balanced diet of simple synthetic noises at training time, without access to the natural noise data or distribution.

natural noise, noise, synthetic noise, (12 more...)

arXiv.org Machine Learning

1902.01509

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Oceania > Australia > Victoria > Melbourne (0.04)
North America > United States > Washington > King County > Seattle (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.47)
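
The idea of training on "simple synthetic noises" from the abstract above can be sketched as a character-level corruption function applied to source sentences. The abstract does not list the noise types or rates, so the four operations below (delete, insert, substitute, swap), the default probability, and the name `add_synthetic_noise` are assumptions for illustration, not the authors' implementation.

```python
import random
import string


def add_synthetic_noise(text, prob=0.1, rng=None):
    """Corrupt alphabetic characters with probability `prob` each.

    Assumed noise types: delete, insert, substitute, swap with the
    next character. Non-alphabetic characters are left untouched.
    """
    rng = rng or random.Random()
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < prob:
            op = rng.choice(["delete", "insert", "substitute", "swap"])
            if op == "delete":
                pass  # drop the character
            elif op == "insert":
                out.append(c)
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "substitute":
                out.append(rng.choice(string.ascii_lowercase))
            elif op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])
                out.append(c)
                i += 1  # the next character was already emitted
            else:
                out.append(c)  # swap at end of text: keep as-is
        else:
            out.append(c)
        i += 1
    return "".join(out)


# Usage: pass a seeded generator for reproducible noisy training data.
print(add_synthetic_noise("training on synthetic noise", prob=0.2,
                          rng=random.Random(0)))
```

Passing an explicit `random.Random` instance keeps the corruption reproducible across runs, which helps when comparing models trained with different noise rates.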

Resource Mention Extraction for MOOC Discussion Forums

An, Ya-Hui, Pan, Liangming, Kan, Min-Yen, Dong, Qiang, Fu, Yan

arXiv.org Artificial Intelligence, Nov-21-2018

In discussions hosted on discussion forums for Massive Online Open Courses (MOOCs), references to online learning resources are often of central importance. However, they are usually mentioned in free text, without appropriate hyperlinking to their associated resource. Automated learning resource mention hyperlinking and categorization will facilitate discussion and searching within MOOC forums, and also benefit the contextualization of such resources across disparate views. We propose the novel problem of learning resource mention identification in MOOC forums; i.e., to identify resource mentions in discussions, and classify them into predefined resource types. As this is a novel task with no publicly available data, we first contribute a large-scale labeled dataset - dubbed the Forum Resource Mention (FoRM) dataset - to facilitate our current research and future research on this task. FoRM contains over 10,000 real-world forum threads in collaboration with Coursera, with more than 23,000 manually labeled resource mentions. We then formulate this task as a sequence tagging problem and investigate solution architectures to address the problem. We identify two major challenges that hinder the application of sequence tagging models to the task: (1) the diversity of resource mention expression, and (2) long-range contextual dependencies. We address these challenges by incorporating character-level and thread context information into an LSTM-CRF model. First, we incorporate a character encoder to address the out-of-vocabulary problem caused by the diversity of mention expressions. Second, to address the context dependency challenge, we encode thread contexts using an RNN-based context encoder, and apply the attention mechanism to selectively leverage useful context information during sequence tagging. Experiments on FoRM show that the proposed method improves the baseline deep sequence tagging models notably, significantly bettering performance on instances that exemplify the two challenges.

Corresponding author: Liangming Pan (peterpan10211020@gmail.com). Preprint submitted to Elsevier, November 22, 2018.

artificial intelligence, machine learning, resource mention, (17 more...)

arXiv.org Artificial Intelligence

1811.08853

Country: Asia > Singapore (0.14)

Genre:

Research Report (1.00)
Instructional Material > Online (1.00)
Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education > Educational Technology > Educational Software > Computer Based Training (1.00)
Education > Educational Setting > Online (1.00)

Technology:

Information Technology > Enterprise Applications > Human Resources > Learning Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
