AITopics

2302.00321

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > California > San Diego County > San Diego (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(17 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

arXiv.org Artificial IntelligenceFeb-1-2023

Attention Link: An Efficient Attention-Based Low Resource Machine Translation Architecture

Min, Zeping

Transformers have achieved great success in machine translation, but transformer-based NMT models often require millions of bilingual parallel corpus for training. In this paper, we propose a novel architecture named as attention link (AL) to help improve transformer models' performance, especially in low training resources. We theoretically demonstrate the superiority of our attention link architecture in low training resources. Besides, we have done a large number of experiments, including en-de, de-en, en-fr, en-it, it-en, en-ro translation tasks on the IWSLT14 dataset as well as real low resources scene on bn-gu and gu-ta translation tasks on the CVIT PIB dataset. All the experiment results show our attention link is powerful and can lead to a significant improvement. In addition, we achieve a 37.9 BLEU score, a new sota, on the IWSLT14 de-en task by combining our attention link and other advanced methods.

attention link, machine learning, natural language, (14 more...)

2302.0034

Country:

Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Europe > Germany > North Rhine-Westphalia > Cologne Region > Aachen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Eronen, Juuso, Ptaszynski, Michal, Masui, Fumito

Zero-shot cross-lingual transfer language selection using linguistic similarity

We study the selection of transfer languages for different Natural Language Processing tasks, specifically sentiment analysis, named entity recognition and dependency parsing. In order to select an optimal transfer language, we propose to utilize different linguistic similarity metrics to measure the distance between languages and make the choice of transfer language based on this information instead of relying on intuition. We demonstrate that linguistic similarity correlates with cross-lingual transfer performance for all of the proposed tasks. We also show that there is a statistically significant difference in choosing the optimal language as the transfer source instead of English. This allows us to select a more suitable transfer language which can be used to better leverage knowledge from high-resource languages in order to improve the performance of language applications lacking data. For the study, we used datasets from eight different languages from three language families.

computational linguistic, machine learning, natural language, (15 more...)

doi: 10.1016/j.ipm.2022.103250

2301.1372

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Hong Kong (0.04)
Europe > Italy > Tuscany > Florence (0.04)
(18 more...)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)

Zhang, Bryan, Misra, Amita

Machine Translation Impact in E-commerce Multilingual Search

Previous work suggests that performance of cross-lingual information retrieval correlates highly with the quality of Machine Translation. However, there may be a threshold beyond which improving query translation quality yields little or no benefit to further improve the retrieval performance. This threshold may depend upon multiple factors including the source and target languages, the existing MT system quality and the search pipeline. In order to identify the benefit of improving an MT system for a given search pipeline, we investigate the sensitivity of retrieval quality to the presence of different levels of MT quality using experimental datasets collected from actual traffic. We systematically improve the performance of our MT systems quality on language pairs as measured by MT evaluation metrics including Bleu and Chrf to determine their impact on search precision metrics and extract signals that help to guide the improvement strategies. Using this information we develop techniques to compare query translations for multiple language pairs and identify the most promising language pairs to invest and improve.

artificial intelligence, machine translation, natural language, (15 more...)

2302.00119

Country:

North America > United States > New York > New York County > New York City (0.05)
Europe > Germany (0.05)
Europe > France > Provence-Alpes-Côte d'Azur > Bouches-du-Rhône > Marseille (0.04)
(6 more...)

Genre: Research Report (0.65)

Industry: Information Technology > Services > e-Commerce Services (0.44)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Bayes risk CTC: Controllable CTC alignment in Sequence-to-Sequence tasks

Tian, Jinchuan, Yan, Brian, Yu, Jianwei, Weng, Chao, Yu, Dong, Watanabe, Shinji

Sequence-to-Sequence (seq2seq) tasks transcribe the input sequence to a target sequence. The Connectionist Temporal Classification (CTC) criterion is widely used in multiple seq2seq tasks. Besides predicting the target sequence, a side product of CTC is to predict the alignment, which is the most probable input-long sequence that specifies a hard aligning relationship between the input and target units. As there are multiple potential aligning sequences (called paths) that are equally considered in CTC formulation, the choice of which path will be most probable and become the predicted alignment is always uncertain. In addition, it is usually observed that the alignment predicted by vanilla CTC will drift compared with its reference and rarely provides practical functionalities. Thus, the motivation of this work is to make the CTC alignment prediction controllable and thus equip CTC with extra functionalities. The Bayes risk CTC (BRCTC) criterion is then proposed in this work, in which a customizable Bayes risk function is adopted to enforce the desired characteristics of the predicted alignment. With the risk function, the BRCTC is a general framework to adopt some customizable preference over the paths in order to concentrate the posterior into a particular subset of the paths. In applications, we explore one particular preference which yields models with the down-sampling ability and reduced inference costs. By using BRCTC with another preference for early emissions, we obtain an improved performance-latency trade-off for online models. Each path suggests a hard alignment between the input and target. Different colors mean different units. All non-blank spikes are squeezed to the earlier time stamps. Sequence-to-Sequence (seq2seq) tasks have attracted broad interest and achieved great progress in multiple applications in the past few decades.

artificial intelligence, machine learning, natural language, (17 more...)

2210.07499

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
(2 more...)

Difformer: Empowering Diffusion Models on the Embedding Space for Text Generation

Gao, Zhujin, Guo, Junliang, Tan, Xu, Zhu, Yongxin, Zhang, Fang, Bian, Jiang, Xu, Linli

Diffusion models have achieved state-of-the-art synthesis quality on both visual and audio tasks, and recent works further adapt them to textual data by diffusing on the embedding space. In this paper, we conduct systematic studies and analyze the challenges between the continuous data space and the embedding space which have not been carefully explored. Firstly, the data distribution is learnable for embeddings, which may lead to the collapse of the loss function. Secondly, as the norm of embeddings varies between popular and rare words, adding the same noise scale will lead to sub-optimal results. In addition, we find the normal level of noise causes insufficient training of the model. To address the above challenges, we propose Difformer, an embedding diffusion model based on Transformer, which consists of three essential modules including an additional anchor loss function, a layer normalization module for embeddings, and a noise factor to the Gaussian noise. Experiments on two seminal text generation tasks including machine translation and text summarization show the superiority of Difformer over compared embedding diffusion baselines.

artificial intelligence, machine learning, natural language, (16 more...)

2212.09412

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Baltimore (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Lin, Xiang, Jwalapuram, Prathyusha, Joty, Shafiq

Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation

State-of-the-art neural text generation models are typically trained to maximize the likelihood of each token in the ground-truth sequence conditioned on the previous target tokens. However, during inference, the model needs to make a prediction conditioned on the tokens generated by itself. This train-test discrepancy is referred to as exposure bias. Scheduled sampling is a curriculum learning strategy that gradually exposes the model to its own predictions during training to mitigate this bias. Most of the proposed approaches design a scheduler based on training steps, which generally requires careful tuning depending on the training setup. In this work, we introduce Dynamic Scheduled Sampling with Imitation Loss (DySI), which maintains the schedule based solely on the training time accuracy, while enhancing the curriculum learning by introducing an imitation loss, which attempts to make the behavior of the decoder indistinguishable from the behavior of a teacher-forced decoder. DySI is universally applicable across training setups with minimal tuning. Extensive experiments and analysis show that DySI not only achieves notable improvements on standard machine translation benchmarks, but also significantly improves the robustness of other text generation models.

artificial intelligence, machine learning, natural language, (17 more...)

2301.13753

Country:

North America > Dominican Republic (0.04)
Asia > Singapore (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(23 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

arXiv.org Artificial IntelligenceJan-30-2023

KG-BERTScore: Incorporating Knowledge Graph into BERTScore for Reference-Free Machine Translation Evaluation

Wu, Zhanglin, Zhang, Min, Zhu, Ming, Li, Yinglu, Zhu, Ting, Yang, Hao, Peng, Song, Qin, Ying

BERTScore is an effective and robust automatic metric for referencebased machine translation evaluation. In this paper, we incorporate multilingual knowledge graph into BERTScore and propose a metric named KG-BERTScore, which linearly combines the results of BERTScore and bilingual named entity matching for reference-free machine translation evaluation. From the experimental results on WMT19 QE as a metric without references shared tasks, our metric KG-BERTScore gets higher overall correlation with human judgements than the current state-of-the-art metrics for reference-free machine translation evaluation.1 Moreover, the pre-trained multilingual model used by KG-BERTScore and the parameter for linear combination are also studied in this paper.

artificial intelligence, machine translation, natural language, (12 more...)

2301.12699

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial IntelligenceJan-30-2023

PCC: Paraphrasing with Bottom-k Sampling and Cyclic Learning for Curriculum Data Augmentation

Lu, Hongyuan, Lam, Wai

Curriculum Data Augmentation (CDA) improves neural models by presenting synthetic data with increasing difficulties from easy to hard. However, traditional CDA simply treats the ratio of word perturbation as the difficulty measure and goes through the curriculums only once. This paper presents \textbf{PCC}: \textbf{P}araphrasing with Bottom-k Sampling and \textbf{C}yclic Learning for \textbf{C}urriculum Data Augmentation, a novel CDA framework via paraphrasing, which exploits the textual paraphrase similarity as the curriculum difficulty measure. We propose a curriculum-aware paraphrase generation module composed of three units: a paraphrase candidate generator with bottom-k sampling, a filtering mechanism and a difficulty measure. We also propose a cyclic learning strategy that passes through the curriculums multiple times. The bottom-k sampling is proposed to generate super-hard instances for the later curriculums. Experimental results on few-shot text classification as well as dialogue generation indicate that PCC surpasses competitive baselines. Human evaluation and extensive case studies indicate that bottom-k sampling effectively generates super-hard instances, and PCC significantly improves the baseline dialogue agent.

curriculum, machine learning, natural language, (16 more...)

2208.0811

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Cuba (0.04)
Asia > China > Hong Kong (0.04)
(9 more...)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)
Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.47)

arXiv.org Artificial IntelligenceJan-29-2023

BERT-based Authorship Attribution on the Romanian Dataset called ROST

Avram, Sanda-Maria

Being around for decades, the problem of Authorship Attribution is still very much in focus currently. Some of the more recent instruments used are the pre-trained language models, the most prevalent being BERT. Here we used such a model to detect the authorship of texts written in the Romanian language. The dataset used is highly unbalanced, i.e., significant differences in the number of texts per author, the sources from which the texts were collected, the time period in which the authors lived and wrote these texts, the medium intended to be read (i.e., paper or online), and the type of writing (i.e., stories, short stories, fairy tales, novels, literary articles, and sketches). The results are better than expected, sometimes exceeding 87\% macro-accuracy.

artificial intelligence, machine learning, natural language, (17 more...)

2301.125

Country:

North America > Dominican Republic (0.04)
Europe > France (0.04)
Asia > India > Bihar > Patna (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)