AITopics

2109.06105

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Industry:

Law Enforcement & Public Safety (0.47)
Media > News (0.39)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Rule-Based Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Ferrando, Javier, Costa-jussà, Marta R.

Attention Weights in Transformer NMT Fail Aligning Words Between Sequences but Largely Explain Model Predictions

arXiv.org Artificial Intelligence Sep-13-2021

This work presents an extensive analysis of the Transformer architecture in the Neural Machine Translation (NMT) setting. Focusing on the encoder-decoder attention mechanism, we prove that attention weights systematically make alignment errors by relying mainly on uninformative tokens from the source sequence. However, we observe that NMT models assign attention to these tokens to regulate how much each of the two contexts, the source and the prefix of the target sequence, contributes to the prediction. We provide evidence about the influence of wrong alignments on the model behavior, demonstrating that the encoder-decoder attention mechanism is well suited as an interpretability method for NMT. Finally, based on our analysis, we propose methods that largely reduce the word alignment error rate compared to standard induced alignments from attention weights.
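As a rough illustration of the baseline this abstract refers to, the sketch below induces hard alignments from a cross-attention matrix and scores them with the standard alignment error rate (AER). It assumes the simplest induction rule (each target token is aligned to its highest-weighted source token); the toy attention matrix and gold links are made up for illustration, and this is not the authors' proposed method.

```python
from typing import List, Set, Tuple

Link = Tuple[int, int]  # (source position, target position)

def align_from_attention(attn: List[List[float]]) -> Set[Link]:
    """attn[t][s] is the weight target position t puts on source position s.
    Assumption: align each target token to its argmax source token."""
    return {
        (max(range(len(row)), key=row.__getitem__), t)
        for t, row in enumerate(attn)
    }

def aer(predicted: Set[Link], sure: Set[Link], possible: Set[Link]) -> float:
    """Alignment error rate: 1 - (|A & S| + |A & P|) / (|A| + |S|),
    where the possible links P are taken to include the sure links S."""
    possible = possible | sure
    matched = len(predicted & sure) + len(predicted & possible)
    return 1.0 - matched / (len(predicted) + len(sure))

# Toy example (hypothetical values): 3 target tokens attending over 3 source tokens.
toy_attn = [
    [0.7, 0.2, 0.1],
    [0.1, 0.1, 0.8],
    [0.2, 0.3, 0.5],
]
toy_sure = {(0, 0), (1, 1), (2, 2)}
toy_pred = align_from_attention(toy_attn)
print(sorted(toy_pred), round(aer(toy_pred, toy_sure, set()), 3))
```

In practice the attention matrix is usually taken from a particular decoder layer and averaged over heads; which layer to use is a choice this sketch leaves open.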

alignment, computational linguistic, proceedings

2109.05853

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.15)
Europe > Italy > Tuscany > Florence (0.05)
Africa > Middle East > Egypt > Giza Governorate > Giza (0.05)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

Zhang, Shaolei, Feng, Yang

Modeling Concentrated Cross-Attention for Neural Machine Translation with Gaussian Mixture Model

arXiv.org Artificial Intelligence Sep-13-2021

Cross-attention is an important component of neural machine translation (NMT), and previous methods always realize it with dot-product attention. However, dot-product attention only considers the pair-wise correlation between words, resulting in dispersion when dealing with long sentences and neglect of source neighboring relationships. Drawing on linguistics, we attribute these issues to ignoring a type of cross-attention, called concentrated attention, which focuses on several central words and then spreads around them. In this work, we apply a Gaussian Mixture Model (GMM) to model the concentrated attention in cross-attention. Experiments and analyses on three datasets show that the proposed method outperforms the baseline and significantly improves alignment quality, N-gram accuracy, and long sentence translation.
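To make the idea of attention that "focuses on several central words and then spreads around them" concrete, here is a minimal sketch of a Gaussian mixture evaluated over source positions and normalized into attention weights. The function name, the parameters (`means`, `stds`, `weights`), and the example values are assumptions for illustration; the abstract does not say how the mixture parameters are predicted or how the result is combined with dot-product attention.

```python
import math
from typing import List, Sequence

def gmm_attention(src_len: int,
                  means: Sequence[float],
                  stds: Sequence[float],
                  weights: Sequence[float]) -> List[float]:
    """Evaluate a Gaussian mixture at each source position 0..src_len-1
    and normalize the densities so they sum to 1."""
    scores = []
    for j in range(src_len):
        density = sum(
            w * math.exp(-0.5 * ((j - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
            for m, s, w in zip(means, stds, weights)
        )
        scores.append(density)
    total = sum(scores)
    return [x / total for x in scores]

# Hypothetical example: two central source words at positions 2 and 7.
example_attn = gmm_attention(10, means=[2.0, 7.0], stds=[1.0, 1.0], weights=[0.5, 0.5])
print([round(a, 3) for a in example_attn])
```

The weights peak at the two centers and decay with distance, so neighboring source words receive more attention than distant ones.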

gaussian distribution, gaussian mixture attention, translation

2109.05244

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Italy > Tuscany > Florence (0.04)
Europe > Germany > Berlin (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

#artificialintelligence Sep-10-2021, 10:04:39 GMT

Association Mining for Machine Learning

Association rules are one of the most important concepts in machine learning and are widely used in market basket analysis. This course covers the working principle of association mining and its key concepts, such as support, confidence, and lift, in a simplified manner. All of these are explained using worked examples. Parteek Bhatia is Professor in the Department of Computer Science and Engineering and Former Associate Dean of Student Affairs at Thapar Institute of Engineering and Technology, Patiala. At present he is on sabbatical at Tel Aviv University, Israel, and is serving as Visiting Professor at LAMBDA Lab, TAU.
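For readers unfamiliar with the standard association-rule measures of support, confidence, and lift, a minimal sketch follows. It uses the textbook definitions on a made-up set of market baskets; the function names and data are illustrative assumptions and are not taken from the course.

```python
from typing import Iterable, List, Set

def support(transactions: List[Set[str]], itemset: Iterable[str]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = set(itemset)
    hits = sum(1 for basket in transactions if items <= basket)
    return hits / len(transactions)

def confidence(transactions: List[Set[str]],
               antecedent: Iterable[str],
               consequent: Iterable[str]) -> float:
    """support(A and B) / support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / support(transactions, antecedent)

def lift(transactions: List[Set[str]],
         antecedent: Iterable[str],
         consequent: Iterable[str]) -> float:
    """confidence(A -> B) / support(B); values above 1 suggest positive association."""
    return (confidence(transactions, antecedent, consequent)
            / support(transactions, consequent))

# Made-up baskets for illustration only.
baskets = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
print(support(baskets, {"bread", "milk"}))                 # 0.5
print(round(confidence(baskets, {"bread"}, {"milk"}), 3))  # 0.667
print(round(lift(baskets, {"bread"}, {"milk"}), 3))        # 0.889
```

Here the lift is below 1, so in this toy data buying bread makes milk slightly less likely than its overall base rate.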

association mining, machine learning, simplified approach

#artificialintelligence

Country:

Asia > India (0.44)
Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.27)
Europe > Switzerland > Geneva > Geneva (0.11)

Genre: Instructional Material > Course Syllabus & Notes (1.00)

Industry:

Education (0.57)
Government (0.41)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.83)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.37)

MURAL: Multimodal, Multitask Retrieval Across Languages

Jain, Aashi, Guo, Mandy, Srinivasan, Krishna, Chen, Ting, Kudugunta, Sneha, Jia, Chao, Yang, Yinfei, Baldridge, Jason

Both image-caption pairs and translation pairs provide the means to learn deep representations of and connections between languages. We use both types of pairs in MURAL (MUltimodal, MUltitask Representations Across Languages), a dual encoder that solves two tasks: 1) image-text matching and 2) translation pair matching. By incorporating billions of translation pairs, MURAL extends ALIGN (Jia et al. PMLR'21)--a state-of-the-art dual encoder learned from 1.8 billion noisy image-text pairs. When using the same encoders, MURAL's performance matches or exceeds ALIGN's cross-modal retrieval performance on well-resourced languages across several datasets. More importantly, it considerably improves performance on under-resourced languages, showing that text-text learning can overcome a paucity of image-caption examples for these languages. On the Wikipedia Image-Text dataset, for example, MURAL-base improves zero-shot mean recall by 8.1% on average for eight under-resourced languages and by 6.8% on average when fine-tuning. We additionally show that MURAL's text representations cluster not only with respect to genealogical connections but also based on areal linguistics, such as the Balkan Sprachbund.

computational linguistic, dataset, proceedings

2109.05125

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Denmark > Capital Region > Copenhagen (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.66)

An Evaluation Dataset and Strategy for Building Robust Multi-turn Response Selection Model

Han, Kijong, Lee, Seojin, Lee, Wooin, Lee, Joosung, Lee, Dong-hun

Multi-turn response selection models have recently shown performance comparable to humans on several benchmark datasets. However, in real-world settings, these models often have weaknesses, such as making incorrect predictions that rely heavily on superficial patterns without a comprehensive understanding of the context. For example, these models often give a high score to a wrong response candidate that contains several keywords related to the context but uses an inconsistent tense. In this study, we analyze the weaknesses of open-domain Korean multi-turn response selection models and publish an adversarial dataset to evaluate these weaknesses. We also suggest a strategy to build a robust model in this adversarial environment.

computational linguistic, dataset, proceedings

2109.04834

Country: Asia > South Korea (0.04)

Genre: Research Report (0.70)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)

Improving Multilingual Translation by Representation and Gradient Regularization

Yang, Yilin, Eriguchi, Akiko, Muzio, Alexandre, Tadepalli, Prasad, Lee, Stefan, Hassan, Hany

Multilingual Neural Machine Translation (NMT) enables one model to serve all translation directions, including ones that are unseen during training, i.e. zero-shot translation. Despite being theoretically attractive, current models often produce low quality translations -- commonly failing to even produce outputs in the right target language. In this work, we observe that off-target translation is dominant even in strong multilingual systems, trained on massive multilingual corpora. To address this issue, we propose a joint approach to regularize NMT models at both representation-level and gradient-level. At the representation level, we leverage an auxiliary target language prediction task to regularize decoder outputs to retain information about the target language. At the gradient level, we leverage a small amount of direct data (in thousands of sentence pairs) to regularize model gradients. Our results demonstrate that our approach is highly effective in both reducing off-target translation occurrences and improving zero-shot translation performance by +5.59 and +10.38 BLEU on WMT and OPUS datasets respectively. Moreover, experiments show that our method also works well when the small amount of direct data is not available.

arxiv preprint arxiv, oracle data, translation

2109.04778

Country: North America > United States > Oregon (0.04)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Dev, Sunipa, Monajatipoor, Masoud, Ovalle, Anaelia, Subramonian, Arjun, Phillips, Jeff M, Chang, Kai-Wei

Harms of Gender Exclusivity and Challenges in Non-Binary Representation in Language Technologies

Gender is widely discussed in the context of language tasks and when examining the stereotypes propagated by language models. However, current discussions primarily treat gender as binary, which can perpetuate harms such as the cyclical erasure of non-binary gender identities. These harms are driven by model and dataset biases, which are consequences of the non-recognition and lack of understanding of non-binary genders in society. In this paper, we explain the complexity of gender and language around it, and survey non-binary persons to understand harms associated with the treatment of gender as binary in English language technologies. We also detail how current language representations (e.g., GloVe, BERT) capture and perpetuate these harms and related challenges that need to be acknowledged and addressed for representations to equitably encode gender information.

gender, pronoun, respondent

2108.12084

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Utah (0.04)

Genre:

Questionnaire & Opinion Survey (1.00)
Overview (0.67)

Industry:

Health & Medicine (1.00)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.94)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.69)

arXiv.org Artificial Intelligence Sep-9-2021

Speechformer: Reducing Information Loss in Direct Speech Translation

Papi, Sara, Gaido, Marco, Negri, Matteo, Turchi, Marco

Transformer-based models have gained increasing popularity, achieving state-of-the-art performance in many research fields, including speech translation. However, the Transformer's quadratic complexity with respect to the input sequence length prevents its adoption as is with audio signals, which are typically represented by long sequences. Current solutions resort to an initial sub-optimal compression based on a fixed sampling of raw audio features. Therefore, potentially useful linguistic information is not accessible to higher-level layers in the architecture. To solve this issue, we propose Speechformer, an architecture that, thanks to reduced memory usage in the attention layers, avoids the initial lossy compression and aggregates information only at a higher level according to more informed linguistic criteria. Experiments on three language pairs (en->de/es/nl) show the efficacy of our solution, with gains of up to 0.8 BLEU on the standard MuST-C corpus and of up to 4.0 BLEU in a low-resource scenario.

artificial intelligence, direct speech translation, natural language

doi: 10.18653/v1/2021.emnlp-main.127

2109.04574

Genre: Research Report (0.69)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.60)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.60)

Ramnath, Sahana, Johnson, Melvin, Gupta, Abhirut, Raghuveer, Aravindan

HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints

arXiv.org Artificial Intelligence Sep-9-2021

Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve the effectiveness of the available BT data, we introduce HintedBT -- a family of techniques that provide hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high- and low-quality BT data by providing hints (as source tags on the encoder) to the model about the quality of each source-target pair. We don't filter out low-quality data but instead show that these hints enable the model to learn effectively from noisy data. Second, we address the problem of predicting whether a source token needs to be translated or transliterated to the target language, which is common in cross-script translation tasks (i.e., where source and target do not share the written script). For such cases, we propose training the model with additional hints (as target tags on the decoder) that provide information about the operation required on the source (translation or both translation and transliteration). We conduct experiments and detailed analyses on standard WMT benchmarks for three cross-script low/medium-resource language pairs: {Hindi,Gujarati,Tamil}-to-English. Our methods compare favorably with five strong and well-established baselines. We show that using these hints, both separately and together, significantly improves translation quality and leads to state-of-the-art performance in all three language pairs in corresponding bilingual settings.
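The quality-hint idea (source tags on the encoder side) can be sketched as follows. This is only an illustration under stated assumptions: the tag format `<bt_q0>`, `<bt_q1>`, ..., the bin thresholds, and the idea that each pair already carries a numeric quality score in [0, 1] are all hypothetical, since the abstract does not specify how quality is measured or how tags are named.

```python
from typing import Sequence

def tag_source(source: str,
               quality: float,
               thresholds: Sequence[float] = (0.4, 0.7)) -> str:
    """Prepend a quality-bin tag to a back-translated source sentence.
    `thresholds` (hypothetical values) split scores into len(thresholds) + 1 bins;
    bin 0 is the lowest quality."""
    bin_index = sum(quality >= t for t in thresholds)
    return f"<bt_q{bin_index}> {source}"

# Hypothetical back-translated pairs with made-up quality scores.
examples = [
    ("a noisy back-translated sentence", 0.25),
    ("a reasonable back-translated sentence", 0.55),
    ("a clean back-translated sentence", 0.90),
]
for text, score in examples:
    print(tag_source(text, score))
```

Instead of discarding the low-scoring pair, every pair stays in the training data and the tag lets the model condition on how trustworthy each example is.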

machine translation, proceedings, translation

2109.04443

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)

Genre: Research Report > Experimental Study (0.93)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)