AITopics

2210.07095

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.93)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.70)

Binarized Neural Machine Translation

Zhang, Yichi, Garg, Ankush, Cao, Yuan, Lew, Łukasz, Ghorbani, Behrooz, Zhang, Zhiru, Firat, Orhan

The rapid scaling of language models is motivating research using low-bitwidth quantization. In this work, we propose a novel binarization technique for Transformers applied to machine translation (BMT), the first of its kind. We identify and address the problem of inflated dot-product variance when using one-bit weights and activations. Specifically, BMT leverages additional LayerNorms and residual connections to improve binarization quality. Experiments on the WMT dataset show that a one-bit weight-only Transformer can achieve the same quality as a float one, while being 16x smaller in size. One-bit activations incur varying degrees of quality drop, but the drop is mitigated by the proposed architectural changes. We further conduct a scaling law study using production-scale translation datasets, which shows that one-bit weight Transformers scale and generalize well in both in-domain and out-of-domain settings. Implementation in JAX/Flax will be open sourced.
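The variance inflation mentioned in the abstract can be illustrated with a small simulation. This is a minimal sketch, not the paper's implementation: it assumes Gaussian inputs, weights initialized with variance 1/dim, and sign binarization of the weights only; the function names and the final division by `dim` are illustrative choices (the abstract itself names additional LayerNorms and residual connections as the remedy).

```python
import math
import random
import statistics


def binarize(values):
    """Map each value to +1.0 or -1.0 by its sign (zero maps to +1.0)."""
    return [1.0 if v >= 0 else -1.0 for v in values]


def dot_product_variance(dim, trials, binary, rng):
    """Empirical variance of w . x over random draws of w and x."""
    outputs = []
    for _ in range(trials):
        # Assumption: unit-variance inputs and weights with variance 1/dim.
        x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        w = [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
        if binary:
            w = binarize(w)
        outputs.append(sum(wi * xi for wi, xi in zip(w, x)))
    return statistics.pvariance(outputs)


rng = random.Random(0)
dim = 64
float_var = dot_product_variance(dim, 2000, binary=False, rng=rng)
binary_var = dot_product_variance(dim, 2000, binary=True, rng=rng)
# Dividing each output by sqrt(dim) divides the variance by dim.
rescaled_var = binary_var / dim

print(f"float weights:    {float_var:.2f}")
print(f"one-bit weights:  {binary_var:.2f}")
print(f"one-bit rescaled: {rescaled_var:.2f}")
```

With float weights the output variance stays near 1, while with one-bit weights it grows roughly in proportion to `dim`, which is why some form of normalization is needed after binarized matrix multiplications.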

artificial intelligence, machine learning, natural language

2302.04907

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Europe > Spain (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Carrión, Salvador, Casacuberta, Francisco

AutoNMT: A Framework to Streamline the Research of Seq2Seq Models

We present AutoNMT, a framework to streamline the research of seq-to-seq models by automating the data pipeline (i.e., file management, data preprocessing, and exploratory analysis), automating experimentation in a toolkit-agnostic manner, which allows users to use either their own models or existing seq-to-seq toolkits such as Fairseq or OpenNMT, and finally, automating the report generation (plots and summaries). Furthermore, this library comes with its own seq-to-seq toolkit so that users can easily customize it for non-standard tasks.

artificial intelligence, machine learning, natural language

2302.04981

Country:

North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Belgium > Brussels-Capital Region > Brussels (0.04)
Asia > Middle East > Qatar > Ad-Dawhah > Doha (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.31)

Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study

Ye, Hai, Ding, Yuyang, Li, Juntao, Ng, Hwee Tou

A deployed question answering (QA) model can easily fail when the test data has a distribution shift compared to the training data. Robustness tuning (RT) methods have been widely studied to enhance model robustness against distribution shifts before model deployment. However, can we improve a model after deployment? To answer this question, we evaluate test-time adaptation (TTA) to improve a model after deployment. We first introduce COLDQA, a unified evaluation benchmark for robust QA against text corruption and changes in language and domain. We then evaluate previous TTA methods on COLDQA and compare them to RT methods. We also propose a novel TTA method called online imitation learning (OIL). Through extensive experiments, we find that TTA is comparable to RT methods, and applying TTA after RT can significantly boost the performance on COLDQA. Our proposed OIL improves TTA to be more robust to variation in hyper-parameters and test distributions over time.

machine learning, natural language, question answering

2302.04618

Country:

Asia > Singapore (0.04)
Europe > Netherlands > South Holland > Delft (0.04)
Asia > China (0.04)

Genre:

Research Report (0.50)
Instructional Material (0.34)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Question Answering (0.71)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)

Efficient Attention via Control Variates

Zheng, Lin, Yuan, Jianbo, Wang, Chong, Kong, Lingpeng

Random-feature-based attention (RFA) is an efficient approximation of softmax attention with linear runtime and space complexity. However, the approximation gap between RFA and conventional softmax attention is not well studied. Built upon previous progress of RFA, we characterize this gap through the lens of control variates and show that RFA can be decomposed into a sum of multiple control variate estimators for each element in the sequence. This new framework reveals that exact softmax attention can be recovered from RFA by manipulating each control variate. Besides, it allows us to develop a more flexible form of control variates, resulting in a novel attention mechanism that significantly reduces the approximation gap while maintaining linear complexity. Extensive experiments demonstrate that our model outperforms state-of-the-art efficient attention mechanisms on both vision and language tasks.
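For orientation, the baseline being analyzed can be sketched as follows. This is a minimal sketch of plain random-feature attention for a single query, not the control-variate method the paper proposes. It assumes positive random features of the form exp(w . x - |x|^2 / 2) with Gaussian w, whose expected inner product equals exp(q . k); the function names and the toy vectors are hypothetical.

```python
import math
import random


def softmax_attention(q, keys, values):
    """Exact softmax attention for one query (no 1/sqrt(d) scaling)."""
    scores = [math.exp(sum(a * b for a, b in zip(q, k))) for k in keys]
    total = sum(scores)
    return [sum(s * v[j] for s, v in zip(scores, values)) / total
            for j in range(len(values[0]))]


def positive_features(x, projections):
    """phi(x)_i = exp(w_i . x - |x|^2 / 2); E[phi(q)_i * phi(k)_i] = exp(q . k)."""
    half_sq = 0.5 * sum(a * a for a in x)
    return [math.exp(sum(wj * xj for wj, xj in zip(w, x)) - half_sq)
            for w in projections]


def rfa_attention(q, keys, values, projections):
    """Approximate attention using key summaries that do not depend on the query."""
    m, dim_v = len(projections), len(values[0])
    kv_sum = [[0.0] * dim_v for _ in range(m)]  # sum over keys of phi(k) v^T
    k_sum = [0.0] * m                           # sum over keys of phi(k)
    for k, v in zip(keys, values):
        phi_k = positive_features(k, projections)
        for i in range(m):
            k_sum[i] += phi_k[i]
            for j in range(dim_v):
                kv_sum[i][j] += phi_k[i] * v[j]
    phi_q = positive_features(q, projections)
    denom = sum(phi_q[i] * k_sum[i] for i in range(m))
    return [sum(phi_q[i] * kv_sum[i][j] for i in range(m)) / denom
            for j in range(dim_v)]


rng = random.Random(0)
q = [0.2, -0.1, 0.3, 0.0]
keys = [[0.1, 0.2, -0.2, 0.3], [-0.3, 0.1, 0.2, 0.0],
        [0.0, -0.2, 0.1, 0.2], [0.3, 0.3, 0.0, -0.1]]
values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -1.0]]
projections = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(2000)]

exact = softmax_attention(q, keys, values)
approx = rfa_attention(q, keys, values, projections)
print(exact, approx)
```

Because `kv_sum` and `k_sum` are computed once and reused for every query, the cost grows linearly with sequence length; the difference between `exact` and `approx` is the approximation gap the abstract refers to.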

artificial intelligence, machine learning, natural language

2302.04542

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
North America > Dominican Republic (0.04)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Xu, Haoran, Maillard, Jean, Goswami, Vedanuj

Language-Aware Multilingual Machine Translation with Self-Supervised Learning

Multilingual machine translation (MMT) benefits from cross-lingual transfer but is a challenging multitask optimization problem. This is partly because there is no clear framework to systematically learn language-specific parameters. Self-supervised learning (SSL) approaches that leverage large quantities of monolingual data (where parallel data is unavailable) have shown promise by improving translation performance as complementary tasks to the MMT task. However, jointly optimizing SSL and MMT tasks is even more challenging. In this work, we first investigate how to utilize intra-distillation to learn more *language-specific* parameters and then show the importance of these language-specific parameters. Next, we propose a novel but simple SSL task, concurrent denoising, that co-trains with the MMT task by concurrently denoising monolingual data on both the encoder and decoder. Finally, we apply intra-distillation to this co-training approach. Combining these two approaches significantly improves MMT performance, outperforming three state-of-the-art SSL methods by a large margin, e.g., 11.3% and 3.7% improvement on an 8-language and a 15-language benchmark compared with MASS, respectively.

artificial intelligence, machine learning, natural language

2302.05008

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > Dominican Republic (0.04)
Europe > Latvia > Lubāna Municipality > Lubāna (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Toolformer: Language Models Can Teach Themselves to Use Tools

Schick, Timo, Dwivedi-Yu, Jane, Dessì, Roberto, Raileanu, Roberta, Lomeli, Maria, Zettlemoyer, Luke, Cancedda, Nicola, Scialom, Thomas

Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.

large language model, machine learning, natural language

2302.04761

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Africa > Ghana (0.04)
North America > United States > Pennsylvania > Lackawanna County > Scranton (0.04)

Genre: Research Report (0.82)

Industry:

Leisure & Entertainment (0.93)
Government > Regional Government > North America Government > United States Government (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.92)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.31)

Lawrie, Dawn, Yang, Eugene, Oard, Douglas W., Mayfield, James

Neural Approaches to Multilingual Information Retrieval

Providing access to information across languages has been a goal of Information Retrieval (IR) for decades. While progress has been made on Cross Language IR (CLIR) where queries are expressed in one language and documents in another, the multilingual (MLIR) task to create a single ranked list of documents across many languages is considerably more challenging. This paper investigates whether advances in neural document translation and pretrained multilingual neural language models enable improvements in the state of the art over earlier MLIR techniques. The results show that although combining neural document translation with neural ranking yields the best Mean Average Precision (MAP), 98% of that MAP score can be achieved with an 84% reduction in indexing time by using a pretrained XLM-R multilingual language model to index documents in their native language, and that 2% difference in effectiveness is not statistically significant. Key to achieving these results for MLIR is to fine-tune XLM-R using mixed-language batches from neural translations of MS MARCO passages.
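The Mean Average Precision figure quoted above is computed over ranked lists. The following is a minimal sketch of the standard definition, assuming binary relevance judgments; the function names, the document IDs, and the language prefixes are hypothetical and not taken from the paper.

```python
def average_precision(ranked_doc_ids, relevant_ids):
    """Average of precision@rank taken at each rank where a relevant document appears."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_doc_ids, start=1):
        if doc_id in relevant:
            hits += 1
            precision_sum += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return precision_sum / len(relevant)


def mean_average_precision(runs):
    """Mean of average precision over (ranking, relevant set) pairs, one per query."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical MLIR-style rankings mixing documents from several languages.
runs = [
    (["zh-1", "fa-7", "ru-3", "zh-9"], {"zh-1", "ru-3"}),
    (["ru-2", "fa-4"], {"fa-4"}),
]
ap_first = average_precision(*runs[0])
map_score = mean_average_precision(runs)
print(ap_first, map_score)
```

In the first query the relevant documents sit at ranks 1 and 3, so the average precision is (1/1 + 2/3) / 2; the second query has its only relevant document at rank 2, giving 1/2.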

information retrieval, machine learning, natural language

2209.01335

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland > Prince George's County > College Park (0.14)
North America > Dominican Republic (0.04)

Genre:

Research Report > Experimental Study (0.88)
Research Report > New Finding (0.88)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.72)

Engadget, Feb-8-2023, 18:48:40 GMT

Google Translate should soon offer better suggestions for words with multiple meanings

Google Translate is getting an AI-powered upgrade in the coming weeks to help you find more accurate translations, particularly for words with multiple definitions. The app will offer additional contextual translation options with descriptions and examples. Let's say you're looking for a translation of the word "row," which has multiple meanings in English. It could refer to an argument, a line of seats on a plane or using an oar to propel a boat. Google Translate should soon offer translations for all of those variants, along with examples of how they're used.

google translate, offer better suggestion, translation

Engadget

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Kano, Yasumasa, Sudoh, Katsuhito, Nakamura, Satoshi

Average Token Delay: A Latency Metric for Simultaneous Translation

arXiv.org Artificial Intelligence, Feb-8-2023

Simultaneous translation is a task in which translation begins before the speaker has finished speaking. In its evaluation, we have to consider the latency of the translation in addition to the quality. The latency is preferably as small as possible for users to comprehend what the speaker says with a small delay. Existing latency metrics focus on when the translation starts but do not consider adequately when the translation ends. This means such metrics do not penalize the latency caused by a long translation output, which actually delays users' comprehension. In this work, we propose a novel latency evaluation metric called Average Token Delay (ATD) that focuses on the end timings of partial translations in simultaneous translation. We discuss the advantage of ATD using simulated examples and also investigate the differences between ATD and Average Lagging with simultaneous translation experiments.
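The abstract does not give the ATD formula, so it is not reproduced here. As context for the comparison, the baseline metric it names, Average Lagging, can be sketched as commonly defined for wait-k style evaluation. This is a minimal sketch: the function name and the `delays` representation (number of source tokens read before each target token is emitted) are assumptions for illustration.

```python
def average_lagging(delays, source_length):
    """Average Lagging for one sentence pair.

    delays[t - 1] is the number of source tokens read before target token t
    is emitted. Only target tokens up to the first one produced after the
    whole source has been read are counted.
    """
    target_length = len(delays)
    gamma = target_length / source_length  # target-to-source length ratio
    # tau: index of the first target token emitted with the full source read.
    tau = next(
        (t for t, g in enumerate(delays, start=1) if g >= source_length),
        target_length,
    )
    return sum(delays[t - 1] - (t - 1) / gamma for t in range(1, tau + 1)) / tau


# Hypothetical wait-3 schedule on a 5-token source and 5-token target.
wait3 = average_lagging([3, 4, 5, 5, 5], source_length=5)
# Offline translation: every target token waits for the whole source.
offline = average_lagging([5, 5, 5, 5, 5], source_length=5)
print(wait3, offline)
```

Because the sum stops at `tau`, target tokens emitted after the source has been fully read do not affect the score, which is the behavior the abstract criticizes: a long translation output is not penalized.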

artificial intelligence, natural language, translation

2211.13173

Country:

North America > United States > Pennsylvania (0.04)
North America > Dominican Republic (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Speech (0.94)