AITopics

2208.1146

Country:

Europe > France > Grand Est > Meurthe-et-Moselle > Nancy (0.05)
Europe > Austria > Upper Austria (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

#artificialintelligence Oct-28-2022, 17:10:12 GMT

Meta AI powers spoken-only language translation

After plans to break physical barriers with his metaverse initiative, Meta CEO Mark Zuckerberg revealed plans for another globe-spanning artificial intelligence (AI) project earlier this year: a universal translation tool unlike any other. At the same time, the company that made itself famous (and notorious) for its social media networks also introduced another AI-powered tool, a virtual assistant. Both applications were intended for practical use in Zuckerberg's metaverse, but they will also have wider business applications, as Meta is well aware. AI virtual assistants, of course, are already widely used by organizations as chatbots to handle basic customer requests and interactions across a variety of digital services, including Meta's own popular platforms such as Facebook Messenger, Instagram, and WhatsApp Business. The other, less well-known AI use case is language translation, which offers an alternative to relying on human translators for accurate, expert-quality translations in real time.

ai power spoken-only language translation, meta, translation, (12 more...)

#artificialintelligence

Country:

North America > United States > California (0.05)
Europe > France (0.05)
Asia > Taiwan (0.05)

Genre: Personal > Honors (0.32)

Industry:

Information Technology > Services (0.59)
Government > Regional Government > North America Government > United States Government (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.78)

arXiv.org Artificial Intelligence Oct-28-2022

DiMBERT: Learning Vision-Language Grounded Representations with Disentangled Multimodal-Attention

Liu, Fenglin, Wu, Xian, Ge, Shen, Ren, Xuancheng, Fan, Wei, Sun, Xu, Zou, Yuexian

Vision-and-language (V-L) tasks require the system to understand both vision content and natural language, thus learning fine-grained joint representations of vision and language (a.k.a. V-L representations) is of paramount importance. Recently, various pre-trained V-L models have been proposed to learn V-L representations and achieve improved results in many tasks. However, the mainstream models process both vision and language inputs with the same set of attention matrices. As a result, the generated V-L representations are entangled in one common latent space. To tackle this problem, we propose DiMBERT (short for Disentangled Multimodal-Attention BERT), a novel framework that applies separate attention spaces for vision and language, so that the representations of the modalities can be disentangled explicitly. To enhance the correlation between vision and language in disentangled spaces, we introduce visual concepts to DiMBERT, which represent visual information in textual format. In this manner, visual concepts help to bridge the gap between the two modalities. We pre-train DiMBERT on a large number of image-sentence pairs on two tasks: bidirectional language modeling and sequence-to-sequence language modeling. After pre-training, DiMBERT is further fine-tuned for the downstream tasks. Experiments show that DiMBERT sets new state-of-the-art performance on three tasks (over four datasets), including both generation tasks (image captioning and visual storytelling) and classification tasks (referring expressions). The proposed DiM (short for Disentangled Multimodal-Attention) module can be easily incorporated into existing pre-trained V-L models to boost their performance, up to a 5% increase on the representative task. Finally, we conduct a systematic analysis and demonstrate the effectiveness of our DiM and the introduced visual concepts.

artificial intelligence, machine learning, natural language, (14 more...)

2210.16431

Country:

Asia > China > Heilongjiang Province > Daqing (0.04)
Asia > China > Guangdong Province > Shenzhen (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Artificial Intelligence Oct-28-2022

CONQRR: Conversational Query Rewriting for Retrieval with Reinforcement Learning

Wu, Zeqiu, Luan, Yi, Rashkin, Hannah, Reitter, David, Hajishirzi, Hannaneh, Ostendorf, Mari, Tomar, Gaurav Singh

Compared to standard retrieval tasks, passage retrieval for conversational question answering (CQA) poses new challenges in understanding the current user question, as each question needs to be interpreted within the dialogue context. Moreover, it can be expensive to re-train well-established retrievers such as search engines that are originally developed for non-conversational queries. To facilitate their use, we develop a query rewriting model CONQRR that rewrites a conversational question in the context into a standalone question. It is trained with a novel reward function to directly optimize towards retrieval using reinforcement learning and can be adapted to any off-the-shelf retriever. CONQRR achieves state-of-the-art results on a recent open-domain CQA dataset containing conversations from three different sources, and is effective for two different off-the-shelf retrievers. Our extensive analysis also shows the robustness of CONQRR to out-of-domain dialogues as well as to zero query rewriting supervision.

information retrieval, machine learning, reinforcement learning, (21 more...)

2112.08558

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Tennessee > Davidson County > Nashville (0.04)
North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
(8 more...)

Genre: Research Report (0.82)

Industry:

Media (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Information Management > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.88)
(2 more...)

arXiv.org Artificial Intelligence Oct-28-2022

Twist Decoding: Diverse Generators Guide Each Other

Kasai, Jungo, Sakaguchi, Keisuke, Bras, Ronan Le, Peng, Hao, Lu, Ximing, Radev, Dragomir, Choi, Yejin, Smith, Noah A.

Many language generation models are now available for a wide range of generation tasks, including machine translation and summarization. Combining such diverse models may lead to further progress, but ensembling generation models is challenging during inference: conventional ensembling methods (e.g., shallow fusion) require that the models share vocabulary/tokenization schemes. We introduce Twist decoding, a simple and general text generation algorithm that benefits from diverse models at inference time. Our method does not assume the vocabulary, tokenization or even generation order is shared. Our extensive evaluations on machine translation and scientific paper summarization demonstrate that Twist decoding substantially outperforms each model decoded in isolation over various scenarios, including cases where domain-specific and general-purpose models are both available. Twist decoding also consistently outperforms the popular reranking heuristic where output candidates from one model are rescored by another. We hope that our work will encourage researchers and practitioners to examine generation models collectively, not just independently, and to seek out models with complementary strengths to the currently available models. Our code is available at https://github.com/jungokasai/twist_decoding.
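For orientation, the reranking heuristic that the abstract names as a baseline can be sketched as follows. This is not Twist decoding itself and is not taken from the authors' repository: the function name `rerank`, the `weight` parameter, and the linear interpolation of the two scores are illustrative assumptions.

```python
from typing import Callable, List, Tuple


def rerank(
    candidates: List[Tuple[str, float]],
    rescorer: Callable[[str], float],
    weight: float = 0.5,
) -> str:
    """Pick the candidate with the best interpolated score.

    candidates: (text, generator_score) pairs from the first model.
    rescorer:   hypothetical callable giving the second model's score.
    weight:     assumed interpolation weight on the rescorer's score.
    """
    if not candidates:
        raise ValueError("candidates must not be empty")

    def combined(item: Tuple[str, float]) -> float:
        text, generator_score = item
        return (1.0 - weight) * generator_score + weight * rescorer(text)

    return max(candidates, key=combined)[0]
```

Because the second model only scores complete output strings, a scheme like this works even when the two models use different vocabularies or tokenizers, but the second model cannot influence which candidates are generated in the first place.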

machine learning, natural language, translation, (19 more...)

2205.09273

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Tōhoku (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Europe > Spain (0.04)

Genre:

Research Report > Strength High (0.68)
Research Report > Experimental Study (0.68)

Industry: Health & Medicine (0.96)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)

Belay, Tadesse Destaw, Tonja, Atnafu Lambebo, Kolesnikova, Olga, Yimam, Seid Muhie, Ayele, Abinew Ali, Haile, Silesh Bogale, Sidorov, Grigori, Gelbukh, Alexander

The Effect of Normalization for Bi-directional Amharic-English Neural Machine Translation

Machine translation (MT) is one of the main tasks in natural language processing whose objective is to translate texts automatically from one natural language to another. Nowadays, using deep neural networks for MT tasks has received great attention. These networks require lots of data to learn abstract representations of the input and store it in continuous vectors. This paper presents the first relatively large-scale Amharic-English parallel sentence dataset. Using these compiled data, we build bi-directional Amharic-English translation models by fine-tuning the existing Facebook M2M100 pre-trained model, achieving a BLEU score of 37.79 in Amharic-English and 32.74 in English-Amharic translation. Additionally, we explore the effects of Amharic homophone normalization on the machine translation task. The results show that the normalization of Amharic homophone characters increases the performance of Amharic-English machine translation in both directions.

machine learning, natural language, translation, (18 more...)

2210.15224

Country:

North America > Mexico > Mexico City > Mexico City (0.05)
Africa > Ethiopia > Amhara Region > Bahir Dar (0.05)
North America > United States > Washington > King County > Seattle (0.04)
(8 more...)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Morishita, Makoto, Suzuki, Jun, Nagata, Masaaki

Domain Adaptation of Machine Translation with Crowdworkers

Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the target domain's data are limited. However, there is great demand for high-quality domain-specific machine translation models for many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data over a few days at a reasonable cost. We tested it with five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared to a general-purpose translation model.

artificial intelligence, natural language, parallel sentence, (14 more...)

2210.15861

Country:

North America > United States (0.14)
Asia > Japan > Honshū > Tōhoku (0.04)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
Asia > India (0.04)

Genre: Research Report (0.50)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Chimoto, Everlyn Asiko, Bassett, Bruce A.

COMET-QE and Active Learning for Low-Resource Machine Translation

Active learning aims to deliver maximum benefit when resources are scarce. We use COMET-QE, a reference-free evaluation metric, to select sentences for low-resource neural machine translation. Using Swahili, Kinyarwanda and Spanish for our experiments, we show that COMET-QE significantly outperforms two variants of Round Trip Translation Likelihood (RTTL) and random sentence selection by up to 5 BLEU points for 20k sentences selected by Active Learning on a 30k baseline. This suggests that COMET-QE is a powerful tool for sentence selection in the very low-resource limit.
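The selection step described above can be sketched as scoring each unlabeled source sentence and sending a fixed budget of them for human translation. The abstract does not say which end of the score range is selected; the sketch assumes the lowest-scoring sentences are the most informative. `translate` and `qe_score` are hypothetical stand-ins for the current MT model and a reference-free metric such as COMET-QE, not real library calls.

```python
from typing import Callable, List


def select_for_annotation(
    pool: List[str],
    translate: Callable[[str], str],
    qe_score: Callable[[str, str], float],
    budget: int,
) -> List[str]:
    """Return up to `budget` source sentences with the lowest QE scores.

    Assumption: a low reference-free quality score marks a sentence the
    current model translates poorly, so labeling it should help most.
    """
    scored = [(qe_score(src, translate(src)), src) for src in pool]
    scored.sort(key=lambda pair: pair[0])  # lowest estimated quality first
    return [src for _, src in scored[:budget]]
```

In an active learning loop, the selected sentences would be translated by humans, added to the training set, and the model retrained before the next round of selection.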

machine learning, natural language, translation, (16 more...)

2210.15696

Country:

Asia > China > Hong Kong (0.05)
Africa > South Africa > Western Cape > Cape Town (0.05)
North America > United States > Pennsylvania (0.04)
(8 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Residual Adapters for Few-Shot Text-to-Speech Speaker Adaptation

Morioka, Nobuyuki, Zen, Heiga, Chen, Nanxin, Zhang, Yu, Ding, Yifan

Adapting a neural text-to-speech (TTS) model to a target speaker typically involves fine-tuning most if not all of the parameters of a pretrained multi-speaker backbone model. However, serving hundreds of fine-tuned neural TTS models is expensive as each of them requires significant footprint and separate computational resources (e.g., accelerators, memory). To scale speaker adapted neural TTS voices to hundreds of speakers while preserving the naturalness and speaker similarity, this paper proposes a parameter-efficient few-shot speaker adaptation, where the backbone model is augmented with trainable lightweight modules called residual adapters. This architecture allows the backbone model to be shared across different target speakers. Experimental results show that the proposed approach can achieve competitive naturalness and speaker similarity compared to the full fine-tuning approaches, while requiring only about 0.1% of the backbone model parameters for each speaker.

artificial intelligence, machine learning, natural language, (20 more...)

2210.15868

Country:

North America > United States (0.04)
Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
Asia > Japan > Honshū > Tōhoku > Iwate Prefecture > Morioka (0.04)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.73)
Information Technology > Artificial Intelligence > Vision > Optical Character Recognition (0.63)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.47)
(3 more...)

Smart Speech Segmentation using Acousto-Linguistic Features with look-ahead

Behre, Piyush, Parihar, Naveen, Tan, Sharman, Shah, Amy, Sharma, Eva, Liu, Geoffrey, Chang, Shuangyu, Khalil, Hosam, Basoglu, Chris, Pathak, Sayan

Segmentation for continuous Automatic Speech Recognition (ASR) has traditionally used silence timeouts or voice activity detectors (VADs), which are both limited to acoustic features. This segmentation is often overly aggressive, given that people naturally pause to think as they speak. Consequently, segmentation happens mid-sentence, hindering both punctuation and downstream tasks like machine translation for which high-quality segmentation is critical. Model-based segmentation methods that leverage acoustic features are powerful, but without an understanding of the language itself, these approaches are limited. We present a hybrid approach that leverages both acoustic and language information to improve segmentation. Furthermore, we show that including one word as a look-ahead boosts segmentation quality. On average, our models improve segmentation-F0.5 score by 9.8% over baseline. We show that this approach works for multiple languages. For the downstream task of machine translation, it improves the translation BLEU score by an average of 1.05 points.
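The F0.5 score reported above is an instance of the general F-beta measure, which with beta = 0.5 weights precision more heavily than recall. A minimal sketch, assuming precision and recall over predicted segment boundaries have already been computed (the function name `f_beta` is illustrative):

```python
def f_beta(precision: float, recall: float, beta: float = 0.5) -> float:
    """General F-beta score.

    beta < 1 favors precision, beta > 1 favors recall, beta = 1 is F1.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0  # avoid division by zero when both are zero
    b2 = beta * beta
    return (1.0 + b2) * precision * recall / (b2 * precision + recall)
```

Favoring precision suits segmentation because a false boundary in the middle of a sentence tends to hurt punctuation and translation more than a missed boundary does.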

artificial intelligence, lm-eos model, natural language, (16 more...)

2210.14446

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)