AITopics

2312.0282

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > China > Guangdong Province > Shenzhen (0.05)
Europe > Ireland > Leinster > County Dublin > Dublin (0.05)
(10 more...)

Genre:

Research Report > Promising Solution (0.48)
Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

arXiv.org Artificial IntelligenceDec-5-2023

AV2AV: Direct Audio-Visual Speech to Audio-Visual Speech Translation with Unified Audio-Visual Speech Representation

Choi, Jeongsoo, Park, Se Jin, Kim, Minsu, Ro, Yong Man

This paper proposes a novel direct Audio-Visual Speech to Audio-Visual Speech Translation (AV2AV) framework, where the input and output of the system are multimodal (i.e., audio and visual speech). With the proposed AV2AV, two key advantages can be brought: 1) We can perform real-like conversations with individuals worldwide in a virtual meeting by utilizing our own primary languages. In contrast to Speech-to-Speech Translation (A2A), which solely translates between audio modalities, the proposed AV2AV directly translates between audio-visual speech. This capability enhances the dialogue experience by presenting synchronized lip movements along with the translated speech. 2) We can improve the robustness of the spoken language translation system. By employing the complementary information of audio-visual speech, the system can effectively translate spoken language even in the presence of acoustic noise, showcasing robust performance. To mitigate the problem of the absence of a parallel AV2AV translation dataset, we propose to train our spoken language translation system with the audio-only dataset of A2A. This is done by learning unified audio-visual speech representations through self-supervised learning in advance to train the translation system. Moreover, we propose an AV-Renderer that can generate raw audio and video in parallel. It is designed with zero-shot speaker modeling, thus the speaker in source audio-visual speech can be maintained at the target translated audio-visual speech. The effectiveness of AV2AV is evaluated with extensive experiments in a many-to-many language translation setting. The demo page is available on https://choijeongsoo.github.io/av2av.

dataset, speech, translation, (13 more...)

2312.02512

Country:

Europe > Spain (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Taiwan > Taiwan Province > Taipei (0.04)
(2 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

EngadgetDec-4-2023, 03:50:14 GMT

Meta's latest AI suite makes speech translation more seamless and expressive

Back in August, Meta unveiled its multimodal AI translation model, SeamlessM4T, which supports nearly 100 languages for text and 36 for speech. With an updated "v2" architecture, the tech giant is now expanding on this tool to make conversational translations more spontaneous and expressive -- the latter a missing key to an authentic conversation across languages. The first of the two new features is "SeamlessExpressive" which, as you can tell by the name, ports your expressions over to your translated speech. These include your pitch, volume, emotional tone (excitement, sadness or whispers), speech rate and pauses. Considering how translated speeches had always sounded robotic until now, this breakthrough is potentially a game-changer -- both in our daily lives and also in content production.

ai suite make speech translation, meta, translation, (1 more...)

Engadget

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.40)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.40)

Havaldar, Shreya, Pressimone, Matthew, Wong, Eric, Ungar, Lyle

Comparing Styles across Languages

arXiv.org Artificial IntelligenceDec-4-2023

Understanding how styles differ across languages is advantageous for training both humans and computers to generate culturally appropriate text. We introduce an explanation framework to extract stylistic differences from multilingual LMs and compare styles across languages. Our framework (1) generates comprehensive style lexica in any language and (2) consolidates feature importances from LMs into comparable lexical categories. We apply this framework to compare politeness, creating the first holistic multilingual politeness dataset and exploring how politeness varies across four languages. Our approach enables an effective evaluation of how distinct linguistic categories contribute to stylistic variations and provides interpretable insights into how people communicate differently around the world.

category, politeness, utterance, (14 more...)

2310.07135

Country:

Asia > Japan (0.04)
North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > Pennsylvania (0.04)
(9 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications > Social Media (0.93)
(2 more...)

Stap, David, Niculae, Vlad, Monz, Christof

Viewing Knowledge Transfer in Multilingual Machine Translation Through a Representational Lens

arXiv.org Artificial IntelligenceDec-4-2023

We argue that translation quality alone is not a sufficient metric for measuring knowledge transfer in multilingual neural machine translation. To support this claim, we introduce Representational Transfer Potential (RTP), which measures representational similarities between languages. We show that RTP can measure both positive and negative transfer (interference), and find that RTP is strongly correlated with changes in translation quality, indicating that transfer does occur. Furthermore, we investigate data and language characteristics that are relevant for transfer, and find that multi-parallel overlap is an important yet under-explored feature. Based on this, we develop a novel training scheme, which uses an auxiliary similarity loss that encourages representations to be more invariant across languages by taking advantage of multi-parallel data. We show that our method yields increased translation quality for low- and mid-resource languages across multiple data and model setups.

computational linguistic, similarity, translation quality, (14 more...)

2305.1155

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > Washington > King County > Seattle (0.04)
(13 more...)

Genre: Research Report > New Finding (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

arXiv.org Artificial IntelligenceDec-4-2023

VideoDubber: Machine Translation with Speech-Aware Length Control for Video Dubbing

Wu, Yihan, Guo, Junliang, Tan, Xu, Zhang, Chen, Li, Bohan, Song, Ruihua, He, Lei, Zhao, Sheng, Menezes, Arul, Bian, Jiang

Video dubbing aims to translate the original speech in a film or television program into the speech in a target language, which can be achieved with a cascaded system consisting of speech recognition, machine translation and speech synthesis. To ensure the translated speech to be well aligned with the corresponding video, the length/duration of the translated speech should be as close as possible to that of the original speech, which requires strict length control. Previous works usually control the number of words or characters generated by the machine translation model to be similar to the source sentence, without considering the isochronicity of speech as the speech duration of words/characters in different languages varies. In this paper, we propose a machine translation system tailored for the task of video dubbing, which directly considers the speech duration of each token in translation, to match the length of source and target speech. Specifically, we control the speech length of generated sentence by guiding the prediction of each word with the duration information, including the speech duration of itself as well as how much duration is left for the remaining words. We design experiments on four language directions (German -> English, Spanish -> English, Chinese <-> English), and the results show that the proposed method achieves better length control ability on the generated speech than baseline methods. To make up the lack of real-world datasets, we also construct a real-world test set collected from films to provide comprehensive evaluations on the video dubbing task.

length control, speech, video, (15 more...)

2211.16934

Country:

North America > Canada > Quebec > Montreal (0.04)
Asia > South Korea > Incheon > Incheon (0.04)
Asia > China (0.04)
(3 more...)

Genre: Research Report > New Finding (0.34)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Briakou, Eleftheria, Goyal, Navita, Carpuat, Marine

Explaining with Contrastive Phrasal Highlighting: A Case Study in Assisting Humans to Detect Translation Differences

Explainable NLP techniques primarily explain by answering "Which tokens in the input are responsible for this prediction?''. We argue that for NLP models that make predictions by comparing two input texts, it is more useful to explain by answering "What differences between the two inputs explain this prediction?''. We introduce a technique to generate contrastive highlights that explain the predictions of a semantic divergence model via phrase-alignment-guided erasure. We show that the resulting highlights match human rationales of cross-lingual semantic differences better than popular post-hoc saliency techniques and that they successfully help people detect fine-grained meaning differences in human translations and critical machine translation errors.

computational linguistic, explanation, proceedings, (13 more...)

2312.01582

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
North America > United States > Maryland (0.04)
Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.04)
(17 more...)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Kanumolu, Gopichand, Madasu, Lokesh, Baswani, Pavan, Mukherjee, Ananya, Shrivastava, Manish

Unsupervised Approach to Evaluate Sentence-Level Fluency: Do We Really Need Reference?

Fluency is a crucial goal of all Natural Language Generation (NLG) systems. Widely used automatic evaluation metrics fall short in capturing the fluency of machine-generated text. Assessing the fluency of NLG systems poses a challenge since these models are not limited to simply reusing words from the input but may also generate abstractions. Existing reference-based fluency evaluations, such as word overlap measures, often exhibit weak correlations with human judgments. This paper adapts an existing unsupervised technique for measuring text fluency without the need for any reference. Our approach leverages various word embeddings and trains language models using Recurrent Neural Network (RNN) architectures. We also experiment with other available multilingual Language Models (LMs). To assess the performance of the models, we conduct a comparative analysis across 10 Indic languages, correlating the obtained fluency scores with human judgments. Our code and human-annotated benchmark test-set for fluency is available at https://github.com/AnanyaCoder/TextFluencyForIndicLanaguges.

evaluation, fluency, fluency score, (14 more...)

2312.015

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > India > Maharashtra > Mumbai (0.04)
North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Generation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.76)

Ajlouni, AlMotasem Bellah Al, Li, Jinlong

CEScore: Simple and Efficient Confidence Estimation Model for Evaluating Split and Rephrase

The split and rephrase (SR) task aims to divide a long, complex sentence into a set of shorter, simpler sentences that convey the same meaning. This challenging problem in NLP has gained increased attention recently because of its benefits as a pre-processing step in other NLP tasks. Evaluating quality of SR is challenging, as there no automatic metric fit to evaluate this task. In this work, we introduce CEScore, as novel statistical model to automatically evaluate SR task. By mimicking the way humans evaluate SR, CEScore provides 4 metrics (Sscore, Gscore, Mscore, and CEscore) to assess simplicity, grammaticality, meaning preservation, and overall quality, respectively. In experiments with 26 models, CEScore correlates strongly with human evaluations, achieving 0.98 in Spearman correlations at model-level. This underscores the potential of CEScore as a simple and effective metric for assessing the overall quality of SR models.

evaluation, simplicity, simplification, (16 more...)

2312.01356

Country:

Asia > China > Anhui Province > Hefei (0.04)
Europe > Belgium (0.04)
Asia > India > Karnataka > Bengaluru (0.04)
Africa > Kenya > Mandera County > Mandera (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.94)

Han, HyoJung, Boyd-Graber, Jordan Lee, Carpuat, Marine

Bridging Background Knowledge Gaps in Translation with Automatic Explicitation

Translations help people understand content written in another language. However, even correct literal translations do not fulfill that goal when people lack the necessary background to understand them. Professional translators incorporate explicitations to explain the missing context by considering cultural differences between source and target audiences. Despite its potential to help users, NLP research on explicitation is limited because of the dearth of adequate evaluation methods. This work introduces techniques for automatically generating explicitations, motivated by WikiExpl: a dataset that we collect from Wikipedia and annotate with human translators. The resulting explicitations are useful as they help answer questions more accurately in a multilingual question answering framework.

computational linguistic, explicitation, translation, (14 more...)

2312.01308

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Europe > France (0.05)
Asia > Middle East > Jordan (0.04)
(26 more...)

Genre: Research Report (0.50)

Industry:

Media (0.67)
Government (0.46)

Technology: Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.93)