seamless
High-Fidelity Simultaneous Speech-To-Speech Translation
Labiausse, Tom, Mazaré, Laurent, Grave, Edouard, Pérez, Patrick, Défossez, Alexandre, Zeghidour, Neil
We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation. We furthermore address the fundamental challenge of simultaneous interpretation, which unlike its consecutive counterpart, where one waits for the end of the source utterance to start translating, adapts its flow to accumulate just enough context to produce a correct translation in real-time, chunk by chunk. To do so, we introduce a weakly-supervised method that leverages the perplexity of an off-the-shelf text translation system to identify optimal delays on a per-word basis and create aligned synthetic data. After supervised training, Hibiki performs adaptive, simultaneous speech translation with vanilla temperature sampling. On a French-English simultaneous speech translation task, Hibiki demonstrates state-of-the-art performance in translation quality, speaker fidelity and naturalness. Moreover, the simplicity of its inference process makes it compatible with batched translation and even real-time on-device deployment. We provide examples as well as models and inference code.
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
- North America > Canada > Ontario > Toronto (0.04)
Meta's new AI model can translate speech from more than 100 languages
"Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition," says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. "The mere number of languages they are supporting is a tremendous achievement." Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker, Canada Research Chair in Translation, Technologies and Society at Université Laval in Quebec, who didn't work on Seamless. "Languages are a reflection of cultures, and cultures have their own ways of knowing things," she says.
- North America > Canada > Quebec (0.26)
- North America > United States > Virginia (0.06)
- North America > United States > Texas (0.06)
Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels
Yan, Jianhao, Yan, Pingchuan, Chen, Yulong, Li, Jing, Zhu, Xianchao, Zhang, Yue
This study presents a comprehensive evaluation of GPT-4's translation capabilities compared to human translators of varying expertise levels. Through systematic human evaluation using the MQM schema, we assess translations across three language pairs (Chinese$\longleftrightarrow$English, Russian$\longleftrightarrow$English, and Chinese$\longleftrightarrow$Hindi) and three domains (News, Technology, and Biomedical). Our findings reveal that GPT-4 achieves performance comparable to junior-level translators in terms of total errors, while still lagging behind senior translators. Unlike traditional Neural Machine Translation systems, which show significant performance degradation in resource-poor language directions, GPT-4 maintains consistent translation quality across all evaluated language pairs. Through qualitative analysis, we identify distinctive patterns in translation approaches: GPT-4 tends toward overly literal translations and exhibits lexical inconsistency, while human translators sometimes over-interpret context and introduce hallucinations. This study represents the first systematic comparison between LLM and human translators across different proficiency levels, providing valuable insights into the current capabilities and limitations of LLM-based translation systems.
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- Asia > China (0.04)
- Oceania > Australia > Victoria > Melbourne (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Lee, Yejin, Sun, Anna, Hosmer, Basil, Acun, Bilge, Balioglu, Can, Wang, Changhan, Hernandez, Charles David, Puhrsch, Christian, Haziza, Daniel, Guessous, Driss, Massa, Francisco, Kahn, Jacob, Wan, Jeffrey, Reizenstein, Jeremy, Zhai, Jiaqi, Isaacson, Joe, Schlosser, Joel, Pino, Juan, Sadagopan, Kaushik Ram, Shamis, Leonid, Ma, Linjian, Hwang, Min-Jae, Chen, Mingda, Elhoushi, Mostafa, Rodriguez, Pedro, Pasunuru, Ram, Yih, Scott, Popuri, Sravya, Liu, Xing, Wu, Carole-Jean
Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only have its applications broadened to various sectors, but it also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.
CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving
Shankar, Bhavani, Jyothi, Preethi, Bhattacharyya, Pushpak
Code-switching is a widely prevalent linguistic phenomenon in multilingual societies like India. Building speech-to-text models for code-switched speech is challenging due to the limited availability of datasets. In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules (that are more widely available for many languages). Speech and ASR text representations are fused using an aligned interleaving scheme and are fed further as input to a pretrained MT module; the whole pipeline is then trained end-to-end for spoken translation using synthetically created ST data. We also release a new evaluation benchmark for code-switched Bengali-English, Hindi-English, Marathi-English and Telugu-English speech to English text. COSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
- North America > Canada > Ontario > Toronto (0.04)
Uber taking on Seamless with plan to launch UberEats service in 22 new countries
Uber is making an aggressive drive into meal delivery, backed by a wave of staff recruitment, with the U.S. tech heavyweight gearing up to enter at least 22 new countries and take on local rivals. In a measure of rising ambition beyond its taxi business, Uber will begin delivering meals in Amsterdam on Thursday just as Dutch market leader Takeaway.com [...] And according to current job listings on Uber and other recruiting sites - for about 150 roles ranging from general managers and sales staff to bike couriers - UberEats is planning to enter at least 22 new countries across the world in the near future. That is on top of the six countries where it already operates.
- Europe > Netherlands > North Holland > Amsterdam (0.25)
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
- Europe > United Kingdom (0.05)
- Transportation > Passenger (1.00)
- Transportation > Ground > Road (1.00)
- Information Technology > Services (1.00)
Domino's DRU pizza delivery robot by the numbers | ZDNet
Last month we heard about DRU, the Domino's delivery robot that's getting a trial run in Australia. The idea may seem silly, but some new restaurant industry numbers highlight the growing importance of food delivery in an age when consumers expect online ordering and rapid to-their-door service. As consumers get more comfortable with autonomous delivery (which is on the way, despite lots of skepticism), a restaurant industry that already uses state-of-the-art logistics services could begin adding delivery robots to its operations in the next decade. According to information provided by 1010data, our hunger for the pies is growing. Domino's, Pizza Hut, and Papa John's combined to account for 45.1% of total food delivery sales, up from 40.3% in 2014.
- Oceania > Australia (0.27)
- North America > United States > California (0.07)