AITopics | seamless

seamless

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

High-Fidelity Simultaneous Speech-To-Speech Translation

Labiausse, Tom, Mazaré, Laurent, Grave, Edouard, Pérez, Patrick, Défossez, Alexandre, Zeghidour, Neil

arXiv.org Artificial Intelligence | Feb-5-2025

We introduce Hibiki, a decoder-only model for simultaneous speech translation. Hibiki leverages a multistream language model to synchronously process source and target speech, and jointly produces text and audio tokens to perform speech-to-text and speech-to-speech translation. We furthermore address the fundamental challenge of simultaneous interpretation, which unlike its consecutive counterpart, where one waits for the end of the source utterance to start translating, adapts its flow to accumulate just enough context to produce a correct translation in real-time, chunk by chunk. To do so, we introduce a weakly-supervised method that leverages the perplexity of an off-the-shelf text translation system to identify optimal delays on a per-word basis and create aligned synthetic data. After supervised training, Hibiki performs adaptive, simultaneous speech translation with vanilla temperature sampling. On a French-English simultaneous speech translation task, Hibiki demonstrates state-of-the-art performance in translation quality, speaker fidelity and naturalness. Moreover, the simplicity of its inference process makes it compatible with batched translation and even real-time on-device deployment. We provide examples as well as models and inference code.

artificial intelligence, natural language, translation, (18 more...)

arXiv.org Artificial Intelligence

2502.03382

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > Hawaii > Honolulu County > Honolulu (0.04)
North America > Canada > Ontario > Toronto (0.04)
(11 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)

Meta's new AI model can translate speech from more than 100 languages

MIT Technology Review | Jan-15-2025, 16:00:00 GMT

"Meta has done a great job having a breadth of different things they support, like text-to-speech, speech-to-text, even automatic speech recognition," says Chetan Jaiswal, a professor of computer science at Quinnipiac University, who was not involved in the research. "The mere number of languages they are supporting is a tremendous achievement." Human translators are still a vital part of the translation process, the researchers say in the paper, because they can grapple with diverse cultural contexts and make sure the same meaning is conveyed from one language into another. This step is important, says Lynne Bowker, Canada Research Chair in Translation, Technologies and Society at Université Laval in Quebec, who didn't work on Seamless. "Languages are a reflection of cultures, and cultures have their own ways of knowing things," she says.

new ai model, seamless, translate speech, (4 more...)

MIT Technology Review

Country:

North America > Canada > Quebec (0.26)
North America > United States > Virginia (0.06)
North America > United States > Texas (0.06)

Genre: Research Report > New Finding (0.58)

Industry: Health & Medicine > Therapeutic Area > Immunology (0.36)

Technology: Information Technology > Artificial Intelligence > Speech > Speech Recognition (0.94)

Benchmarking GPT-4 against Human Translators: A Comprehensive Evaluation Across Languages, Domains, and Expertise Levels

Yan, Jianhao, Yan, Pingchuan, Chen, Yulong, Li, Jing, Zhu, Xianchao, Zhang, Yue

arXiv.org Artificial Intelligence | Nov-20-2024

This study presents a comprehensive evaluation of GPT-4's translation capabilities compared to human translators of varying expertise levels. Through systematic human evaluation using the MQM schema, we assess translations across three language pairs (Chinese↔English, Russian↔English, and Chinese↔Hindi) and three domains (News, Technology, and Biomedical). Our findings reveal that GPT-4 achieves performance comparable to junior-level translators in terms of total errors, while still lagging behind senior translators. Unlike traditional Neural Machine Translation systems, which show significant performance degradation in resource-poor language directions, GPT-4 maintains consistent translation quality across all evaluated language pairs. Through qualitative analysis, we identify distinctive patterns in translation approaches: GPT-4 tends toward overly literal translations and exhibits lexical inconsistency, while human translators sometimes over-interpret context and introduce hallucinations. This study represents the first systematic comparison between LLM and human translators across different proficiency levels, providing valuable insights into the current capabilities and limitations of LLM-based translation systems.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2411.13775

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Asia > China (0.04)
Oceania > Australia > Victoria > Melbourne (0.04)
(9 more...)

Genre: Research Report > New Finding (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Characterizing and Efficiently Accelerating Multimodal Generation Model Inference

Lee, Yejin, Sun, Anna, Hosmer, Basil, Acun, Bilge, Balioglu, Can, Wang, Changhan, Hernandez, Charles David, Puhrsch, Christian, Haziza, Daniel, Guessous, Driss, Massa, Francisco, Kahn, Jacob, Wan, Jeffrey, Reizenstein, Jeremy, Zhai, Jiaqi, Isaacson, Joe, Schlosser, Joel, Pino, Juan, Sadagopan, Kaushik Ram, Shamis, Leonid, Ma, Linjian, Hwang, Min-Jae, Chen, Mingda, Elhoushi, Mostafa, Rodriguez, Pedro, Pasunuru, Ram, Yih, Scott, Popuri, Sravya, Liu, Xing, Wu, Carole-Jean

arXiv.org Artificial Intelligence | Sep-30-2024

Generative artificial intelligence (AI) technology is revolutionizing the computing industry. Not only have its applications broadened to various sectors, but it also poses new system design and optimization opportunities. The technology is capable of understanding and responding in multiple modalities. However, the advanced capability currently comes with significant system resource demands. To sustainably scale generative AI capabilities to billions of users in the world, inference must be fast and efficient. This paper pinpoints key system design and optimization opportunities by characterizing a family of emerging multi-modal generation models on real systems. Auto-regressive token generation is a critical latency performance bottleneck, typically dominated by GPU idle time. In addition to memory-intensive attention across the generative AI models, linear operations constitute significant inference latency due to the feed forward networks in Transformer-based models. We demonstrate that state-of-the-art optimization levers, spanning from applications to system software and hardware, set a 3.88x better baseline.

optimization, seamless, sequence length, (16 more...)

arXiv.org Artificial Intelligence

2410.00215

Country: North America > United States (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (1.00)

CoSTA: Code-Switched Speech Translation using Aligned Speech-Text Interleaving

Shankar, Bhavani, Jyothi, Preethi, Bhattacharyya, Pushpak

arXiv.org Artificial Intelligence | Jun-16-2024

Code-switching is a widely prevalent linguistic phenomenon in multilingual societies like India. Building speech-to-text models for code-switched speech is challenging due to limited availability of datasets. In this work, we focus on the problem of spoken translation (ST) of code-switched speech in Indian languages to English text. We present a new end-to-end model architecture COSTA that scaffolds on pretrained automatic speech recognition (ASR) and machine translation (MT) modules (that are more widely available for many languages). Speech and ASR text representations are fused using an aligned interleaving scheme and are fed further as input to a pretrained MT module; the whole pipeline is then trained end-to-end for spoken translation using synthetically created ST data. We also release a new evaluation benchmark for code-switched Bengali-English, Hindi-English, Marathi-English and Telugu-English speech to English text. COSTA significantly outperforms many competitive cascaded and end-to-end multimodal baselines by up to 3.5 BLEU points.

computational linguistic, evaluation, translation, (13 more...)

arXiv.org Artificial Intelligence

2406.10993

Country:

North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
Asia > Philippines > Luzon > National Capital Region > City of Manila (0.14)
North America > Canada > Ontario > Toronto (0.04)
(17 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Uber taking on Seamless with plan to launch UberEats service in 22 new countries

Daily Mail - Science & tech | Sep-27-2016, 18:32:53 GMT

Uber is making an aggressive drive into meal delivery, backed by a wave of staff recruitment, with the U.S. tech heavyweight gearing up to enter at least 22 new countries and take on local rivals. In a measure of rising ambition beyond its taxi business, Uber will begin delivering meals in Amsterdam on Thursday just as Dutch market leader Takeaway.com … And according to current job listings on Uber and other recruiting sites - for about 150 roles ranging from general managers and sales staff to bike couriers - UberEats is planning to enter at least 22 new countries across the world in the near future. That is on top of the six countries where it already operates.

artificial intelligence, uber, ubereat, (17 more...)

Daily Mail - Science & tech

Country:

Europe > Netherlands > North Holland > Amsterdam (0.25)
North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.05)
Europe > United Kingdom (0.05)
(11 more...)

Industry:

Transportation > Passenger (1.00)
Transportation > Ground > Road (1.00)
Information Technology > Services (1.00)

Technology: Information Technology > Artificial Intelligence (0.31)

Domino's DRU pizza delivery robot by the numbers | ZDNet

#artificialintelligence | Apr-28-2016, 02:46:07 GMT

Last month we heard about DRU, the Domino's delivery robot that's getting a trial run in Australia. The idea may seem silly, but some new restaurant industry numbers highlight the growing importance of food delivery in an age when consumers expect online ordering and rapid to-their-door service. As consumers get more comfortable with autonomous delivery (which is on the way, despite lots of skepticism), a restaurant industry that already uses state of the art logistics services could begin adding delivery robots to their operations in the next decade. According to information provided by 1010data, our hunger for the pies is growing. Domino's, Pizza Hut, and Papa John's combined to account for 45.1% of total food delivery sales, up from 40.3% in 2014.

artificial intelligence, domino, robot, (8 more...)

#artificialintelligence

Country:

Oceania > Australia (0.27)
North America > United States > California (0.07)

Industry: Consumer Products & Services > Restaurants (1.00)

Technology: Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (0.85)
