AITopics | Bar, Nadav

Collaborating Authors

Bar, Nadav

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Zero-Shot Mono-to-Binaural Speech Synthesis

Levkovitch, Alon, Salazar, Julian, Mariooryad, Soroosh, Skerry-Ryan, RJ, Bar, Nadav, Kleijn, Bastiaan, Nachmani, Eliya

arXiv.org Artificial IntelligenceDec-11-2024

We present ZeroBAS, a neural method to synthesize binaural audio from monaural audio recordings and positional information without training on any binaural data. To our knowledge, this is the first published zero-shot neural approach to mono-to-binaural audio synthesis. Specifically, we show that a parameter-free geometric time warping and amplitude scaling based on source location suffices to get an initial binaural synthesis that can be refined by iteratively applying a pretrained denoising vocoder. Furthermore, we find this leads to generalization across room conditions, which we measure by introducing a new dataset, TUT Mono-to-Binaural, to evaluate state-of-the-art monaural-to-binaural synthesis methods on unseen conditions. Our zero-shot method is perceptually on-par with the performance of supervised methods on the standard mono-to-binaural dataset, and even surpasses them on our out-of-distribution TUT Mono-to-Binaural dataset. Our results highlight the potential of pretrained generative audio models and zero-shot learning to unlock robust binaural audio synthesis.

large language model, machine learning, natural language, (21 more...)

arXiv.org Artificial Intelligence

2412.08356

Country:

Europe (0.93)
North America > United States (0.29)
North America > Canada (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Media (0.34)

Technology:

Information Technology > Artificial Intelligence > Speech (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Vision (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Add feedback

Efficient Data Generation for Source-grounded Information-seeking Dialogs: A Use Case for Meeting Transcripts

Golany, Lotem, Galgani, Filippo, Mamo, Maya, Parasol, Nimrod, Vandsburger, Omer, Bar, Nadav, Dagan, Ido

arXiv.org Artificial IntelligenceJun-21-2024

Automating data generation with Large Language Models (LLMs) has become increasingly popular. In this work, we investigate the feasibility and effectiveness of LLM-based data generation in the challenging setting of source-grounded information-seeking dialogs, with response attribution, over long documents. Our source texts consist of long and noisy meeting transcripts, adding to the task complexity. Since automating attribution remains difficult, we propose a semi-automatic approach: dialog queries and responses are generated with LLMs, followed by human verification and identification of attribution spans. Using this approach, we created MISeD -- Meeting Information Seeking Dialogs dataset -- a dataset of information-seeking dialogs focused on meeting transcripts. Models finetuned with MISeD demonstrate superior performance compared to off-the-shelf models, even those of larger size. Finetuning on MISeD gives comparable response generation quality to finetuning on fully manual data, while improving attribution quality and reducing time and effort.

attribution, large language model, natural language, (18 more...)

arXiv.org Artificial Intelligence

2405.01121

Country:

Europe (0.68)
North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)

Genre: Research Report > New Finding (0.67)

Technology: Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Add feedback

SimulTron: On-Device Simultaneous Speech to Speech Translation

Agranovich, Alex, Nachmani, Eliya, Rybakov, Oleg, Ding, Yifan, Jia, Ye, Bar, Nadav, Zen, Heiga, Ramanovich, Michelle Tadmor

arXiv.org Artificial IntelligenceJun-4-2024

Simultaneous speech-to-speech translation (S2ST) holds the promise of breaking down communication barriers and enabling fluid conversations across languages. However, achieving accurate, real-time translation through mobile devices remains a major challenge. We introduce SimulTron, a novel S2ST architecture designed to tackle this task. SimulTron is a lightweight direct S2ST model that uses the strengths of the Translatotron framework while incorporating key modifications for streaming operation, and an adjustable fixed delay. Our experiments show that SimulTron surpasses Translatotron 2 in offline evaluations. Furthermore, real-time evaluations reveal that SimulTron improves upon the performance achieved by Translatotron 1. Additionally, SimulTron achieves superior BLEU scores and latency compared to previous real-time S2ST method on the MuST-C dataset. Significantly, we have successfully deployed SimulTron on a Pixel 7 Pro device, show its potential for simultaneous S2ST on-device.

machine learning, natural language, translation, (17 more...)

arXiv.org Artificial Intelligence

2406.02133

Genre: Research Report (0.50)

Technology:

Information Technology > Communications > Mobile (1.00)
Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback