Ivry, Amir
Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
Abramovski, Igor, Vinnikov, Alon, Shaer, Shalev, Kanda, Naoyuki, Wang, Xiaofei, Ivry, Amir, Krupka, Eyal
The first Natural Office Talkers in Settings of Far-field Audio Recordings (NOTSOFAR-1) Challenge is a pivotal initiative that sets new benchmarks by offering datasets more representative of the needs of real-world business applications than those previously available. The challenge provides a unique combination of 280 recorded meetings across 30 diverse environments, capturing real-world acoustic conditions and conversational dynamics, and a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. In this paper, we provide an overview of the systems submitted to the challenge and analyze the top-performing approaches, hypothesizing the factors behind their success. Additionally, we highlight promising directions left unexplored by participants. By presenting key findings and actionable insights, this work aims to drive further innovation and progress in distant speaker diarization and automatic speech recognition (DASR) research and applications.
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Vinnikov, Alon, Ivry, Amir, Hurvitz, Aviv, Abramovski, Igor, Koubi, Sharon, Gurvich, Ilya, Pe'er, Shai, Xiao, Xiong, Elizalde, Benjamin Martinez, Kanda, Naoyuki, Wang, Xiaofei, Shaer, Shalev, Yagev, Stav, Asher, Yossi, Sivasankaran, Sunit, Gong, Yifan, Tang, Min, Wang, Huaming, Krupka, Eyal
We introduce the first Natural Office Talkers in Settings of Far-field Audio Recordings ("NOTSOFAR-1") Challenge alongside datasets and a baseline system. The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets. The first is a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics. It is recorded across 30 conference rooms, featuring 4-8 attendees and a total of 35 unique speakers. The second is a 1000-hour simulated training dataset, synthesized with enhanced authenticity for real-world generalization, incorporating 15,000 real acoustic transfer functions. The tasks focus on single-device DASR, where multi-channel devices always share the same known geometry. This is aligned with common setups in actual conference rooms, and avoids technical complexities associated with multi-device tasks. It also allows for the development of geometry-specific solutions. The NOTSOFAR-1 Challenge aims to advance research in the field of distant conversational speech recognition, providing key resources to unlock the potential of data-driven methods, which we believe are currently constrained by the absence of comprehensive high-quality training and benchmarking datasets.
Deep Learning Interviews: Hundreds of fully solved job interview questions from a wide range of key topics in AI
Kashani, Shlomo, Ivry, Amir
The second edition of Deep Learning Interviews is home to hundreds of fully solved problems from a wide range of key topics in AI. It is designed both to rehearse interview- or exam-specific topics and to provide machine learning MSc and PhD students, and those awaiting an interview, with a well-organized overview of the field. The problems it poses are tough enough to cut your teeth on and to dramatically improve your skills, but they are framed within thought-provoking questions and engaging stories. That is what makes the volume so valuable to students and job seekers: it provides them with the ability to speak confidently and quickly on any relevant topic, to answer technical questions clearly and correctly, and to fully understand the purpose and meaning of interview questions and answers. Those are powerful, indispensable advantages to have when walking into the interview room. The book's contents are a large inventory of topics relevant to DL job interviews and graduate-level exams. That places this work at the forefront of the growing trend in science to teach a core set of practical mathematical and computational skills. It is widely accepted that the training of every computer scientist must include the fundamental theorems of ML, and AI appears in the curriculum of nearly every university. This volume is designed as an excellent reference for graduates of such programs.
Multiclass Permanent Magnets Superstructure for Indoor Localization using Artificial Intelligence
Ivry, Amir, Fisher, Elad, Alimi, Roger, Mosseri, Idan, Nahir, Kanna
Smartphones have become a popular tool for indoor localization and position estimation of users. Existing solutions mainly employ Wi-Fi, RFID, and magnetic sensing techniques to track movements in crowded venues. These are highly sensitive to magnetic clutter and depend on local ambient magnetic fields, which frequently degrades their performance. Also, these techniques often require pre-known mapping surveys of the area, or the presence of active beacons, which are not always available. We embed small-volume and large-moment magnets in pre-known locations and arrange them in specific geometric constellations that create magnetic superstructure patterns of supervised magnetic signatures. These signatures constitute an unambiguous magnetic environment with respect to the moving sensor carrier. The localization algorithm learns the unique patterns of the scattered magnets during training and detects them in the incoming data stream during localization. Our contribution is twofold. First, we deploy passive permanent magnets that do not require a power supply, in contrast to active magnetic transmitters. Second, we perform localization based on smartphone motion rather than on static positioning of the magnetometer. In our previous study, we considered a single superstructure pattern. Here, we present an extended version of that algorithm for multi-superstructure localization, which covers a broader localization area of the user. Experimental results demonstrate localization accuracy of 95% with a mean localization error of less than 1 m using artificial intelligence.