The Cone of Silence: Speech Separation by Localization

Teerapat Jenrungrot, Vivek Jayaram, Steve Seitz, Ira Kemelmacher-Shlizerman

Oct-12-2020–arXiv.org Artificial Intelligence

Given a multi-microphone recording of an unknown number of speakers talking concurrently, we simultaneously localize the sources and separate the individual speakers. At the core of our method is a deep network, in the waveform domain, which isolates sources within an angular region $\theta \pm w/2$, given an angle of interest $\theta$ and angular window size $w$. By exponentially decreasing $w$, we can perform a binary search to localize and separate all sources in logarithmic time. Our algorithm allows for an arbitrary number of potentially moving speakers at test time, including more speakers than seen during training. Experiments demonstrate state-of-the-art performance for both source separation and source localization, particularly in high levels of background noise.
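The binary search over angular windows described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The separation network is replaced by a hypothetical toy oracle, `has_source(theta, w)`, which only reports whether any source lies inside the window $\theta \pm w/2$. In the actual method a deep network would output the isolated waveform for that region, and some decision rule on that output (not specified in the abstract) would play the oracle's role. All function names, the half-open window convention, and the stopping width `min_w` are assumptions made for this sketch.

```python
from typing import Callable, List


def angular_diff(a: float, b: float) -> float:
    """Signed difference a - b in degrees, wrapped into [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def make_toy_oracle(source_angles: List[float]) -> Callable[[float, float], bool]:
    """Hypothetical stand-in for the separation network.

    Returns a function that reports whether any source lies in the
    half-open window [theta - w/2, theta + w/2). The half-open convention
    is an assumption that keeps a boundary source out of two windows at once.
    """
    def has_source(theta: float, w: float) -> bool:
        half = w / 2.0
        return any(-half <= angular_diff(s, theta) < half for s in source_angles)

    return has_source


def localize(has_source: Callable[[float, float], bool],
             theta: float, w: float, min_w: float) -> List[float]:
    """Recursively halve the window, keeping only regions that contain a source."""
    if not has_source(theta, w):
        return []                      # prune: nothing in this region
    if w <= min_w:
        return [theta % 360.0]         # window is narrow enough: report its center
    half, quarter = w / 2.0, w / 4.0
    # Split the window into two adjacent halves and search each one.
    return (localize(has_source, theta - quarter, half, min_w)
            + localize(has_source, theta + quarter, half, min_w))


# Usage: two toy sources, searching the full circle down to windows of at most 2 degrees.
oracle = make_toy_oracle([30.0, 200.0])
estimates = localize(oracle, 0.0, 360.0, 2.0)
print(estimates)
```

Because empty regions are pruned at every level, the number of oracle queries in this sketch grows with the number of sources times the logarithm of `360 / min_w`, rather than with the number of candidate directions an exhaustive scan would test.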

artificial intelligence, machine learning, spatial reasoning

Country:
- North America
  - United States > Rhode Island
    - Providence County > Providence (0.04)
  - Canada > British Columbia
    - Metro Vancouver Regional District > Vancouver (0.04)

Genre:
- Research Report > New Finding (0.68)

Industry:
- Information Technology (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Speech (0.71)
  - Representation & Reasoning > Spatial Reasoning (0.68)
  - Machine Learning > Neural Networks
    - Deep Learning (1.00)
