Unveiling Unexpected Training Data in Internet Video

Jul-27-2021, 21:00:03 GMT–Communications of the ACM

During training, the squared L2 error between the clean spectrogram and the predicted spectrogram serves as the loss function for the network. At inference time, our separation model can be applied to arbitrarily long video segments and to varying numbers of speakers. The latter is achieved either by directly training the model with multiple input visual streams (one per speaker), or simply by feeding the visual features of the desired speaker to the visual stream. For full details about the architecture and training process, see our full paper [15].
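
As a rough illustration of the loss described above, the following minimal NumPy sketch computes a squared L2 error between two spectrograms. The function name `squared_l2_loss`, the choice to sum rather than average over time-frequency bins, and the support for complex-valued inputs are assumptions made for this example; the excerpt does not specify them.

```python
import numpy as np


def squared_l2_loss(clean_spec, predicted_spec):
    """Squared L2 error between a clean and a predicted spectrogram.

    Hypothetical helper for illustration: summing over all bins (instead
    of averaging) is an assumption, not something the excerpt states.
    """
    clean = np.asarray(clean_spec)
    pred = np.asarray(predicted_spec)
    if clean.shape != pred.shape:
        raise ValueError("spectrograms must have the same shape")
    diff = pred - clean
    # np.abs covers both real-valued and complex-valued spectrograms.
    return float(np.sum(np.abs(diff) ** 2))
```

For example, two 2x2 spectrograms that differ by 2 in a single bin give a loss of 4.0. In an actual training setup the same quantity would be written in the framework's differentiable operations so that gradients can flow back to the network.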

computer vision and pattern recognition, depth map, video

Country:
- North America > United States
  - New York > New York County > New York City (0.04)
- Asia
  - Middle East > Israel (0.04)
  - Japan > Honshū
    - Chūbu > Toyama Prefecture > Toyama (0.04)

Industry:
- Leisure & Entertainment (0.93)
- Media
  - Television (0.68)
  - Film (0.68)
  - Photography (0.46)

Technology:
- Information Technology > Artificial Intelligence
  - Vision (1.00)
  - Machine Learning > Neural Networks (0.88)
