Multimodal active speaker detection and virtual cinematography for video conferencing

Ross Cutler, Ramin Mehran, Sam Johnson, Cha Zhang, Adam Kirk, Oliver Whyte, Adarsh Kowdle

Feb-12-2020–arXiv.org Machine Learning

Active speaker detection (ASD) and virtual cinematography (VC) can significantly improve the remote user experience of a video conference by automatically panning, tilting, and zooming a video conferencing camera: users subjectively rate an expert video cinematographer's video significantly higher than unedited video. We describe a new automated ASD and VC system that performs within 0.3 MOS of an expert cinematographer, based on subjective ratings on a 1-5 scale. The system uses a 4K wide-FOV camera, a depth camera, and a microphone array; it extracts features from each modality and trains an ASD using an AdaBoost machine learning system that is very efficient and runs in real time. A VC is similarly trained using machine learning to optimize the subjective quality of the overall experience. To avoid distracting the room participants and to reduce switching latency, the system has no moving parts: the VC works by cropping and zooming the 4K wide-FOV video stream. The system was tuned and evaluated using extensive crowdsourcing techniques on a dataset of N=100 meetings, each 2-5 minutes in length.
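The crop-and-zoom idea in the abstract can be illustrated with a minimal sketch. This is not the paper's trained VC: it is a hand-written geometric rule, assuming a 3840x2160 frame, a speaker bounding box supplied by some upstream ASD, and a 16:9 output. The names `Box`, `crop_window`, and the `margin` parameter are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass

# Assumed 4K UHD frame size; the abstract only says "4K wide-FOV".
FRAME_W, FRAME_H = 3840, 2160


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def crop_window(speaker: Box, margin: float = 0.5, aspect: float = 16 / 9,
                frame_w: int = FRAME_W, frame_h: int = FRAME_H) -> Box:
    """Return a crop with the requested aspect ratio, centred on the
    speaker where possible and always lying inside the frame."""
    # Pad the speaker box by `margin` of its size on every side.
    want_w = speaker.w * (1 + 2 * margin)
    want_h = speaker.h * (1 + 2 * margin)

    # Grow the shorter dimension so the crop matches the output aspect.
    if want_w / want_h < aspect:
        want_w = want_h * aspect
    else:
        want_h = want_w / aspect

    # Shrink uniformly if the crop would be larger than the frame.
    scale = min(1.0, frame_w / want_w, frame_h / want_h)
    w = int(round(want_w * scale))
    h = int(round(want_h * scale))

    # Centre on the speaker, then clamp so the crop stays in the frame.
    cx = speaker.x + speaker.w / 2
    cy = speaker.y + speaker.h / 2
    x = max(0, min(int(round(cx - w / 2)), frame_w - w))
    y = max(0, min(int(round(cy - h / 2)), frame_h - h))
    return Box(x, y, w, h)
```

Calling `crop_window(Box(1800, 900, 200, 300))` yields a 16:9 window around a speaker near the frame centre; a speaker at the frame edge gets a window clamped to the border instead of centred. Because the output is only a rectangle, changing the view requires no mechanical motion, which matches the abstract's point about having no moving parts.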

asd and vc, speaker detection, video

Country:
- North America > United States
  - Washington > King County
    - Seattle (0.05)
    - Redmond (0.04)
  - Hawaii > Honolulu County
    - Honolulu (0.04)
  - California
    - San Francisco County > San Francisco (0.14)
    - San Mateo County > Menlo Park (0.04)
    - San Diego County > San Diego (0.04)

Genre:
- Research Report (0.50)

Industry:
- Media > Film (1.00)
- Leisure & Entertainment (1.00)

Technology:
- Information Technology
  - Communications
    - Collaboration (1.00)
    - Social Media > Crowdsourcing (0.35)
  - Artificial Intelligence > Machine Learning
    - Performance Analysis (0.47)
    - Neural Networks (0.47)
