We consider speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of Teh et al. [J. Amer. Statist. Assoc. 101 (2006) 1566--1581]. The basic HDP-HMM tends to over-segment the audio data, creating redundant states and rapidly switching among them; we describe an augmented HDP-HMM that provides effective control over the switching rate. We also show that this augmentation makes it possible to treat emission distributions nonparametrically. To scale the resulting architecture to realistic diarization problems, we develop a sampling algorithm that employs a truncated approximation of the Dirichlet process to jointly resample the full state sequence, greatly improving mixing rates. Working with a benchmark NIST data set, we show that our Bayesian nonparametric architecture yields state-of-the-art speaker diarization results.
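As a rough illustration of how a truncated approximation can be combined with control over the switching rate, the sketch below draws transition rows from finite Dirichlet distributions and adds extra mass to each state's self-transition. The abstract does not specify the mechanism, so the self-transition bias `kappa`, the concentration parameters `gamma` and `alpha`, the truncation level `L`, and both function names are assumptions made for this example only.

```python
import random

def dirichlet(concentrations, rng):
    """Draw from a Dirichlet distribution via normalised Gamma variates."""
    draws = [rng.gammavariate(c, 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]

def sample_transitions(L=10, gamma=1.0, alpha=1.0, kappa=5.0, seed=0):
    """Sample global state weights and per-state transition rows.

    L is the truncation level (assumed), kappa is a hypothetical
    self-transition bias: larger values make each state more persistent.
    """
    rng = random.Random(seed)
    # Finite (truncated) stand-in for the global Dirichlet process weights.
    beta = dirichlet([gamma / L] * L, rng)
    rows = []
    for j in range(L):
        # Shared weights plus extra mass on the diagonal entry.
        # The 1e-12 floor is only a numerical guard against zero concentrations.
        conc = [max(alpha * b, 1e-12) + (kappa if k == j else 0.0)
                for k, b in enumerate(beta)]
        rows.append(dirichlet(conc, rng))
    return beta, rows

def mean_self_transition(rows):
    """Average probability of staying in the same state."""
    return sum(row[j] for j, row in enumerate(rows)) / len(rows)
```

Calling `sample_transitions(kappa=50.0)` and passing the rows to `mean_self_transition` should give a value close to one, whereas `kappa=0.0` removes the bias and leaves the rows free to switch rapidly between states.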
Apple is planning a "special event" on 25 March, where the tech giant is widely expected to unveil a new video streaming service to potentially rival Netflix. The event will take place at the Steve Jobs Theatre at the company's headquarters in Cupertino, California. The invitation included the phrase "it's showtime" in an apparent reference to a new film and video platform, though no official details have yet been revealed. There has nonetheless been the usual slew of leaks and rumours that accompanies major Apple events. Other potential announcements are thought to include a paid-for news subscription service. We'll tell you what's true.
I am excited to be here today for what is a Reddit first. This will be the first AMA in history to feature an Artificial "Hive Mind" answering your questions. You might have heard about me because I've been challenged by reporters to make lots of predictions. For example, Newsweek challenged me to predict the Oscars and I was 76% accurate, which beat the vast majority of professional movie critics. I'm a Swarm Intelligence that links together lots of people into a real-time system – a brain of brains – that consistently outperforms the individuals who make me up.
Celerina is the software core of a real-time system for dynamic music generation. Several one-dimensional binary cellular automata generate melodic patterns that are subsequently reduced and processed to form musical motifs and gestures. The music generated by Celerina can be set to conform to musical styles such as jazz, classical, or ambient.
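To make the idea of deriving melodic material from a one-dimensional binary cellular automaton concrete, here is a minimal sketch. It is not Celerina's actual algorithm: the rule number, the reduction (counting live cells in each generation), the pentatonic MIDI scale, and the function names are all assumptions chosen for illustration.

```python
def step(cells, rule):
    """Apply one update of a one-dimensional binary cellular automaton.

    `rule` is a number from 0 to 255 in the usual elementary-automaton
    numbering; the row wraps around at both ends.
    """
    n = len(cells)
    out = []
    for i in range(n):
        left, centre, right = cells[(i - 1) % n], cells[i], cells[(i + 1) % n]
        # The three-cell neighbourhood selects one bit of the rule number.
        idx = (left << 2) | (centre << 1) | right
        out.append((rule >> idx) & 1)
    return out

def melody(rule=90, width=8, steps=8, scale=(60, 62, 64, 67, 69)):
    """Reduce each generation to one MIDI note (hypothetical reduction).

    The number of live cells, taken modulo the scale length, indexes
    into an assumed C major pentatonic scale.
    """
    cells = [0] * width
    cells[width // 2] = 1  # start from a single live cell
    notes = []
    for _ in range(steps):
        notes.append(scale[sum(cells) % len(scale)])
        cells = step(cells, rule)
    return notes
```

With the defaults, `melody()` returns eight MIDI note numbers drawn from the scale. Swapping the `scale` tuple is one simple way such a generator could be steered toward a different style.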