Goto

Collaborating Authors

 Media


Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

Neural Information Processing Systems

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm we use for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates; and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit.


"Name That Song!" A Probabilistic Approach to Querying on Music and Text

Neural Information Processing Systems

We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.


"Name That Song!" A Probabilistic Approach to Querying on Music and Text

Neural Information Processing Systems

We present a novel, flexible statistical approach for modelling music and text jointly. The approach is based on multi-modal mixture models and maximum a posteriori estimation using EM. The learned models can be used to browse databases with documents containing music and text, to search for music using queries consisting of music and text (lyrics and other contextual information), to annotate text documents with music, and to automatically recommend or identify similar songs.


Real Time Voice Processing with Audiovisual Feedback: Toward Autonomous Agents with Perfect Pitch

Neural Information Processing Systems

We have implemented a real time front end for detecting voiced speech and estimating its fundamental frequency. The front end performs the signal processing for voice-driven agents that attend to the pitch contours of human speech and provide continuous audiovisual feedback. The algorithm weuse for pitch tracking has several distinguishing features: it makes no use of FFTs or autocorrelation at the pitch period; it updates the pitch incrementally on a sample-by-sample basis; it avoids peak picking and does not require interpolation in time or frequency to obtain high resolution estimates;and it works reliably over a four octave range, in real time, without the need for postprocessing to produce smooth contours. The algorithm is based on two simple ideas in neural computation: the introduction of a purposeful nonlinearity, and the error signal of a least squares fit. The pitch tracker is used in two real time multimedia applications: avoice-to-MIDI player that synthesizes electronic music from vocalized melodies,and an audiovisual Karaoke machine with multimodal feedback. Both applications run on a laptop and display the user's pitch scrolling across the screen as he or she sings into the computer.


Qualitative Spatial Reasoning Extracting and Reasoning with Spatial Aggregates

AI Magazine

Reasoning about spatial data is a key task in many applications, including geographic information systems, meteorological and fluid-flow analysis, computer-aided design, and protein structure databases. Such applications often require the identifi- cation and manipulation of qualitative spatial representations, for example, to detect whether one object will soon occlude another in a digital image or efficiently determine relationships between a proposed road and wetland regions in a geographic data set. Qualitative spatial reasoning (QSR) provides representational primitives (a spatial "vocabulary") and inference mechanisms for these tasks. This article first reviews representative work on QSR for data-poor scenarios, where the goal is to design representations that can answer qualitative queries without much numeric information. It then turns to the data-rich case, where the goal is to derive and manipulate qualitative spatial representations that efficiently and correctly abstract important spatial aspects of the underlying data for use in subsequent tasks. This article focuses on how a particular QSR system, SPATIAL AGGREGATION, can help answer spatial queries for scientific and engineering data sets. A case study application of weather analysis illustrates the effective representation and reasoning supported by both data-poor and data-rich forms of QSR


In Search of the Horowitz Factor

AI Magazine

The article introduces the reader to a large interdisciplinary research project whose goal is to use AI to gain new insight into a complex artistic phenomenon. We study fundamental principles of expressive music performance by measuring performance aspects in large numbers of recordings by highly skilled musicians (concert pianists) and analyzing the data with state-of-the-art methods from areas such as machine learning, data mining, and data visualization. The article first introduces the general research questions that guide the project and then summarizes some of the most important results achieved to date, with an emphasis on the most recent and still rather speculative work. Our current results show that it is possible for machines to make novel and interesting discoveries even in a domain such as music and that even if we might never find the "Horowitz Factor," AI can give us completely new insights into complex artistic behavior.


In Search of the Horowitz Factor

AI Magazine

The article introduces the reader to a large interdisciplinary research project whose goal is to use AI to gain new insight into a complex artistic phenomenon. We study fundamental principles of expressive music performance by measuring performance aspects in large numbers of recordings by highly skilled musicians (concert pianists) and analyzing the data with state-of-the-art methods from areas such as machine learning, data mining, and data visualization. The article first introduces the general research questions that guide the project and then summarizes some of the most important results achieved to date, with an emphasis on the most recent and still rather speculative work. A broad view of the discovery process is given, from data acquisition through data visualization to inductive model building and pattern discovery, and it turns out that AI plays an important role in all stages of such an ambitious enterprise. Our current results show that it is possible for machines to make novel and interesting discoveries even in a domain such as music and that even if we might never find the "Horowitz Factor," AI can give us completely new insights into complex artistic behavior.


Interval Constraint Solving for Camera Control and Motion Planning

arXiv.org Artificial Intelligence

Many problems in robust control and motion planning can be reduced to either find a sound approximation of the solution space determined by a set of nonlinear inequalities, or to the ``guaranteed tuning problem'' as defined by Jaulin and Walter, which amounts to finding a value for some tuning parameter such that a set of inequalities be verified for all the possible values of some perturbation vector. A classical approach to solve these problems, which satisfies the strong soundness requirement, involves some quantifier elimination procedure such as Collins' Cylindrical Algebraic Decomposition symbolic method. Sound numerical methods using interval arithmetic and local consistency enforcement to prune the search space are presented in this paper as much faster alternatives for both soundly solving systems of nonlinear inequalities, and addressing the guaranteed tuning problem whenever the perturbation vector has dimension one. The use of these methods in camera control is investigated, and experiments with the prototype of a declarative modeller to express camera motion using a cinematic language are reported and commented.


The AAAI-2002 Robot Challenge

AI Magazine

The Eighteenth National Conference on Artificial Intelligence (AAAI-2002) Robot Challenge is part of an annual series of robot challenges and competitions. It is intended to promote the development of robot systems that interact intelligently with humans in natural environments. The Challenge task calls for a robot to attend the AAAI conference, which includes registering for the conference and giving a talk about itself. In this article, we review the task requirements, introduce the robots that participated at AAAI-2002 and describe the strengths and weaknesses of their performance.


Monte Carlo Methods for Tempo Tracking and Rhythm Quantization

Journal of Artificial Intelligence Research

We present a probabilistic generative model for timing deviations in expressive music performance. The structure of the proposed model is equivalent to a switching state space model. The switch variables correspond to discrete note locations as in a musical score. The continuous hidden variables denote the tempo. We formulate two well known music recognition problems, namely tempo tracking and automatic transcription (rhythm quantization) as filtering and maximum a posteriori (MAP) state estimation tasks. Exact computation of posterior features such as the MAP state is intractable in this model class, so we introduce Monte Carlo methods for integration and optimization. We compare Markov Chain Monte Carlo (MCMC) methods (such as Gibbs sampling, simulated annealing and iterative improvement) and sequential Monte Carlo methods (particle filters). Our simulation results suggest better results with sequential methods. The methods can be applied in both online and batch scenarios such as tempo tracking and transcription and are thus potentially useful in a number of music applications such as adaptive automatic accompaniment, score typesetting and music information retrieval.