AITopics

Perceptual learning is defined as fast improvement in performance and retention of the learned ability over a period of time. In a set of psychophysical experiments we demonstrated that perceptual learning occurs for the discrimination of direction in stochastic motion stimuli. Here we model this learning using two approaches: a clustering model that learns to accommodate the motion noise, and an averaging model that learns to ignore the noise. Simulations of the models show performance similar to the psychophysical results.

1 Introduction

Global motion perception is critical to many visual tasks: to perceive self-motion, to identify objects in motion, to determine the structure of the environment, and to make judgements for safe navigation. In the presence of noise, as in random dot kinematograms, efficient extraction of global motion involves considerable spatial integration. Newsome and colleagues (1989) showed that neurons in the macaque middle temporal area (MT) are motion direction-selective, and perform global integration of motion in their large receptive fields. Psychophysical studies in humans have characterized the limits of spatial and temporal integration in motion (Watamaniuk et.

discrimination, learning, motion direction, (16 more...)

Country: North America > United States > Massachusetts > Suffolk County > Boston (0.04)

Genre: Research Report (0.94)

Industry: Health & Medicine (0.47)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Nowlan, Steven J., Platt, John C.

A Convolutional Neural Network Hand Tracker

We describe a system that can track a hand in a sequence of video frames and recognize hand gestures in a user-independent manner. The system locates the hand in each video frame and determines if the hand is open or closed. The tracking system is able to track the hand to within 10 pixels of its correct location in 99.7% of the frames from a test set containing video sequences from 18 different individuals captured in 18 different room environments. The gesture recognition network correctly determines if the hand being tracked is open or closed in 99.1% of the frames in this test set. The system has been designed to operate in real time with existing hardware.

convolutional network, sequence, video frame, (12 more...)

Country: North America > United States > California > Santa Clara County > San Jose (0.06)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.44)

Rao, Rajesh P. N., Ballard, Dana H.

Learning Saccadic Eye Movements Using Multiscale Spatial Filters

Such sensors realize the simultaneous need for wide field-of-view and good visual acuity. One popular class of space-variant sensors is formed by log-polar sensors which have a small area near the optical axis of greatly increased resolution (the fovea) coupled with a peripheral region that witnesses a gradual logarithmic falloff in resolution as one moves radially outward. These sensors are inspired by similar structures found in the primate retina where one finds both a peripheral region of gradually decreasing acuity and a circularly symmetric area centralis characterized by a greater density of receptors and a disproportionate representation in the optic nerve [3]. The peripheral region, though of low visual acuity, is more sensitive to light intensity and movement. The existence of a region optimized for discrimination and recognition surrounded by a region geared towards detection thus allows the image of an object of interest detected in the outer region to be placed on the more analytic center for closer scrutiny. Such a strategy however necessitates the existence of (a) methods to determine which location in the periphery to foveate next, and (b) fast gaze-shifting mechanisms to achieve this

response vector, saccade, vector, (15 more...)

Country:

North America > United States > New York > Monroe County > Rochester (0.04)
North America > United States > Massachusetts > Middlesex County > Reading (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)

Industry: Health & Medicine > Therapeutic Area > Ophthalmology/Optometry (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Sensing and Signal Processing > Image Processing (0.69)

Back, Andrew D., Tsoi, Ah Chung

A Comparison of Discrete-Time Operator Models for Nonlinear System Identification

We present a unifying view of discrete-time operator models used in the context of finite word length linear signal processing. Comparisons are made between the recently presented gamma operator model, and the delta and rho operator models for performing nonlinear system identification and prediction using neural networks. A new model based on an adaptive bilinear transformation which generalizes all of the above models is presented.
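As a rough illustration of two of the operators named in this abstract, the sketch below uses their commonly cited definitions: the delta operator as a scaled forward difference, delta = (q - 1) / step with q the forward shift, and a single gamma memory stage as a leaky unit delay. The function names, the zero initial state, and the choice of Python are assumptions for illustration only; the paper's own formulations, and its rho and bilinear models, are not reproduced here.

```python
def delta_operator(samples, step):
    """Apply delta = (q - 1) / step, where q shifts the sequence forward by one sample."""
    return [(nxt - cur) / step for cur, nxt in zip(samples, samples[1:])]


def gamma_memory(inputs, mu):
    """One gamma memory stage: y[t] = (1 - mu) * y[t-1] + mu * x[t-1], with y[0] = 0 (assumed)."""
    out = [0.0]
    for x in inputs[:-1]:
        out.append((1.0 - mu) * out[-1] + mu * x)
    return out


if __name__ == "__main__":
    ramp = [0.0, 0.5, 1.0, 1.5]          # samples of a line with slope 2 at step 0.25
    print(delta_operator(ramp, 0.25))    # forward-difference estimate of the slope
    print(gamma_memory([1.0, 2.0, 3.0], 1.0))  # mu = 1 reduces to a pure unit delay
```

With mu set to 1 the gamma stage collapses to the ordinary shift operator, which is one way to see why a single family of operators might cover several of these models.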

discrete-time operator model, operator, pi operator, (15 more...)

Country:

Oceania > Australia > Queensland (0.04)
North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.95)

Furlanello, Cesare, Giuliani, Diego, Trentin, Edmondo

Connectionist Speaker Normalization with Generalized Resource Allocating Networks

The paper presents a rapid speaker-normalization technique based on neural network spectral mapping. The neural network is used as a front-end of a continuous speech recognition system (speaker-dependent, HMM-based) to normalize the input acoustic data from a new speaker. The spectral difference between speakers can be reduced using a limited amount of new acoustic data (40 phonetically rich sentences). Recognition error of phone units from the acoustic-phonetic continuous speech corpus APASCI is decreased with an adaptability ratio of 25%. We used local basis networks of elliptical Gaussian kernels, with recursive allocation of units and online optimization of parameters (GRAN model). For this application, the model included a linear term. The results compare favorably with multivariate linear mapping based on constrained orthonormal transformations.

mapping, recognition system, utterance, (11 more...)

Country:

Europe > Italy > Trentino-Alto Adige/Südtirol > Trentino Province > Trento (0.05)
North America > United States > New York (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Zhao, Ying, Schwartz, Richard M., Sroka, Jason J., Makhoul, John

Hierarchical Mixtures of Experts Methodology Applied to Continuous Speech Recognition

In this paper, we incorporate the Hierarchical Mixtures of Experts (HME) method of probability estimation, developed by Jordan [1], into an HMM-based continuous speech recognition system. The resulting system can be thought of as a continuous-density HMM system, but instead of using Gaussian mixtures, the HME system employs a large set of hierarchically organized but relatively small neural networks to perform the probability density estimation. The hierarchical structure is reminiscent of a decision tree except for two important differences: each "expert" or neural net performs a "soft" decision rather than a hard decision, and, unlike ordinary decision trees, the parameters of all the neural nets in the HME are automatically trainable using the EM algorithm. We report results on the ARPA 5,000-word and 40,000-word Wall Street Journal corpus using HME models.

1 Introduction

Recent research has shown that a continuous-density HMM (CD-HMM) system can outperform a more constrained tied-mixture HMM system for large-vocabulary continuous speech recognition (CSR) when a large amount of training data is available [2]. In other work, the utility of decision trees has been demonstrated in classification problems by using the "divide and conquer" paradigm effectively, where a problem is divided into a hierarchical set of simpler problems.
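The contrast between a "soft" and a hard decision described above can be sketched in a few lines. This is a minimal illustration, not the system from the paper: gate scores are passed in directly rather than computed from acoustic input by trained networks, the expert outputs are plain numbers standing in for probability estimates, and all function names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def soft_mixture(gate_scores, expert_probs):
    """Soft decision: blend every expert's output using softmax gate weights."""
    weights = softmax(gate_scores)
    return sum(w * p for w, p in zip(weights, expert_probs))


def hard_choice(gate_scores, expert_probs):
    """Decision-tree style: follow only the highest-scoring branch."""
    best = max(range(len(gate_scores)), key=lambda i: gate_scores[i])
    return expert_probs[best]


def two_level_hme(top_scores, branch_scores, leaf_probs):
    """Two-level hierarchy: each top-level branch is itself a soft mixture of leaves."""
    branch_outputs = [soft_mixture(s, p) for s, p in zip(branch_scores, leaf_probs)]
    return soft_mixture(top_scores, branch_outputs)


if __name__ == "__main__":
    print(soft_mixture([1.0, 0.0], [0.2, 0.8]))  # both experts contribute
    print(hard_choice([1.0, 0.0], [0.2, 0.8]))   # only the winning expert is used
```

Because every branch contributes with a differentiable weight, the gate parameters can be adjusted smoothly during training, which is what makes the whole hierarchy trainable in a way a hard-split tree is not.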

decision tree, hierarchical mixture, hmm system, (10 more...)

Country:

Asia > Middle East > Jordan (0.25)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.05)

Industry: Government > Military (0.36)

Technology:

Information Technology > Artificial Intelligence > Speech > Speech Recognition (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Decision Tree Learning (0.97)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Movellan, Javier R.

Visual Speech Recognition with Stochastic Networks

This paper presents ongoing work on a speaker independent visual speech recognition system. The work presented here builds on previous research efforts in this area and explores the potential use of simple hidden Markov models for limited vocabulary, speaker independent visual speech recognition. The task at hand is recognition of the first four English digits, a task with possible applications in car-phone dialing. The images were modeled as mixtures of independent Gaussian distributions, and the temporal dependencies were captured with standard left-to-right hidden Markov models. The results indicate that simple hidden Markov models may be used to successfully recognize relatively unprocessed image sequences.

speech perception, speech recognition, visual speech recognition, (12 more...)

Country:

North America > United States > New Jersey (0.04)
North America > United States > California > San Mateo County > San Mateo (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
North America > United States > California > San Diego County > La Jolla (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Fels, Sidney, Hinton, Geoffrey E.

Glove-TalkII: Mapping Hand Gestures to Speech Using Neural Networks

Glove-TalkII is a system which translates hand gestures to speech through an adaptive interface. Hand gestures are mapped continuously to 10 control parameters of a parallel formant speech synthesizer. The mapping allows the hand to act as an artificial vocal tract that produces speech in real time. This gives an unlimited vocabulary in addition to direct control of fundamental frequency and volume. Currently, the best version of Glove-TalkII uses several input devices (including a CyberGlove, a ContactGlove, a 3-space tracker, and a foot-pedal), a parallel formant speech synthesizer and 3 neural networks.

configuration, mapping, speech, (17 more...)

Country:

North America > Canada > Ontario > Toronto (0.30)
Europe > Germany > Baden-Württemberg > Stuttgart Region > Stuttgart (0.05)
North America > United States > New York (0.04)

Technology:

Information Technology > Artificial Intelligence > Vision > Gesture Recognition (0.81)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.73)
Information Technology > Artificial Intelligence > Speech > Speech Synthesis (0.58)

Waterhouse, Steve R., Robinson, Anthony J.

Non-linear Prediction of Acoustic Vectors Using Hierarchical Mixtures of Experts

We are concerned in this paper with the application of multiple models, specifically the Hierarchical Mixtures of Experts, to time series prediction, specifically the problem of predicting acoustic vectors for use in speech coding. There have been a number of applications of multiple models in time series prediction. A classic example is the Threshold Autoregressive model (TAR) which was used by Tong & Lim (1980) to predict sunspot activity. More recently, Lewis, Kay and Stevens (in Weigend & Gershenfeld (1994)) describe the use of Multivariate Adaptive Regression Splines (MARS) for the prediction of future values of currency exchange rates. Finally, in speech prediction, Cuperman & Gersho (1985) describe the Switched Inter-frame Vector Prediction (SIVP) method which switches between separate linear predictors trained on different statistical classes of speech.

hierarchical mixture, prediction, variance, (13 more...)

Country:

North America > United States > Colorado > Boulder County > Boulder (0.14)
Asia > Middle East > Jordan (0.06)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.89)

Hasler, Paul E., Diorio, Chris, Minch, Bradley A., Mead, Carver

Single Transistor Learning Synapses

The past few years have produced a number of efforts to design VLSI chips which "learn from experience." The first step toward this goal is developing a silicon analog for a synapse. We have successfully developed such a synapse using only

source current, synapse, voltage, (13 more...)

Country: North America > United States > California > Los Angeles County > Pasadena (0.04)

Industry: Semiconductors & Electronics (0.54)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)