Cosatto, Eric


A Massively Parallel Digital Learning Processor

Neural Information Processing Systems

We present a new, massively parallel architecture for accelerating machine learning algorithms, based on arrays of vector processing elements (VPEs) with variable-resolution arithmetic. Groups of VPEs operate in SIMD (single instruction multiple data) mode, and each group is connected to an independent memory bank. In this way memory bandwidth scales with the number of VPEs, and the main data flows are local, keeping power dissipation low. With 256 VPEs, implemented on two FPGA (field programmable gate array) chips, we obtain a sustained speed of 19 GMACS (billion multiply-accumulate operations per second) for SVM training, and 86 GMACS for SVM classification. This performance is more than an order of magnitude higher than that of any FPGA implementation reported so far. The speed on one FPGA is similar to the fastest speeds published on a graphics processor for the MNIST problem, even though the FPGA's clock rate is six times lower. High performance at low clock rates makes this massively parallel architecture particularly attractive for embedded applications, where low power dissipation is critical. Tests with convolutional neural networks and other learning algorithms are under way.
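To make the data-layout idea concrete, here is a minimal NumPy sketch of how the multiply-accumulates that dominate SVM classification can be striped across independent memory banks, one per VPE group, so each group streams only its own data. The group count, SIMD width, and data sizes below are illustrative assumptions, not the hardware's actual configuration.

import numpy as np

# Hypothetical configuration (for illustration only): 256 VPEs arranged as
# 32 SIMD groups of 8, each group attached to its own memory bank.
num_groups = 32
vpes_per_group = 8
assert num_groups * vpes_per_group == 256

dim = 784        # e.g. MNIST-sized input vectors (assumption)
num_sv = 4096    # hypothetical number of support vectors

rng = np.random.default_rng(0)
support_vectors = rng.standard_normal((num_sv, dim))
x = rng.standard_normal(dim)

# Stripe the support vectors across the banks; each bank only ever touches
# its own slice, which is what keeps the main data flows local.
banks = np.array_split(support_vectors, num_groups)

# Each VPE group performs its multiply-accumulates locally (same instruction,
# different support vectors); only the small result vector is gathered.
partial = [bank @ x for bank in banks]
dot_products = np.concatenate(partial)

assert np.allclose(dot_products, support_vectors @ x)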


Off-Road Obstacle Avoidance through End-to-End Learning

Neural Information Processing Systems

We describe a vision-based obstacle avoidance system for off-road mobile robots. The system is trained from end to end to map raw input images to steering angles. It is trained in supervised mode to predict the steering angles provided by a human driver during training runs collected in a wide variety of terrains, weather conditions, lighting conditions, and obstacle types. The robot is a 50 cm off-road truck with two forward-pointing wireless color cameras. A remote computer processes the video and controls the robot via radio. The learning system is a large 6-layer convolutional network whose input is a single left/right pair of unprocessed low-resolution images. The robot exhibits an excellent ability to detect obstacles and navigate around them in real time at speeds of 2 m/s.
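For illustration, a minimal PyTorch sketch of a network with the shape described above: six layers of a convolutional network mapping a stacked left/right image pair to a scalar steering angle, trained by regression on the human driver's angles. The channel widths, kernel sizes, input resolution, and activations below are assumptions, not the paper's exact architecture.

import torch
import torch.nn as nn

class SteeringConvNet(nn.Module):
    def __init__(self, in_channels=6):  # 2 cameras x 3 color channels (assumption)
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=5, stride=2), nn.Tanh(),  # layer 1
            nn.Conv2d(16, 32, kernel_size=5, stride=2), nn.Tanh(),           # layer 2
            nn.Conv2d(32, 64, kernel_size=3, stride=2), nn.Tanh(),           # layer 3
            nn.Conv2d(64, 64, kernel_size=3, stride=2), nn.Tanh(),           # layer 4
        )
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(100), nn.Tanh(),  # layer 5
            nn.Linear(100, 1),              # layer 6: scalar steering angle
        )

    def forward(self, stereo_pair):
        return self.head(self.features(stereo_pair))

# Supervised training pairs each stereo frame with the angle the human driver
# produced and regresses onto it (squared loss used here as an example).
model = SteeringConvNet()
frames = torch.randn(8, 6, 120, 160)   # batch of low-resolution stereo frames (made-up size)
human_angles = torch.randn(8, 1)
loss = nn.functional.mse_loss(model(frames), human_angles)
loss.backward()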


Parallel Support Vector Machines: The Cascade SVM

Neural Information Processing Systems

We describe an algorithm for support vector machines (SVM) that can be parallelized efficiently and scales to very large problems with hundreds of thousands of training vectors. Instead of analyzing the whole training set in one optimization step, the data are split into subsets and optimized separately with multiple SVMs. The partial results are combined and filtered again in a 'Cascade' of SVMs, until the global optimum is reached. The Cascade SVM can be spread over multiple processors with minimal communication overhead and requires far less memory, since the kernel matrices are much smaller than for a regular SVM. Convergence to the global optimum is guaranteed with multiple passes through the Cascade, but even a single pass provides good generalization. A single pass is 5x-10x faster than a regular SVM for problems of 100,000 vectors when implemented on a single processor. Parallel implementations on a cluster of 16 processors were tested with over 1 million vectors (2-class problems), converging in a day or two, while a regular SVM never converged in over a week.
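A minimal sketch of a single pass through such a Cascade, using scikit-learn's SVC as the per-node solver (not the paper's optimizer). It only illustrates the structure: each layer keeps the support vectors of its partial problems and merges them pairwise, so kernel matrices stay small; the global optimality check and feedback loop of the full algorithm are omitted, and num_subsets is assumed to be a power of two.

import numpy as np
from sklearn.svm import SVC

def train_node(X, y, **svm_params):
    # Train one SVM node and return only its support vectors (and their labels).
    clf = SVC(kernel="rbf", **svm_params).fit(X, y)
    return X[clf.support_], y[clf.support_]

def cascade_pass(X, y, num_subsets=8, **svm_params):
    # First layer: train an independent SVM on each disjoint subset.
    idx = np.array_split(np.random.permutation(len(X)), num_subsets)
    nodes = [train_node(X[i], y[i], **svm_params) for i in idx]

    # Later layers: merge the support vectors of pairs of nodes and re-train,
    # halving the number of nodes until a single set of support vectors remains.
    while len(nodes) > 1:
        nodes = [train_node(np.vstack([a[0], b[0]]),
                            np.concatenate([a[1], b[1]]),
                            **svm_params)
                 for a, b in zip(nodes[0::2], nodes[1::2])]

    # Train the final classifier on the surviving support vectors.
    Xsv, ysv = nodes[0]
    return SVC(kernel="rbf", **svm_params).fit(Xsv, ysv)

# Tiny usage example with synthetic 2-class data (made-up sizes).
rng = np.random.default_rng(0)
X = rng.standard_normal((2000, 10))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = cascade_pass(X, y, num_subsets=8, C=1.0, gamma=0.1)
print(model.score(X, y))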


Address Block Location with a Neural Net System

Neural Information Processing Systems

We developed a system for finding address blocks on mail pieces that can process four images per second. Besides locating the address block, the system determines the writing style (handwritten or machine printed), measures the skew angle of the text lines, and cleans noisy images. A layout analysis of all the elements present in the image is performed in order to distinguish drawings and dirt from text and to separate advertisement text from the destination address. A speed of more than four images per second is obtained on a modular hardware platform containing a board with two NET32K neural net chips, a SPARC2 processor board, and a board with two digital signal processors. The system has been tested with more than 100,000 images. Its performance depends on the quality of the images, ranging from 85% correct location on very noisy images to over 98% on cleaner images.

