Model compression techniques, such as pruning and quantization, are becoming increasingly important to reduce the memory footprints and the amount of computations. Despite model size reduction, achieving performance enhancement on devices is, however, still challenging mainly due to the irregular representations of sparse matrix formats. This paper proposes a new representation to encode the weights of Sparse Quantized Neural Networks, specifically reduced by find-grained and unstructured pruning method. The representation is encoded in a structured regular format, which can be efficiently decoded through XOR gates during inference in a parallel manner. We demonstrate various deep learning models that can be compressed and represented by our proposed format with fixed and high compression ratio. For example, for fully-connected layers of AlexNet on ImageNet dataset, we can represent the sparse weights by only 0.09 bits/weight for 1-bit quantization and 91\% pruning rate with a fixed decoding rate and full memory bandwidth usage.
Pruning is an efficient model compression technique to remove redundancy in the connectivity of deep neural networks (DNNs). Computations using sparse matrices obtained by pruning parameters, however, exhibit vastly different parallelism depending on the index representation scheme. As a result, fine-grained pruning has not gained much attention due to its irregular index form leading to large memory footprint and low parallelism for convolutions and matrix multiplications. In this paper, we propose a new network pruning technique that generates a low-rank binary index matrix to compress index data while decompressing index data is performed by simple binary matrix multiplication. This proposed compression method finds a particular fine-grained pruning mask that can be decomposed into two binary matrices. We also propose a tile-based factorization technique that not only lowers memory requirements but also enhances compression ratio. Various DNN models can be pruned with much fewer indexes compared to previous sparse matrix formats while maintaining the same pruning rate.
With a new $1.5 million grant, the growing field of transfer learning has come to the Ming Hsieh Department of Electrical and Computer Engineering at the USC Viterbi School of Engineering. The grant was awarded to three professors -- Salman Avestimehr, Antonio Ortega and Mahdi Soltanolkotabi -- who will work with Ilias Diakonikolas at the University of Wisconsin, Madison, to address the theoretical foundations of this field. Modern machine learning models are breaking new ground in data science, achieving unprecedented performance on tasks like classifying images in one thousand different image categories. This is achieved by training gigantic neural networks. "Neural networks work really well because they can be trained on huge amounts of pre-existing data that has previously been tagged and collected," said Avestimehr, the primary investigator of the project.
The USC Center for Artificial Intelligence in Society (CAIS)--a joint venture of the USC Suzanne Dworak-Peck School of Social Work and USC Viterbi School of Engineering--will host its first Visiting Fellows Program this summer focused on employing AI to help solve complex societal problems. As part of the Fellows Program, visiting researchers from all over the world will come to USC this summer for up to three months to learn from a working model established by the Center's co-founders, Eric Rice of the USC Suzanne Dworak-Peck School of Social Work and Milind Tambe of the USC Viterbi School of Engineering. The two had successfully collaborated by employing AI to ensure that homeless youth shared important public health information among peers in the youths' own social networks. "Using artificial intelligence to promote the greater good is an emerging area of study with huge potential," said Eric Rice, co-director of CAIS and associate professor at the USC Suzanne Dworak-Peck School of Social Work. "Our goal in establishing this fellowship is to bring together the best and brightest scholars in artificial intelligence and social work to explore breakthrough solutions to age-old problems plaguing many of our cities and communities."
This paper proposes an efficient parallel algorithm for an important class of dynamic programming problems that includes Viterbi, Needleman–Wunsch, Smith–Waterman, and Longest Common Subsequence. In dynamic programming, the subproblems that do not depend on each other, and thus can be computed in parallel, form stages, or wavefronts. The algorithm presented in this paper provides additional parallelism allowing multiple stages to be computed in parallel despite dependences among them. The correctness and the performance of the algorithm relies on rank convergence properties of matrix multiplication in the tropical semiring, formed with plus as the multiplicative operation and max as the additive operation. This paper demonstrates the efficiency of the parallel algorithm by showing significant speedups on a variety of important dynamic programming problems.