
Collaborating Authors

Rabbat, Michael G.


A Massively Parallel Associative Memory Based on Sparse Neural Networks

arXiv.org Artificial Intelligence

Associative memories store content in such a way that the content can later be retrieved by presenting the memory with a small portion of it, rather than with an address as in more traditional memories. Associative memories are used as building blocks for algorithms within database engines, anomaly detection systems, compression algorithms, and face recognition systems. A classical example of an associative memory is the Hopfield neural network. Recently, Gripon and Berrou introduced an alternative construction which builds on ideas from the theory of error-correcting codes and which greatly outperforms the Hopfield network in capacity, diversity, and efficiency. In this paper we implement a variation of the Gripon-Berrou associative memory on a general-purpose graphics processing unit (GPU). Gripon and Berrou propose two retrieval rules, sum-of-sum and sum-of-max. The sum-of-sum rule uses only matrix-vector multiplication and is easily implemented on the GPU. The sum-of-max rule is much less straightforward to implement because it involves non-linear operations, but it gives significantly better retrieval error rates. We propose a hybrid rule tailored for implementation on a GPU which achieves an 880-fold speedup without sacrificing any accuracy.
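
To make the two rules concrete, here is a minimal NumPy sketch of the retrieval steps as they can be read from the abstract; the cluster layout (a list of index arrays, one per cluster), the binary adjacency matrix W, and the winner-take-all decision are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def winner_take_all(scores, clusters):
    # Keep, within each cluster, only the neurons with the maximal score.
    out = np.zeros_like(scores)
    for cl in clusters:
        out[cl] = (scores[cl] == scores[cl].max()).astype(scores.dtype)
    return out

def sum_of_sum(W, v, clusters):
    # Score each neuron by the total number of active neurons it connects to:
    # a single matrix-vector product, which is why this rule maps well to a GPU.
    return winner_take_all(W @ v, clusters)

def sum_of_max(W, v, clusters):
    # Each cluster contributes at most 1 to a neuron's score (the max over the
    # cluster's active neurons); this non-linearity rules out a single mat-vec.
    scores = np.zeros(len(v))
    for cl in clusters:
        active = cl[v[cl] > 0]
        if active.size:
            scores += W[:, active].max(axis=1)
    return winner_take_all(scores, clusters)
```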


Communication/Computation Tradeoffs in Consensus-Based Distributed Optimization

Neural Information Processing Systems

We study the scalability of consensus-based distributed optimization algorithms by considering two questions: How many processors should we use for a given problem, and how often should they communicate when communication is not free? Central to our analysis is a problem-specific value $r$ which quantifies the communication/computation tradeoff. We show that organizing the communication among nodes as a $k$-regular expander graph yields speedups, while when all pairs of nodes communicate (as in a complete graph), there is an optimal number of processors that depends on $r$. Surprisingly, a speedup can be obtained, in terms of the time to reach a fixed level of accuracy, by communicating less and less frequently as the computation progresses. Experiments on a real cluster solving metric learning and non-smooth convex minimization tasks demonstrate strong agreement between theory and practice.
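
As a rough illustration of the tradeoff (a hypothetical sketch, not the paper's algorithm or analysis), the code below runs consensus-based gradient descent in which nodes take local gradient steps every iteration but average with their neighbors only when the schedule comm_every(t) allows; letting comm_every grow with t realizes the "communicate less and less frequently" regime. The mixing matrix P and the step-size rule are assumptions.

```python
import numpy as np

def consensus_gd(grad_fns, P, x0, T, comm_every):
    """grad_fns: one local gradient function per node.
    P: doubly stochastic mixing matrix of the communication graph
       (e.g. of a k-regular expander or a complete graph).
    comm_every(t): nodes communicate when t is a multiple of this value."""
    n = len(grad_fns)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))  # one row per node
    for t in range(1, T + 1):
        G = np.stack([g(X[i]) for i, g in enumerate(grad_fns)])
        X -= (1.0 / np.sqrt(t)) * G          # local computation step
        if t % comm_every(t) == 0:
            X = P @ X                        # one round of neighbor averaging
    return X.mean(axis=0)

# A schedule that communicates less and less frequently as the run progresses:
# consensus_gd(grad_fns, P, x0, T=10_000, comm_every=lambda t: 1 + t // 500)
```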


Distributed Strongly Convex Optimization

arXiv.org Machine Learning

Considerable effort has been invested in characterizing the convergence rates of gradient-based algorithms for non-linear convex optimization. Recently, motivated by large datasets and problems in machine learning, interest has shifted towards distributed optimization. In this work we present a distributed algorithm for strongly convex constrained optimization. Each node in a network of n computers converges to the optimum of a strongly convex, L-Lipschitz continuous, separable objective at a rate of O(log(sqrt(n) T) / T), where T is the number of iterations. This rate is achieved in the online setting, where the data are revealed one point at a time to the nodes, and in the batch setting, where each node has access to its full local dataset from the start. The same convergence rate is achieved in expectation when the subgradients used at each node are corrupted by additive zero-mean noise.
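
For intuition, here is a minimal sketch of a distributed projected-subgradient scheme of the kind described above: each node averages with its neighbors, steps along its local (possibly noisy) subgradient with the O(1/(mu t)) step size standard for mu-strongly convex objectives, and projects back onto the constraint set. The mixing matrix, projection operator, and step-size constants are assumptions, not the paper's exact update.

```python
import numpy as np

def distributed_subgradient(subgrad_fns, P, project, mu, x0, T):
    """subgrad_fns: per-node subgradient oracles (their outputs may be
    corrupted by additive zero-mean noise, as in the abstract).
    P: doubly stochastic mixing matrix; project: Euclidean projection onto
    the constraint set; mu: strong-convexity constant of the objective."""
    n = len(subgrad_fns)
    X = np.tile(np.asarray(x0, dtype=float), (n, 1))
    for t in range(1, T + 1):
        X = P @ X                                # consensus with neighbors
        G = np.stack([g(X[i]) for i, g in enumerate(subgrad_fns)])
        X -= (1.0 / (mu * t)) * G                # strongly convex step size
        X = np.apply_along_axis(project, 1, X)   # stay feasible
    return X.mean(axis=0)
```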


Inferring Network Structure from Co-Occurrences

Neural Information Processing Systems

We consider the problem of inferring the structure of a network from co-occurrence data: observations that indicate which nodes occur in a signaling pathway but do not directly reveal node order within the pathway. This problem is motivated by network inference problems arising in computational biology and communication systems, in which it is difficult or impossible to obtain precise time-ordering information. Without order information, every permutation of the activated nodes leads to a different feasible solution, resulting in a combinatorial explosion of the feasible set. However, physical principles underlying most networked systems suggest that not all feasible solutions are equally likely. Intuitively, nodes that co-occur more frequently are probably more closely connected. Building on this intuition, we model path co-occurrences as randomly shuffled samples of a random walk on the network. We derive a computationally efficient network inference algorithm and, via novel concentration inequalities for importance sampling estimators, prove that a polynomial-complexity Monte Carlo version of the algorithm converges with high probability.
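
To illustrate the modeling idea (a hypothetical sketch, not the authors' estimator), the code below treats one co-occurrence observation as a randomly shuffled random-walk path and uses self-normalized importance sampling, with uniform shuffles as the proposal, to weight candidate node orderings by their likelihood under a current transition-matrix estimate A.

```python
import numpy as np

def walk_likelihood(order, A):
    # Probability that a random walk with row-stochastic transition matrix A
    # visits the given nodes in exactly this order.
    p = 1.0
    for u, v in zip(order[:-1], order[1:]):
        p *= A[u, v]
    return p

def order_weights(nodes, A, num_samples=1000, seed=None):
    # Self-normalized importance sampling over permutations: propose uniform
    # shuffles of the co-occurring nodes and weight each by its walk likelihood.
    rng = np.random.default_rng(seed)
    orders = [rng.permutation(nodes) for _ in range(num_samples)]
    w = np.array([walk_likelihood(o, A) for o in orders], dtype=float)
    total = w.sum()
    if total == 0.0:
        return orders, np.full(len(w), 1.0 / len(w))  # degenerate fallback
    return orders, w / total
```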