Federated Hyperparameter Tuning: Challenges, Baselines, and Connections to Weight-Sharing
Tuning hyperparameters is a crucial but arduous part of the machine learning pipeline. Hyperparameter optimization is even more challenging in federated learning, where models are learned over a distributed network of heterogeneous devices; here, the need to keep data on device and perform local training makes it difficult to efficiently train and evaluate configurations. In this work, we investigate the problem of federated hyperparameter tuning. We first identify key challenges and show how standard approaches may be adapted to form baselines for the federated setting. Then, by making a novel connection to the neural architecture search technique of weight-sharing, we introduce a new method, FedEx, to accelerate federated hyperparameter tuning that is applicable to widely-used federated optimization methods such as FedAvg and recent variants. Theoretically, we show that a FedEx variant correctly tunes the on-device learning rate in the setting of online convex optimization across devices. Empirically, we show that FedEx can outperform natural baselines for federated hyperparameter tuning by several percentage points on the Shakespeare, FEMNIST, and CIFAR-10 benchmarks--obtaining higher accuracy using the same training budget.
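The weight-sharing connection can be made concrete with a small sketch: maintain a categorical distribution over candidate hyperparameter configurations, sample one per round, and update the distribution with an exponentiated-gradient step driven by validation feedback. This is a simplified illustration of the idea, not the FedEx algorithm itself; the function names, the single-client sampling, and the REINFORCE-style gradient estimate are assumptions made here for brevity.

```python
import numpy as np

def fedex_round(theta, configs, eval_fn, eta=0.1):
    """One exponentiated-gradient round over hyperparameter configurations
    (illustrative sketch in the spirit of FedEx, not the published method).

    theta   -- current categorical distribution over configs (sums to 1)
    configs -- list of candidate hyperparameter settings
    eval_fn -- returns a validation loss for one sampled config
    """
    k = len(configs)
    # Sample one configuration from the current distribution.
    idx = np.random.choice(k, p=theta)
    loss = eval_fn(configs[idx])
    # Importance-weighted (REINFORCE-style) gradient estimate for the
    # sampled arm only; unsampled arms get zero gradient.
    grad = np.zeros(k)
    grad[idx] = loss / theta[idx]
    # Multiplicative-weights update, then renormalize.
    theta = theta * np.exp(-eta * grad)
    return theta / theta.sum()
```

Repeating this round drives mass toward configurations with low validation loss while training proceeds, which is what lets tuning share a single training budget instead of running each configuration to completion.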
O(n) Connections are Expressive Enough: Universal Approximability of Sparse Transformers
Recently, Transformer networks have redefined the state of the art in many NLP tasks. However, these models suffer from quadratic computational cost in the input sequence length $n$ to compute pairwise attention in each layer. This has prompted recent research into sparse Transformers that sparsify the connections in the attention layers. While empirically promising for long sequences, fundamental questions remain unanswered: Can sparse Transformers approximate arbitrary sequence-to-sequence functions, as their dense counterparts can? How do the sparsity pattern and the sparsity level affect their performance? In this paper, we address these questions and provide a unifying framework that captures existing sparse attention models. We propose sufficient conditions under which we prove that a sparse attention model can universally approximate any sequence-to-sequence function. Surprisingly, our results show that sparse Transformers with only $O(n)$ connections per attention layer can approximate the same function class as the dense model with $n^2$ connections.
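A minimal sketch makes the $O(n)$-connection idea concrete: restrict each query to a fixed-width local window of keys, so the number of attended pairs grows linearly in $n$ rather than quadratically. The sliding-window pattern below is one illustrative choice, not any specific published sparsity pattern from the paper.

```python
import numpy as np

def sparse_attention(Q, K, V, window=2):
    """Single-head attention where each query attends only to keys within
    a sliding window -- O(window) connections per position, O(n) total.
    Illustrative sketch; real sparse Transformers use richer patterns."""
    n, d = Q.shape
    scores = Q @ K.T / np.sqrt(d)
    # Mask out all connections outside the local window.
    i, j = np.indices((n, n))
    scores[np.abs(i - j) > window] = -np.inf
    # Row-wise softmax over the surviving (finite) scores.
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ V
```

A dense layer materializes all $n^2$ scores; here every row of the score matrix has at most $2\,\mathrm{window}+1$ finite entries, which is the connection budget the universality result concerns.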
Exploring the Connection Between Binary and Spiking Neural Networks
On-chip edge intelligence has necessitated the exploration of algorithmic techniques to reduce the compute requirements of current machine learning frameworks. This work aims to bridge the recent algorithmic progress in training Binary Neural Networks and Spiking Neural Networks--both of which are driven by the same motivation, and yet synergies between the two have not been fully explored. We show that training Spiking Neural Networks in the extreme quantization regime results in near full-precision accuracies on large-scale datasets like CIFAR-100 and ImageNet. An important implication of this work is that Binary Spiking Neural Networks can be enabled by "In-Memory" hardware accelerators catered for Binary Neural Networks without suffering any accuracy degradation due to binarization. We utilize standard training techniques for non-spiking networks to generate our spiking networks via a conversion process. We also perform an extensive empirical analysis and explore simple design-time and run-time optimization techniques that reduce the inference latency of spiking networks (both binary and full-precision models) by an order of magnitude over prior work.
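The conversion process mentioned above rests on a standard observation: the firing rate of an integrate-and-fire neuron driven by a constant input approximates a ReLU of its weighted input. The sketch below illustrates that rate approximation for a single neuron; it is a hedged, simplified model (soft reset, constant drive), not the paper's full conversion pipeline.

```python
import numpy as np

def if_neuron_rate(inputs, weights, threshold=1.0, steps=100):
    """Simulate one integrate-and-fire neuron under constant drive and
    return its spike rate. For sub-threshold positive drive, the rate
    approximates ReLU(w . x), the basis of ANN-to-SNN conversion
    (simplified sketch)."""
    drive = float(np.dot(inputs, weights))
    v, spikes = 0.0, 0
    for _ in range(steps):
        v += drive                 # integrate the constant input current
        if v >= threshold:
            spikes += 1
            v -= threshold         # soft reset preserves residual charge
    return spikes / steps
```

Negative drive never reaches threshold, so the rate is zero, mirroring the ReLU's dead region; this is why networks trained with standard non-spiking techniques can be mapped onto spiking hardware.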
Rethinking Goal-conditioned Supervised Learning and Its Connection to Offline RL
Yang, Rui, Lu, Yiming, Li, Wenzhe, Sun, Hao, Fang, Meng, Du, Yali, Li, Xiu, Han, Lei, Zhang, Chongjie
Solving goal-conditioned tasks with sparse rewards using self-supervised learning is promising because of its simplicity and stability over current reinforcement learning (RL) algorithms. A recent work, called Goal-Conditioned Supervised Learning (GCSL), provides a new learning framework by iteratively relabeling and imitating self-generated experiences. In this paper, we revisit the theoretical property of GCSL -- optimizing a lower bound of the goal-reaching objective -- and extend GCSL as a novel offline goal-conditioned RL algorithm. The proposed method is named Weighted GCSL (WGCSL), in which we introduce an advanced compound weight consisting of three parts: (1) a discounted weight for goal relabeling, (2) a goal-conditioned exponential advantage weight, and (3) a best-advantage weight. Theoretically, WGCSL provably optimizes an equivalent lower bound of the goal-conditioned RL objective and generates monotonically improving policies via an iterative scheme. The monotonic property holds for any behavior policy, and therefore WGCSL can be applied to both online and offline settings. To evaluate algorithms in the offline goal-conditioned RL setting, we provide a benchmark including a range of point and simulated robot domains. Experiments in the introduced benchmark demonstrate that WGCSL can consistently outperform GCSL and existing state-of-the-art offline methods in the fully offline goal-conditioned setting.
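The three-part compound weight can be sketched per transition as a product of the discounted relabeling weight, a clipped exponential advantage weight, and a best-advantage term. The exact functional forms, the clipping constant, and the percentile-based best-advantage rule below are assumptions for illustration; the paper defines its own versions.

```python
import numpy as np

def wgcsl_weight(delta_t, advantage, gamma=0.98, beta=1.0,
                 batch_advantages=None, adv_percentile=80, clip=10.0):
    """Illustrative compound weight in the spirit of WGCSL's three parts
    (forms and hyperparameters here are assumptions, not the paper's):
      (1) discounted relabeling weight  gamma ** delta_t
      (2) exponential advantage weight  min(exp(beta * A), clip)
      (3) best-advantage weight: full weight for transitions whose
          advantage is in the top fraction of the batch, reduced otherwise."""
    w_discount = gamma ** delta_t
    w_exp = min(np.exp(beta * advantage), clip)
    if batch_advantages is not None:
        threshold = np.percentile(batch_advantages, adv_percentile)
        w_best = 1.0 if advantage >= threshold else 0.5
    else:
        w_best = 1.0
    return w_discount * w_exp * w_best
```

The product then scales each relabeled transition's imitation loss, so high-advantage, temporally close goal relabelings dominate the supervised update.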
Thirteenth International Distributed AI Workshop
This article discusses the Thirteenth International Distributed AI Workshop. An overview of the workshop is given as well as concerns and goals for the technology. The central problem in DAI is how to achieve coordinated action among such agents, so that they can accomplish more as a group than as individuals. The DAI workshop is dedicated to advancing the state of the art in this field. This year's workshop took place on the Olympic Peninsula in Washington State on 28 to 30 July 1994 and included 45 participants from North America, Europe, and the Pacific Rim.
Review of The Computational Beauty of Nature
The book's basic premise is that these "most interesting computational topics today" are deeply interrelated, and in some heretofore undescribed ways. The text is well crafted, and the scholarship is both broad and deep. The author is clearly a renaissance man as well as a wonderful teacher. He is equally good at succinct summaries and painting the big picture, and he makes particularly effective use of examples. Best of all is his infectious joy about his subject: The text is full of percolations of delight at the beauty of some concept or equation or at the sheer fun of hacking code.
Assembly Sequence Planning
Assembly plays a fundamental role in the manufacturing of most products. Parts that have been individually formed or machined to meet designed specifications are assembled into a configuration that achieves the functions of the final product or mechanism. The economic importance of assembly as a manufacturing process has led to extensive efforts to improve the efficiency and cost effectiveness of assembly operations. The sequence of mating operations that can be carried out to assemble a group of parts is constrained by the geometric and mechanical properties of the parts, their assembled configuration, and the stability of the resulting subassemblies. An approach to representation and reasoning about these sequences is described here and leads to several alternative explicit and implicit plan representations.
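One way to make the constrained-sequence idea concrete is to enumerate the assembly orders consistent with a set of precedence constraints. This is a minimal sketch with hypothetical part names; a real planner would also check geometric feasibility of each mating operation and the stability of intermediate subassemblies, as the abstract describes.

```python
from itertools import permutations

def feasible_sequences(parts, precedes):
    """Return all assembly orders consistent with precedence constraints.
    `precedes` is a list of (a, b) pairs meaning part a must be assembled
    before part b. Minimal sketch: geometric and stability checks omitted."""
    feasible = []
    for order in permutations(parts):
        pos = {p: i for i, p in enumerate(order)}
        if all(pos[a] < pos[b] for a, b in precedes):
            feasible.append(order)
    return feasible
```

Even this toy version shows why explicit enumeration is expensive (n! candidate orders) and why the compact implicit plan representations discussed here matter for realistic part counts.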
AI helps computers hone the fine art of forgetting
Deep learning is changing the way we use and think about machines. Current incarnations are better than humans at all kinds of tasks, from chess and Go to face recognition and object recognition. In particular, humans have the extraordinary ability to constantly update their memories with the most important knowledge while overwriting information that is no longer useful. The world provides a never-ending source of data, much of which is irrelevant to the tricky business of survival, and most of which is impossible to store in a limited memory. So humans and other creatures have evolved ways to retain important skills while forgetting irrelevant ones.