sparsemap
- Europe > Portugal > Lisbon > Lisbon (0.05)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (3 more...)
SparseMap: A Sparse Tensor Accelerator Framework Based on Evolution Strategy
Zhao, Boran, Zhai, Haiming, Yuan, Zihang, Liu, Hetian, Xia, Tian, Zhao, Wenzhe, Ren, Pengju
The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios, and it's time-consuming and challenging to adjust a large number of design factors when scenarios change. Therefore, automating the design of SpTA accelerators is crucial. Nevertheless, previous works focus solely on either mapping (i.e., tiling communication and computation in space and time) or sparse strategy (i.e., bypassing zero elements for efficiency), leading to suboptimal designs due to the lack of comprehensive consideration of both. A unified framework that jointly optimizes both is urgently needed. However, integrating mapping and sparse strategies leads to a combinatorial explosion in the design space(e.g., as large as $O(10^{41})$ for the workload $P_{32 \times 64} \times Q_{64 \times 48} = Z_{32 \times 48}$). This vast search space renders most conventional optimization methods (e.g., particle swarm optimization, reinforcement learning and Monte Carlo tree search) inefficient. To address this challenge, we propose an evolution strategy-based sparse tensor accelerator optimization framework, called SparseMap. SparseMap constructing a more comprehensive design space with the consideration of both mapping and sparse strategy. We introduce a series of enhancements to genetic encoding and evolutionary operators, enabling SparseMap to efficiently explore the vast and diverse design space. We quantitatively compare SparseMap with prior works and classical optimization methods, demonstrating that SparseMap consistently finds superior solutions.
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > Portugal > Lisbon > Lisbon (0.05)
- Asia > Middle East > Jordan (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- (4 more...)
preliminary experiments with VIMCO which does not seem to outperform moving average baseline on the bit-vector
We thank all reviewers for their comments. In 5.1, we compare against Sum&Sample, which was shown in prior work [Figure 1 of Liu et al., 2019] Upon acceptance we will normalize these baselines for all experiments and include suggested ones. Thanks for bringing this up, scalability is an important point that we want to make sure is clear in the final version. We will make this clearer. We will include this analysis and plots as suggested.
Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval
Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, André F. T.
Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework-Hopfield-Fenchel-Young networks-which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization-all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.
- North America > United States (0.28)
- Europe > Portugal > Lisbon > Lisbon (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
- (2 more...)
Sparse and Structured Hopfield Networks
Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, Andre F. T.
Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.
- Europe > Portugal > Lisbon > Lisbon (0.14)
- North America > United States (0.14)
- Europe > Austria > Vienna (0.14)
- (2 more...)
DAG Learning on the Permutahedron
Zantedeschi, Valentina, Franceschi, Luca, Kaddour, Jean, Kusner, Matt J., Niculae, Vlad
We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches our formulation has a number of advantages including: 1. validity: optimizes over exact DAGs as opposed to other relaxations optimizing approximate DAGs; 2. modularity: accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the SID and SHD. In many domains, including cell biology (Sachs et al., 2005), finance (Sanford & Moosa, 2012), and genetics (Zhang et al., 2013), the data generating process is thought to be represented by an underlying directed acylic graph (DAG). Many models rely on DAG assumptions, e.g., causal modeling uses DAGs to model distribution shifts, ensure predictor fairness among subpopulations, or learn agents more sample-efficiently (Kaddour et al., 2022). A key question, with implications ranging from better modeling to causal discovery, is how to recover this unknown DAG from observed data alone. Learning DAGs from observational data alone is fundamentally difficult for two reasons. This riddles the search space with local minima; (ii) Computation: DAG discovery is a costly combinatorial optimization problem over an exponentially large solution space and subject to global acyclicity constraints. To address issue (ii), recent work has proposed continuous relaxations of the DAG learning problem.
- North America > United States > Florida > Monroe County > Key West (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > Middle East > Jordan (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)