AITopics | sparsemap

Collaborating Authors

sparsemap

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

887caadc3642e304ede659b734f79b00-Paper.pdf

Neural Information Processing SystemsFeb-9-2026, 06:42:57 GMT

latent variable, proc, sparsemax, (16 more...)

Neural Information Processing Systems

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(3 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

The growing demand for sparse tensor algebra (SpTA) in machine learning and big data has driven the development of various sparse tensor accelerators. However, most existing manually designed accelerators are limited to specific scenarios, and it's time-consuming and challenging to adjust a large number of design factors when scenarios change. Therefore, automating the design of SpTA accelerators is crucial. Nevertheless, previous works focus solely on either mapping (i.e., tiling communication and computation in space and time) or sparse strategy (i.e., bypassing zero elements for efficiency), leading to suboptimal designs due to the lack of comprehensive consideration of both. A unified framework that jointly optimizes both is urgently needed. However, integrating mapping and sparse strategies leads to a combinatorial explosion in the design space(e.g., as large as $O(10^{41})$ for the workload $P_{32 \times 64} \times Q_{64 \times 48} = Z_{32 \times 48}$). This vast search space renders most conventional optimization methods (e.g., particle swarm optimization, reinforcement learning and Monte Carlo tree search) inefficient. To address this challenge, we propose an evolution strategy-based sparse tensor accelerator optimization framework, called SparseMap. SparseMap constructing a more comprehensive design space with the consideration of both mapping and sparse strategy. We introduce a series of enhancements to genetic encoding and evolutionary operators, enabling SparseMap to efficiently explore the vast and diverse design space. We quantitatively compare SparseMap with prior works and classical optimization methods, demonstrating that SparseMap consistently finds superior solutions.

artificial intelligence, evolutionary algorithm, machine learning, (18 more...)

arXiv.org Artificial Intelligence

2508.12906

Country: Asia > China (0.28)

Genre: Research Report (0.64)

Industry:

Information Technology (0.47)
Health & Medicine > Pharmaceuticals & Biotechnology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

887caadc3642e304ede659b734f79b00-Paper.pdf

Neural Information Processing SystemsAug-15-2025, 00:41:54 GMT

latent variable, proc, sparsemax, (16 more...)

Neural Information Processing Systems

Country:

Europe > Portugal > Lisbon > Lisbon (0.05)
Asia > Middle East > Jordan (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
(4 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

preliminary experiments with VIMCO which does not seem to outperform moving average baseline on the bit-vector

Neural Information Processing SystemsAug-15-2025, 00:41:42 GMT

We thank all reviewers for their comments. In 5.1, we compare against Sum&Sample, which was shown in prior work [Figure 1 of Liu et al., 2019] Upon acceptance we will normalize these baselines for all experiments and include suggested ones. Thanks for bringing this up, scalability is an important point that we want to make sure is clear in the final version. We will make this clearer. We will include this analysis and plots as suggested.

average baseline, preliminary experiment, sparsemax, (14 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.31)

Add feedback

Hopfield-Fenchel-Young Networks: A Unified Framework for Associative Memory Retrieval

Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, André F. T.

arXiv.org Artificial IntelligenceNov-13-2024

Associative memory models, such as Hopfield networks and their modern variants, have garnered renewed interest due to advancements in memory capacity and connections with self-attention in transformers. In this work, we introduce a unified framework-Hopfield-Fenchel-Young networks-which generalizes these models to a broader family of energy functions. Our energies are formulated as the difference between two Fenchel-Young losses: one, parameterized by a generalized entropy, defines the Hopfield scoring mechanism, while the other applies a post-transformation to the Hopfield output. By utilizing Tsallis and norm entropies, we derive end-to-end differentiable update rules that enable sparse transformations, uncovering new connections between loss margins, sparsity, and exact retrieval of single memory patterns. We further extend this framework to structured Hopfield networks using the SparseMAP transformation, allowing the retrieval of pattern associations rather than a single pattern. Our framework unifies and extends traditional and modern Hopfield networks and provides an energy minimization perspective for widely used post-transformations like $\ell_2$-normalization and layer normalization-all through suitable choices of Fenchel-Young losses and by using convex analysis as a building block. Finally, we validate our Hopfield-Fenchel-Young networks on diverse memory recall tasks, including free and sequential recall. Experiments on simulated data, image retrieval, multiple instance learning, and text rationalization demonstrate the effectiveness of our approach.

hopfield network, retrieval, transformation, (16 more...)

arXiv.org Artificial Intelligence

2411.0859

Country:

North America > United States (0.28)
Europe > Portugal > Lisbon > Lisbon (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
(2 more...)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

Sparse and Structured Hopfield Networks

Santos, Saul, Niculae, Vlad, McNamee, Daniel, Martins, Andre F. T.

arXiv.org Artificial IntelligenceJun-4-2024

Modern Hopfield networks have enjoyed recent interest due to their connection to attention in transformers. Our paper provides a unified framework for sparse Hopfield networks by establishing a link with Fenchel-Young losses. The result is a new family of Hopfield-Fenchel-Young energies whose update rules are end-to-end differentiable sparse transformations. We reveal a connection between loss margins, sparsity, and exact memory retrieval. We further extend this framework to structured Hopfield networks via the SparseMAP transformation, which can retrieve pattern associations instead of a single pattern. Experiments on multiple instance learning and text rationalization demonstrate the usefulness of our approach.

hopfield network, retrieval, transformation, (13 more...)

arXiv.org Artificial Intelligence

2402.13725

Country:

Europe > Portugal > Lisbon > Lisbon (0.14)
North America > United States (0.14)
Europe > Austria > Vienna (0.14)
(2 more...)

Genre: Research Report (0.64)

Industry: Health & Medicine > Therapeutic Area (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models (1.00)

Add feedback

DAG Learning on the Permutahedron

Zantedeschi, Valentina, Franceschi, Luca, Kaddour, Jean, Kusner, Matt J., Niculae, Vlad

arXiv.org Artificial IntelligenceFeb-10-2023

We propose a continuous optimization framework for discovering a latent directed acyclic graph (DAG) from observational data. Our approach optimizes over the polytope of permutation vectors, the so-called Permutahedron, to learn a topological ordering. Edges can be optimized jointly, or learned conditional on the ordering via a non-differentiable subroutine. Compared to existing continuous optimization approaches our formulation has a number of advantages including: 1. validity: optimizes over exact DAGs as opposed to other relaxations optimizing approximate DAGs; 2. modularity: accommodates any edge-optimization procedure, edge structural parameterization, and optimization loss; 3. end-to-end: either alternately iterates between node-ordering and edge-optimization, or optimizes them jointly. We demonstrate, on real-world data problems in protein-signaling and transcriptional network discovery, that our approach lies on the Pareto frontier of two key metrics, the SID and SHD. In many domains, including cell biology (Sachs et al., 2005), finance (Sanford & Moosa, 2012), and genetics (Zhang et al., 2013), the data generating process is thought to be represented by an underlying directed acylic graph (DAG). Many models rely on DAG assumptions, e.g., causal modeling uses DAGs to model distribution shifts, ensure predictor fairness among subpopulations, or learn agents more sample-efficiently (Kaddour et al., 2022). A key question, with implications ranging from better modeling to causal discovery, is how to recover this unknown DAG from observed data alone. Learning DAGs from observational data alone is fundamentally difficult for two reasons. This riddles the search space with local minima; (ii) Computation: DAG discovery is a costly combinatorial optimization problem over an exponentially large solution space and subject to global acyclicity constraints. To address issue (ii), recent work has proposed continuous relaxations of the DAG learning problem.

artificial intelligence, machine learning, permutation, (18 more...)

arXiv.org Artificial Intelligence

2301.11898

Country:

North America > United States > Florida > Monroe County > Key West (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Netherlands > North Holland > Amsterdam (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.87)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)

Add feedback

Discrete Latent Structure in Neural Networks

Niculae, Vlad, Corro, Caio F., Nangia, Nikita, Mihaylova, Tsvetomila, Martins, André F. T.

arXiv.org Artificial IntelligenceJan-18-2023

Many types of data from fields including natural language processing, computer vision, and bioinformatics, are well represented by discrete, compositional structures such as trees, sequences, or matchings. Latent structure models are a powerful tool for learning to extract such representations, offering a way to incorporate structural bias, discover insight about the data, and interpret decisions. However, effective training is challenging, as neural networks are typically designed for continuous computation. This text explores three broad strategies for learning with discrete latent structure: continuous relaxation, surrogate gradients, and probabilistic estimation. Our presentation relies on consistent notations for a wide range of models. As such, we reveal many new connections between latent structure learning strategies, showing how most consist of the same small set of fundamental building blocks, but use them differently, leading to substantially different applicability and properties.

artificial intelligence, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2301.07473

Country: