AITopics | sparsifier

Collaborating Authors

sparsifier

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

447b0408b80078338810051bb38b177f-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 15:45:58 GMT

artificial intelligence, convergence, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.68)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

Rethinking gradient sparsification as total error minimization

Neural Information Processing SystemsApr-25-2026, 15:45:54 GMT

Gradient compression is a widely-established remedy to tackle the communication bottleneck in distributed training of large deep neural networks (DNNs). Under the error-feedback framework, Top-k sparsification, sometimes with k as little as 0.1% of the gradient size, enables training to the same model quality as the uncompressed case for a similar iteration count. From the optimization perspective, we find that Top-k is the communication-optimal sparsifier given a per-iteration k element budget. We argue that to further the benefits of gradient sparsification, especially for DNNs, a different perspective is necessary -- one that moves from per-iteration optimality to consider optimality for the entire training. We identify that the total error -- the sum of the compression errors for all iterations -- encapsulates sparsification throughout training.

artificial intelligence, compressor, machine learning, (16 more...)

Neural Information Processing Systems

Country: North America > Canada (0.28)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.89)

Add feedback

ANotation and Preliminaries

Neural Information Processing SystemsApr-24-2026, 19:12:57 GMT

We use the notation G= (V,E) to represent unweighted graphs, and G= (V,E,w) for weighted graphs. We use lowercase letters u,v to refer to vertices in V, and given a vertex v, we use dG(v) to refer to its degree in graph G. We use capital letters S,T to represent subsets of vertices, and given a vertex set S V, we use |S|to refer to its cardinality, S:= V \S to refer to its complement, and G[S] to refer to the subgraph of Ginduced by vertex set S. Furthermore, given two disjoint vertex sets S,T, we use wG(S,T):= P Given a graph G = (V,E), we use T to refer to a hierarchical clustering (tree) of the vertex set V, and costG(T) to refer to the cost of this clustering in graph G. Without loss of generality, we restrict our attention to just full binary hierarchical clustering trees, since the optimal tree is binary [20].

artificial intelligence, graph, machine learning, (19 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.73)

Add feedback