AITopics | leverage score

Collaborating Authors

leverage score

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

AUnified Framework for Provably Efficient Algorithms to Estimate Shapley Values

Neural Information Processing SystemsJun-23-2026, 02:17:26 GMT

Shapley values have emerged as a critical tool for explaining which features impact the decisions made by machine learning models. However, computing exact Shapley values is difficult, generally requiring an exponential (in the feature dimension) number of model evaluations. To address this, many model-agnostic randomized estimators have been developed, the most influential and widely used being the KernelSHAP method (Lundberg & Lee, 2017). While related estimators such as unbiased KernelSHAP (Covert & Lee, 2021) and LeverageSHAP (Musco & Witter, 2025) are known to satisfy theoretical guarantees, bounds for KernelSHAP have remained elusive. We describe a broad and unified framework that encompasses KernelSHAP and related estimators constructed using both with and without replacement sampling strategies.

artificial intelligence, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: North America > United States (0.46)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (1.00)

Industry:

Banking & Finance (0.92)
Health & Medicine (0.67)
Information Technology > Security & Privacy (0.67)
Law (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Game Theory (0.93)
(3 more...)

Add feedback

Faster Algorithms for Structured John Ellipsoid Computation

Neural Information Processing SystemsJun-22-2026, 23:36:59 GMT

The famous theorem of Fritz John states that any convex body has a unique maximal volume inscribed ellipsoid, known as the John Ellipsoid. Computing the John Ellipsoid is a fundamental problem in convex optimization. In this paper, we focus on approximating the John Ellipsoid inscribed in a convex and centrally symmetric polytope defined by P:= {x Rd: 1n Ax 1n}, where A Rn d is a rank-d matrix and 1n Rn is the all-ones vector. We develop two efficient algorithms for approximating the John Ellipsoid. The first is a sketchingbased algorithm that runs in nearly input-sparsity time eO(nnz(A)+dω), where nnz(A)denotes the number of nonzero entries in the matrix Aand ω 2.37is the current matrix multiplication exponent. The second is a treewidth-based algorithm that runs in time eO(nτ2), where τ is the treewidth of the dual graph of the matrix A. Our algorithms significantly improve upon the state-of-the-art running time of eO(nd2)achieved by [Cohen, Cousins, Lee, and Yang, COLT 2019].

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Country:

North America > United States > California (0.28)
North America > United States > New York (0.28)

Genre:

Research Report > Experimental Study (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.68)

Add feedback

Value-Guided KVCompression for LLMs via Approximated CURDecomposition

Neural Information Processing SystemsJun-18-2026, 10:46:15 GMT

Key-value (KV) cache compression has emerged as a critical technique for reducing the memory and latency overhead of autoregressive language models during inference. Prior approaches predominantly rely on query-key attention scores to rank and evict cached tokens, assuming that attention intensity correlates with semantic importance. However, this heuristic overlooks the contribution of value vectors, which directly influence the attention output. In this paper, we propose CurDKV, a novel, value-centric KV compression method that selects keys and values based on leverage scores computed from CUR matrix decomposition. Our approach approximates the dominant subspace of the attention output softmax(QK)V, ensuring that the retained tokens best preserve the model's predictive behavior. Theoretically, we show that attention score approximation does not guarantee output preservation, and demonstrate that CUR-based selection minimizes end-to-end attention reconstruction loss.

large language model, machine learning, natural language, (19 more...)

Neural Information Processing Systems

Country: Asia (0.45)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.67)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Add feedback

Debiasing Random Oblique Projections for Subsampled OLS and Fast CUR in High Dimensions

Niu, Chengmei, Garg, Sachin, Dereziński, Michał, Liao, Zhenyu

arXiv.org Machine LearningMay-26-2026

Random sampling is a fundamental tool in modern machine learning and numerical linear algebra for reducing the computational cost of large-scale matrix problems. Existing analyses, however, rely primarily on subspace embedding guarantees, which do not precisely characterize the statistical bias of nonlinear random oblique projections induced by sampling, which arises ubiquitously in subsampled least squares and fast low-rank approximation methods. Because (pseudo)inversion is nonlinear, these random oblique projections can be systematically biased even when the underlying sketch is unbiased, thereby introducing hidden bias into downstream least squares and low-rank approximation solutions. In this work, we develop a unified non-asymptotic theory for random oblique projections in high dimensions. We show that standard random sampling schemes generally induce a systematic statistical bias overlooked by classical subspace embedding-style analyses, and we propose a principled debiasing framework to correct it. We illustrate the power of the theory through two canonical applications. For subsampled least squares, we obtain sharp bias--variance characterizations, reveal previously unrecognized statistical suboptimality in widely used sampling schemes, and identify when debiasing yields provable improvements. For fast CUR decomposition, we develop a debiased approach with improved approximation accuracy. Numerical experiments further validate our theoretical findings.

artificial intelligence, machine learning, theorem 4, (19 more...)

arXiv.org Machine Learning

2605.24955

Country: North America > United States (0.92)

Genre: Research Report (1.00)

Industry:

Transportation (0.68)
Consumer Products & Services > Travel (0.46)
Government > Regional Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.48)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.34)

Add feedback

Finite Population Regression Adjustment and Non-asymptotic Guarantees for Treatment Effect Estimation

Neural Information Processing SystemsApr-30-2026, 04:33:43 GMT

The design and analysis of randomized experiments is fundamental to many areas, from the physical and social sciences to industrial settings. Regression adjustment is a popular technique to reduce the variance of estimates obtained from experiments, by utilizing information contained in auxiliary covariates. While there is a large literature within the statistics community studying various approaches to regression adjustment and their asymptotic properties, little focus has been given to approaches in the finite population setting with non-asymptotic accuracy bounds. Further, prior work typically assumes that an entire population is exposed to an experiment, whereas practitioners often seek to minimize the number of subjects exposed to an experiment, for ethical and pragmatic reasons. In this work, we study the problems of estimating the sample mean, individual treatment effects, and average treatment effect with regression adjustment. We propose approaches that use techniques from randomized numerical linear algebra to sample a subset of the population on which to perform an experiment. We give non-asymptotic accuracy bounds for our methods and demonstrate that they compare favorably with prior approaches.

artificial intelligence, estimator, machine learning, (18 more...)

Neural Information Processing Systems

Country: North America > United States (0.67)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.86)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.34)

Add feedback

Training Transformers with 4-bit Integers

Neural Information Processing SystemsApr-29-2026, 03:27:06 GMT

Quantizing the activation, weight, and gradient to 4-bit is promising to accelerate neural network training. However, existing 4-bit training methods require custom numerical formats which are not supported by contemporary hardware. In this work, we propose a training method for transformers with all matrix multiplications implemented with the INT4 arithmetic. Training with an ultra-low INT4 precision is challenging. To achieve this, we carefully analyze the specific structures of activation and gradients in transformers to propose dedicated quantizers for them. For forward propagation, we identify the challenge of outliers and propose a Hadamard quantizer to suppress the outliers.

artificial intelligence, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)

Add feedback

Newton-LESS: Sparsification without Trade-offs for the Sketched Newton Update

Neural Information Processing SystemsApr-24-2026, 21:10:17 GMT

In second-order optimization, a potential bottleneck can be computing the Hessian matrix of the optimized function at every iteration. Randomized sketching has emerged as a powerful technique for constructing estimates of the Hessian which can be used to perform approximate Newton steps. This involves multiplication by a random sketching matrix, which introduces a trade-off between the computational cost of sketching and the convergence rate of the optimization algorithm. A theoretically desirable but practically much too expensive choice is to use a dense Gaussian sketching matrix, which produces unbiased estimates of the exact Newton step and which offers strong problem-independent convergence guarantees. We show that the Gaussian sketching matrix can be drastically sparsified, significantly reducing the computational cost of sketching, without substantially affecting its convergence properties. This approach, called Newton-LESS, is based on a recently introduced sketching technique: LEverage Score Sparsified (LESS) embeddings. We prove that Newton-LESS enjoys nearly the same problem-independent local convergence rate as Gaussian embeddings, not just up to constant factors but even down to lower order terms, for a large class of optimization tasks. In particular, this leads to a new state-of-the-art convergence result for an iterative least squares solver. Finally, we extend LESS embeddings to include uniformly sparsified random sign matrices which can be implemented efficiently and which perform well in numerical experiments.

artificial intelligence, convergence rate, machine learning, (17 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report > New Finding (0.67)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Add feedback

08fe4b20d554296e503f5a43795c78d6-Supplemental-Conference.pdf

Neural Information Processing SystemsApr-24-2026, 10:53:08 GMT

A.1 Proof of Theorem 2 Consider an instance of semi-supervised active regression with the labeled dataset Xlab being empty, and with the unlabeled dataset composed of k copies of the standard basis vector ei for each i = 1,,d.

algorithm 2, artificial intelligence, machine learning, (16 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.54)

Add feedback

Semi-supervised Active Linear Regression

Neural Information Processing SystemsApr-24-2026, 10:53:05 GMT

Labeled data often comes at a high cost as it may require recruiting human labelers or running costly' experiments. At the same time, in many practical scenarios, one already has access to a partially labeled, potentially biased dataset that can help with the learning task at hand. Motivated by such settings, we formally initiate a study of semi-supervised active learning through the frame of linear regression.

artificial intelligence, machine learning, regression, (18 more...)

Neural Information Processing Systems

Country: North America > United States > California (0.28)

Genre: Research Report (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (1.00)

Add feedback