AITopics | Statistical Learning

3ce3bd7d63a2c9c81983cc8e9bd02ae5-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 13:14:16 GMT

We start by restating the setup in which our algorithm operates. The type of ICA considered in our work assumes the following generative model. There are dsources recorded T times forming the columns of S:= [s1,...,sT] Rd T whose components s1t,...,sdt are assumed non-Gaussian and independent. Without loss of generality, we assume that each source has zero-mean, unit variance, and finite and distinct kurtosis, a common assumption among kurtosis-based ICA methods [12]. The kurtosis of a random variable v is defined as kurt[v] = E (v E(v))4 / E (v E(v))2 2. Finally, sources are assumed to be mixed through a linear system, i.e., there exists a full rank mixing matrix, A Rd d, producing the d-dimensional mixture, xt, expressed as xt = Ast t {1,...,T} .

algorithm, artificial intelligence, machine learning, (17 more...)

Neural Information Processing Systems

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

3ce3bd7d63a2c9c81983cc8e9bd02ae5-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 13:14:13 GMT

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Country: Asia > Middle East (0.14)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.95)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.69)

Add feedback

3c63ec7be1b6c49e6c308397023fd8cd-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 12:57:00 GMT

algorithm, artificial intelligence, machine learning, (18 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.30)

Add feedback

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

Neural Information Processing SystemsApr-25-2026, 12:56:56 GMT

We present and analyze an algorithm for optimizing smooth and convex or strongly convex objectives using minibatch stochastic gradient estimates. The algorithm is optimal with respect to its dependence on both the minibatch size and minimum expected loss simultaneously. This improves over the optimal method of Lan [17], which is insensitive to the minimum expected loss; over the optimistic acceleration of Cotter et al. [10], which has suboptimal dependence on the minibatch size; and over the algorithm of Liu and Belkin [19], which is limited to least squares problems and is also similarly suboptimal with respect to the minibatch size.

algorithm, artificial intelligence, machine learning, (14 more...)

Neural Information Processing Systems

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.38)

Add feedback

3c057cb2b41f22c0e740974d7a428918-Paper.pdf

Neural Information Processing SystemsApr-25-2026, 12:56:35 GMT

artificial intelligence, latent space, machine learning, (18 more...)

Neural Information Processing Systems

Genre: Research Report (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.98)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)

Add feedback

Demographic Parity Constrained Minimax Optimal Regression under Linear Model

Neural Information Processing SystemsApr-25-2026, 12:55:48 GMT

We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by Θ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.45)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Civil Rights & Constitutional Law (0.67)
Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

Demographic Parity Constrained Minimax Optimal Regression under Linear Model

Neural Information Processing SystemsApr-25-2026, 12:55:45 GMT

We explore the minimax optimal error associated with a demographic parityconstrained regression problem within the context of a linear model. Our proposed model encompasses a broader range of discriminatory bias sources compared to the model presented by Chzhen and Schreuder [6]. Our analysis reveals that the minimax optimal error for the demographic parity-constrained regression problem under our model is characterized by Θ(dM/n), where ndenotes the sample size, d represents the dimensionality, and M signifies the number of demographic groups arising from sensitive attributes. Moreover, we demonstrate that the minimax error increases in conjunction with a larger bias present in the model.

artificial intelligence, estimator, machine learning, (17 more...)

Neural Information Processing Systems

Country:

Europe (0.45)
North America > United States (0.28)

Genre: Research Report > New Finding (0.46)

Industry:

Law > Civil Rights & Constitutional Law (0.67)
Information Technology (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.47)

Add feedback

3bbca1d243b01b47c2bf42b29a8b265c-Supplemental.pdf

Neural Information Processing SystemsApr-25-2026, 12:43:35 GMT

machine learning, natural language, xr-transformer, (16 more...)

Neural Information Processing Systems

Industry: Information Technology > Services (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Data Science (0.94)
(2 more...)

Add feedback

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

Neural Information Processing SystemsApr-25-2026, 12:43:31 GMT

Extreme multi-label text classification (XMC) seeks to find relevant labels from an extreme large label collection for a given text input. Many real-world applications can be formulated as XMC problems, such as recommendation systems, document tagging and semantic search. Recently, transformer based XMC methods, such as XTransformer and LightXML, have shown significant improvement over other XMC methods. Despite leveraging pre-trained transformer models for text representation, the fine-tuning procedure of transformer models on large label space still has lengthy computational time even with powerful GPUs. In this paper, we propose a novel recursive approach, XR-Transformer to accelerate the procedure through recursively fine-tuning transformer models on a series of multi-resolution objectives related to the original XMC objective function. Empirical results show that XRTransformer takes significantly less training time compared to other transformerbased XMC models while yielding better state-of-the-art results. In particular, on the public Amazon-3M dataset with 3 million labels, XR-Transformer is not only 20x faster than X-Transformer but also improves the Precision@1 from 51% to 54%. Our code is publicly available at https://github.com/amzn/pecos.

classification, machine learning, natural language, (14 more...)

Neural Information Processing Systems

Industry:

Information Technology > Services (0.48)
Retail > Online (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Add feedback

lower bound

Neural Information Processing SystemsApr-25-2026, 12:42:56 GMT

While there remains a small gap between our main lower bound of Theorem 3 and the deterministic quantised gradient descent of Section 6, we can show that the gap cannot be closed by improved deterministic algorithms where the coordinator learns value of objective function F(x) in addition to the minimiser x. That is, our quantised gradient descent is the communication-optimal deterministic algorithm for variant (1) for objectives with constant condition number. Recall that in the N-player equality over universe of size d, denoted by EQd,N, each player i is given an input bi 2{ 0,1}d, and the task is to decide if all players have the same input. It is known [33] that the deterministic communication complexity of EQd,N is CC(EQd,N)= ( Nd). Theorem 8. Given parameters N, d, ", 0 and = 0N satisfying d /" = (1), any deterministic protocol solving (1) for quadratic input functions x 7! 0kx x0k22 has communication complexity Nd log( d/"), if the coordinator is also required to output estimate r 2 R for the minimum function value such that Assume is a deterministic protocol solving (1) with communication complexity C .We show that can then solve N-party equality over a universe of size D = ( dlog( d/")), implying C = ( ND)= Nd log( d/") . More specifically, let S be the set given by Lemma 2 with =(2 "/)1/2, and let D = dlog|S|e = (dlog( d/")). Note that since we assume d /" = (1), the set S has at least two elements and D 1.

artificial intelligence, gradient descent, machine learning, (15 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.57)

Add feedback

Filters

Collaborating Authors

Statistical Learning

3ce3bd7d63a2c9c81983cc8e9bd02ae5-Supplemental.pdf

3ce3bd7d63a2c9c81983cc8e9bd02ae5-Paper.pdf

3c63ec7be1b6c49e6c308397023fd8cd-Supplemental.pdf

An Even More Optimal Stochastic Optimization Algorithm: Minibatching and Interpolation Learning

3c057cb2b41f22c0e740974d7a428918-Paper.pdf

Demographic Parity Constrained Minimax Optimal Regression under Linear Model

Demographic Parity Constrained Minimax Optimal Regression under Linear Model

3bbca1d243b01b47c2bf42b29a8b265c-Supplemental.pdf

Fast Multi-Resolution Transformer Fine-tuning for Extreme Multi-label Text Classification

lower bound