AITopics | taylor expansion

Enhanced Cyclic Coordinate Descent Methods for Elastic Net Penalized Linear Models

Neural Information Processing Systems | Jun-22-2026, 17:57:14 GMT

We present a novel enhanced cyclic coordinate descent (ECCD) framework for solving generalized linear models with elastic net constraints that reduces training time in comparison to existing state-of-the-art methods. We redesign the CD method by performing a Taylor expansion around the current iterate to avoid nonlinear operations arising in the gradient computation. By introducing this approximation, we are able to unroll the vector recurrences occurring in the CD method and reformulate the resulting computations into more efficient batched computations. We show empirically that the recurrence can be unrolled by a tunable integer parameter, s, such that s > 1 yields performance improvements without affecting convergence, whereas s = 1 yields the original CD method. A key advantage of ECCD is that it avoids the convergence delay and numerical instability exhibited by block coordinate descent. Finally, we implement our proposed method in C++ using Eigen to accelerate linear algebra computations. Comparisons of our method against existing state-of-the-art solvers show consistent performance improvements of 3× on average for the regularization path variant on diverse benchmark datasets. Our implementation is available at https://github.
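
For orientation, the baseline that the abstract calls "the original CD method" (the s = 1 case) can be sketched as plain cyclic coordinate descent with soft-thresholding. This is only an illustration for the squared-error loss, not the ECCD algorithm itself: the objective parameterization (1/(2n))·||y − Xβ||² + λ(α||β||₁ + ((1 − α)/2)||β||²), the function names, and the fixed sweep count are all assumptions made here, and neither the Taylor expansion nor the unrolling by s is implemented.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def elastic_net_cd(X, y, lam, alpha, n_sweeps=100):
    """Cyclic coordinate descent for least squares with an elastic net penalty.

    Hypothetical helper (not from the paper). X is a list of rows, y a list
    of targets, lam the penalty strength, alpha the L1/L2 mixing weight.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residual y - X @ beta, with beta = 0 initially
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Correlation of column j with the partial residual
            # (the residual with coordinate j's contribution added back).
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_sq[j] * beta[j]
            new_bj = soft_threshold(rho, lam * alpha) / (col_sq[j] + lam * (1.0 - alpha))
            delta = new_bj - beta[j]
            if delta != 0.0:
                # Keep the residual in sync with the updated coefficient.
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new_bj
    return beta
```

Each coordinate update depends on the residual left by the previous one, which is the kind of sequential recurrence the abstract describes unrolling into batched computations.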

artificial intelligence, dataset, machine learning, (17 more...)

Country: North America > United States (1.00)

Genre:

Research Report > Experimental Study (1.00)
Research Report > New Finding (0.93)

Industry:

Health & Medicine (1.00)
Energy (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

On the Geometry of Separation in Finite Gaussian Mixtures

Nguyen, Huy, Le, Dung, Rinaldo, Alessandro, Ho, Nhat

arXiv.org Machine Learning | Jun-16-2026

We study an open problem of understanding the effects of the minimum component separation on the convergence rates of parameter estimation in finite Gaussian mixtures. We address this by developing a unified geometric framework based on novel Hellinger lower bounds that directly relate discrepancies between mixture densities to Wasserstein distances between their underlying mixing measures, with explicit dependence on both the minimum separation and the minimum weight. Our approach combines carefully designed interpolation polynomials with confluent divided difference techniques to construct specialized moment-extraction test functions. When the number of components is known, these bounds uncover a localization phenomenon: the separation complexity is driven strictly by the spatial configuration of mixture components, namely, whether they are concentrated in a single cluster, partitioned into multiple clusters separated by a macroscopic gap, or arranged without any structural constraints. On the other hand, when the number of components is unknown and over-specified, the separation complexity is slightly reduced, while the minimum mixture weight disappears entirely from the convergence rates due to a transition from first-order to second-order Wasserstein geometry. As a consequence, we obtain separation-dependent convergence rates that continuously interpolate between point-wise and uniform estimation regimes, thereby settling the fundamental limits of parameter recovery in finite Gaussian mixtures.
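
To make "Wasserstein distances between mixing measures" concrete, the sketch below computes the first-order Wasserstein distance between two discrete mixing measures whose atoms are scalar component locations. It relies on the standard one-dimensional identity W₁(G, G′) = ∫ |F_G(x) − F_G′(x)| dx. The restriction to one-dimensional location atoms and the function name are assumptions made for illustration; the Hellinger lower bounds from the abstract are not reproduced.

```python
def wasserstein1_1d(atoms_a, weights_a, atoms_b, weights_b):
    """First-order Wasserstein distance between two discrete measures on the line.

    Hypothetical helper: each measure is given as atoms (component locations)
    with mixing weights that sum to one.
    """
    # Signed point masses: +w for the first measure, -w for the second.
    events = sorted(
        [(x, w) for x, w in zip(atoms_a, weights_a)]
        + [(x, -w) for x, w in zip(atoms_b, weights_b)]
    )
    total = 0.0
    cdf_diff = 0.0  # F_a(x) - F_b(x) just to the left of the current atom
    prev = events[0][0]
    for x, w in events:
        # Integrate |F_a - F_b| over the gap between consecutive atoms.
        total += abs(cdf_diff) * (x - prev)
        cdf_diff += w
        prev = x
    return total


# Two-component mixing measures with equal weights.
well_separated = wasserstein1_1d([-1.0, 1.0], [0.5, 0.5], [-1.5, 1.5], [0.5, 0.5])
```

In this example each atom moves by 0.5 with mass 0.5, so the transport cost is 0.5 in total.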

artificial intelligence, equation, machine learning, (19 more...)

arXiv:2606.16179

Country: North America > United States (1.00)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Data Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Renewable Lasso without Batch-Number Constraints: A Gradient-Enhanced Approach

Gao, Junzhuo, Peng, Ling, Guo, Xu, Lian, Heng

arXiv.org Machine Learning | Jun-11-2026

We study online estimation for high-dimensional generalized linear models with streaming data. First, for the non-distributed setting, we propose a gradient-enhanced surrogate loss that approximates the cumulative loss using only historical summaries. This modifies and improves upon the existing renewable estimation approach for the same model in the high-dimensional setting, and removes the batch-number constraint of previous studies. We then extend the method to distributed streaming data under the master-client architecture, where batches are partitioned across sites and only summaries (gradient vectors) are exchanged. Instead of directly applying the popular method of Jordan et al. (2019) to the surrogate quadratic loss, our adjusted approach does not require the clients to compute the full surrogate loss. We derive non-asymptotic error bounds under the high-dimensional scaling, without the stringent constraint on the number of batches imposed in previous studies. Simulation results under linear and logistic models, together with a real-data application, show improved accuracy over existing renewable estimators.
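
The idea of updating an estimate from historical summaries alone can be illustrated in the simplest possible case: a one-covariate linear model with an L1 penalty, where the cumulative squared-error loss depends on past batches only through Σx² and Σxy. This is a deliberately simplified sketch and not the gradient-enhanced surrogate loss of the abstract; the class name, the objective ½Σ(y − xβ)² + nλ|β|, and the scalar setting are all assumptions made here.

```python
def soft_threshold(z, t):
    """Soft-thresholding operator S(z, t) = sign(z) * max(|z| - t, 0)."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


class StreamingScalarLasso:
    """Hypothetical streaming lasso for y = x * beta + noise with scalar x.

    Only running summaries are stored; raw batches are discarded after use.
    """

    def __init__(self, lam):
        self.lam = lam
        self.sxx = 0.0  # running sum of x^2
        self.sxy = 0.0  # running sum of x*y
        self.n = 0      # number of observations seen so far
        self.beta = 0.0

    def update(self, xs, ys):
        """Absorb one batch and return the refreshed estimate."""
        self.sxx += sum(x * x for x in xs)
        self.sxy += sum(x * y for x, y in zip(xs, ys))
        self.n += len(xs)
        if self.sxx > 0.0:
            # Closed-form minimizer of 0.5*sum((y - x*b)^2) + n*lam*|b|.
            self.beta = soft_threshold(self.sxy, self.n * self.lam) / self.sxx
        return self.beta


model = StreamingScalarLasso(lam=0.1)
model.update([1.0, 2.0], [2.0, 4.1])
streamed = model.update([3.0], [5.9])
```

For the linear model the summaries reproduce the full-data estimate exactly. For other generalized linear models the cumulative loss is no longer a function of a few sums, which is where a surrogate loss of the kind described above comes in.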

artificial intelligence, machine learning, pkk, (17 more...)

arXiv:2606.11738

Country:

Asia > China (0.93)
Asia > Middle East > Jordan (0.25)

Genre: Research Report (0.82)

Industry: Information Technology (0.46)

Technology:

Information Technology > Communications (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Data Science (0.93)

Supplementary to "Approximation with CNNs in Sobolev Space: with Applications to Classification"

Neural Information Processing Systems | Apr-24-2026, 17:08:08 GMT

In the supplementary materials, we give detailed descriptions of convex surrogate losses and convolutional neural networks, derive non-asymptotic error bounds for commonly used loss functions, and prove the theorems of the main text, including Theorems 2.1 and 2.2. A toy example on the numerical performance of CNN approximation is presented in Appendix D.

We next give a brief review of convex surrogate loss functions and discuss in detail the connection between the excess risk with respect to the $\phi$-loss and that of the 0-1 loss [28, 4]. Let $\phi: \mathbb{R} \to [0,\infty)$ be a given convex univariate function. Instead of minimizing the excess risk $R$ over $\mathcal{H}$, we consider minimizing the risk with respect to the loss $\phi$ (the $\phi$-risk) $R^{\phi}(f) := \mathbb{E}\{\phi(Yf(X))\}$ over a certain class of functions $\mathcal{F}$, where $\phi: \mathbb{R} \to [0,\infty)$ is some generic loss function. For the special case when $\mathcal{H} = \{h: h(x) = \operatorname{sign}(f(x)),\ f \in \mathcal{F}\}$ and $\phi(\cdot)$ is a step function, i.e., $\phi(x) = 1$ …

As shown in [28] and [4], for a properly chosen $\phi$, $\hat{f}_n$ can indeed help reduce the 0-1 excess risk $R(\hat{h}_n) - R(h_0)$. More precisely, let $R^{\phi}_0 := \inf_{f \text{ measurable}} R^{\phi}(f)$; then for a proper $\phi$ we have $\psi(R(\hat{h}_n) - R(h_0)) \le R^{\phi}(\hat{f}_n) - R^{\phi}(f_0)$, where $\psi: [-1,1] \to [0,\infty)$ is a nonnegative continuous function, invertible on $[0,1]$, that achieves its minimum at 0 with $\psi(0) = 0$. A wide variety of popular classification methods are based on this tactic.

Guohao Shen and Yuling Jiao contributed equally to this work. 36th Conference on Neural Information Processing Systems (NeurIPS 2022).
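
As a small numerical illustration of the ϕ-risk E{ϕ(Yf(X))} versus the 0-1 risk of the induced classifier sign(f), the sketch below evaluates both empirically for a score function f on labelled data with y in {−1, +1}. The choice of the hinge and logistic losses as examples of ϕ, the convention that a zero margin counts as an error, and all function names are assumptions made here for illustration.

```python
import math


def hinge(z):
    """Hinge loss phi(z) = max(0, 1 - z), a convex surrogate."""
    return max(0.0, 1.0 - z)


def logistic(z):
    """Logistic loss phi(z) = log(1 + exp(-z)), another convex surrogate."""
    return math.log1p(math.exp(-z))


def empirical_phi_risk(f, data, phi):
    """Sample average of phi(y * f(x)) over (x, y) pairs."""
    return sum(phi(y * f(x)) for x, y in data) / len(data)


def empirical_01_risk(f, data):
    """Fraction of points misclassified by sign(f); margin 0 counts as an error."""
    return sum(1 for x, y in data if y * f(x) <= 0) / len(data)


data = [(-2.0, -1), (-1.0, -1), (0.5, -1), (1.0, 1), (2.0, 1)]
score = lambda x: x  # hypothetical score function f(x) = x

risk_01 = empirical_01_risk(score, data)
risk_hinge = empirical_phi_risk(score, data, hinge)
risk_logistic = empirical_phi_risk(score, data, logistic)
```

Because the hinge loss is at least 1 whenever the margin y·f(x) is non-positive, the empirical hinge risk always upper-bounds the empirical 0-1 risk, which is the simplest instance of controlling one risk through the other.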

artificial intelligence, machine learning, smin, (18 more...)

Country: Asia > China (0.46)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

036912a83bdbb1fd792baf6532f102d8-Supplemental-Conference.pdf

Neural Information Processing Systems | Apr-24-2026, 06:24:59 GMT

artificial intelligence, expansion, machine learning, (17 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.47)

Energy Score-Guided Neural Gaussian Mixture Model for Predictive Uncertainty Quantification

Yang, Yang, Ji, Chunlin, Li, Haoyang, Deng, Ke

arXiv.org Machine Learning | Mar-31-2026

Quantifying predictive uncertainty is essential for real-world machine learning applications, especially in scenarios requiring reliable and interpretable predictions. Many common parametric approaches rely on neural networks to estimate distribution parameters by optimizing the negative log-likelihood. However, these methods often encounter challenges such as training instability and mode collapse, leading to poor estimates of the mean and variance of the target output distribution. In this work, we propose the Neural Energy Gaussian Mixture Model (NE-GMM), a novel framework that integrates a Gaussian Mixture Model (GMM) with the Energy Score (ES) to enhance predictive uncertainty quantification. NE-GMM uses the flexibility of the GMM to capture complex multimodal distributions and the robustness of the ES to ensure well-calibrated predictions in diverse scenarios. We theoretically prove that the hybrid loss function satisfies the properties of a strictly proper scoring rule, ensuring alignment with the true data distribution, and establish generalization error bounds, demonstrating that the model's empirical performance closely aligns with its expected performance on unseen data. Extensive experiments on both synthetic and real-world datasets demonstrate the superiority of NE-GMM in terms of both predictive accuracy and uncertainty quantification.
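
The two ingredients named in the abstract can be sketched separately for a scalar target: the negative log-likelihood of a Gaussian mixture, and a sample-based estimate of the energy score ES(P, y) = E|X − y| − ½·E|X − X′|, written in the form where lower is better. How NE-GMM combines them into its hybrid loss is not stated in the abstract, so no combination is shown. The function names and the plug-in estimator over a fixed sample list are assumptions made here.

```python
import math


def gmm_nll(y, weights, means, stds):
    """Negative log-likelihood of scalar y under a 1-D Gaussian mixture."""
    density = sum(
        w * math.exp(-0.5 * ((y - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
        for w, m, s in zip(weights, means, stds)
    )
    return -math.log(density)


def energy_score(samples, y):
    """Plug-in estimate of the energy score from draws of the predictive law.

    First term: mean distance from the samples to the observation.
    Second term: half the mean pairwise distance among the samples.
    """
    m = len(samples)
    to_obs = sum(abs(x - y) for x in samples) / m
    spread = sum(abs(a - b) for a in samples for b in samples) / (2.0 * m * m)
    return to_obs - spread


# A bimodal predictive distribution evaluated at an observation between the modes.
nll = gmm_nll(0.0, [0.5, 0.5], [-1.0, 1.0], [0.5, 0.5])
es = energy_score([0.0, 2.0], 1.0)
```

The spread term rewards predictive distributions that are not overly concentrated, which is one reason the energy score is often described as less prone to collapse than a pure likelihood objective.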

artificial intelligence, deep learning, machine learning, (15 more...)

arXiv:2603.27672

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.04)
North America > United States > California > Alameda County > Hayward (0.04)
Asia > China > Tianjin Province > Tianjin (0.04)
(3 more...)

Genre: Research Report > New Finding (0.46)

Industry: Banking & Finance > Trading (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)

QT-ViT: Improving Linear Attention in ViT with Quadratic Taylor Expansion

Neural Information Processing Systems | Mar-21-2026, 17:08:15 GMT

The vision transformer (ViT) model is widely used and performs well in vision tasks due to its ability to capture long-range dependencies. However, its time complexity and memory consumption increase quadratically with the number of input patches, which limits the usage of ViT in real-world applications. Previous methods have employed linear attention to mitigate the complexity of the original self-attention mechanism at the expense of effectiveness. In this paper, we propose QT-ViT models that improve the previous linear self-attention using a quadratic Taylor expansion. Specifically, we substitute the softmax-based attention with a second-order Taylor expansion, and then accelerate the quadratic expansion by reducing the time complexity with a fast approximation algorithm. The proposed method capitalizes on the property of quadratic expansion to achieve superior performance while employing linear approximation for fast inference. Compared to previous studies of linear attention, our approach does not necessitate knowledge distillation or high-order attention residuals to facilitate the training process. Extensive experiments demonstrate the efficiency and effectiveness of the proposed QT-ViTs, showcasing state-of-the-art results. In particular, the proposed QT-ViTs consistently surpass the previous SOTA EfficientViTs under different model sizes, and achieve a new Pareto front in terms of accuracy and speed.
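
The substitution of softmax attention by a second-order Taylor expansion can be sketched with the identity exp(q·k) ≈ 1 + q·k + (q·k)²/2 = φ(q)·φ(k), where φ(x) concatenates 1, x, and the flattened outer product of x with itself scaled by 1/√2. Once the kernel factorizes this way, the sums over keys can be computed once and reused for every query. This is a generic illustration: the feature map of size 1 + d + d², the absence of any 1/√d scaling, and the function names are assumptions made here, and the paper's fast approximation algorithm is not included.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def quad_features(x):
    """Feature map phi with phi(q) . phi(k) = 1 + q.k + (q.k)^2 / 2."""
    scale = 1.0 / math.sqrt(2.0)
    return [1.0] + list(x) + [scale * xi * xj for xi in x for xj in x]


def linear_attention(queries, keys, values):
    """Attention with the quadratic Taylor kernel, computed in linear form."""
    phi_k = [quad_features(k) for k in keys]
    d_phi, d_v, n = len(phi_k[0]), len(values[0]), len(keys)
    # Key-side summaries, computed once and shared by all queries.
    kv = [[sum(phi_k[j][a] * values[j][c] for j in range(n)) for c in range(d_v)]
          for a in range(d_phi)]
    k_sum = [sum(phi_k[j][a] for j in range(n)) for a in range(d_phi)]
    outputs = []
    for q in queries:
        phi_q = quad_features(q)
        # The normalizer is positive because 1 + t + t^2/2 > 0 for every real t.
        denom = dot(phi_q, k_sum)
        outputs.append([sum(phi_q[a] * kv[a][c] for a in range(d_phi)) / denom
                        for c in range(d_v)])
    return outputs


q = [0.1, -0.2]
keys = [[0.3, 0.4], [-0.5, 0.2]]
values = [[1.0, 0.0], [0.0, 1.0]]
out = linear_attention([q], keys, values)[0]
```

The cost per query no longer grows with the number of keys, but the d² feature block is the price paid for the quadratic term, which is presumably what a further approximation step would target.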

artificial intelligence, machine learning, proceedings, (9 more...)

Technology: