AITopics | Overview

Overview

Local LMO: Constrained Gradient Optimization via a Local Linear Minimization Oracle

Richtárik, Peter, Gruntkowska, Kaja, Li, Hanmin

arXiv.org Machine Learning | May-12-2026

We design Local LMO, a new projection-free gradient-type method for constrained optimization. The key algorithmic idea is to replace the global linear minimization oracle over the constraint set used by Frank-Wolfe (FW) with a local linear minimization oracle over the intersection of the constraint set and a "small" ball centered at the current iterate. In particular, when minimizing $f:\mathbb{R}^d\to \mathbb{R}$ over a constraint set $\emptyset\neq\mathcal{X}\subseteq\mathbb{R}^d$, Local LMO performs the iteration \[x_{k+1}\in \arg\min_{z\in\mathcal{X}\cap\mathcal{B}(x_{k},t_k)}\langle\nabla f(x_{k}), z \rangle,\] where $x_0\in\mathcal{X}$, $\mathcal{B}(x_{k},t_k)$ is the ball of radius $t_k$ centered at $x_k$, and $t_k>0$ is a suitably chosen radius that can be interpreted as an effective stepsize. While designed as an alternative to FW, Local LMO is perhaps best viewed as a generalization of Gradient Descent (GD) rather than a modification of FW. Indeed, it is easy to see that Local LMO reduces to GD in the unconstrained setting and, more generally, to GD restricted to an affine subspace if the constraint set $\mathcal{X}$ is affine. We prove that this simple algorithmic scheme transfers the known (unaccelerated) convergence rates of Projected Gradient Descent (PGD) to the projection-free world in several important regimes, some of which are beyond the reach of FW. In contrast to FW theory, (i) our guarantees hold without requiring the feasible set $\mathcal{X}$ to be bounded, (ii) our theory does not require the "curvature" assumption, which allows us to establish a standard sublinear rate for convex functions with bounded gradients, and (iii) we obtain a linear rate in the smooth strongly convex regime. Furthermore, we obtain sharp sublinear rates in the smooth convex and non-convex regimes, in the $(L_0,L_1)$-smooth convex regime, and in stochastic and non-differentiable settings.
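The claimed reduction to GD in the unconstrained setting can be checked numerically. The following is a minimal sketch, assuming a Euclidean ball and $\mathcal{X}=\mathbb{R}^d$, in which case the linear objective is minimized at $x_k - t_k \nabla f(x_k)/\|\nabla f(x_k)\|$. The function names, the quadratic test objective, and the radius rule $t_k=\gamma\|\nabla f(x_k)\|$ are illustrative assumptions and are not taken from the paper.

```python
import math

def local_lmo_step_unconstrained(x, grad, t):
    """Minimize <grad, z> over the Euclidean ball B(x, t), with no other constraint.

    The minimizer is x - t * grad / ||grad||. If grad is zero, every point
    of the ball is optimal, so we stay at x.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    if norm == 0.0:
        return list(x)
    return [xi - t * gi / norm for xi, gi in zip(x, grad)]

def grad_quadratic(x):
    # Gradient of the illustrative objective f(x) = 0.5 * ||x||^2.
    return list(x)

def run_local_lmo(x0, gamma, steps):
    """Run the sketch with the assumed radius rule t_k = gamma * ||grad f(x_k)||."""
    x = list(x0)
    for _ in range(steps):
        g = grad_quadratic(x)
        t = gamma * math.sqrt(sum(gi * gi for gi in g))
        x = local_lmo_step_unconstrained(x, g, t)
    return x
```

With this radius rule each step equals the GD update $x_k-\gamma\nabla f(x_k)$, so on the quadratic above the iterates contract by a factor $1-\gamma$ per step. A fixed radius would instead give a normalized-gradient step.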

constraint, machine learning, natural language, (20 more...)

arXiv.org Machine Learning

2605.0885

Country:

North America > United States (0.67)
Asia (0.45)
Europe (0.45)

Genre:

Research Report (1.00)
Overview (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.92)
Information Technology > Artificial Intelligence > Natural Language (0.92)

An Interpretable and Scalable Framework for Evaluating Large Language Models

Qu, Xinhao, Heng, Qiang, Zeng, Hao, Liu, Xiaoqian

arXiv.org Machine Learning | May-11-2026

Evaluation of large language models (LLMs) is increasingly critical, yet standard benchmarking methods rely on average accuracy, overlooking both the inherent stochasticity of LLM outputs and the heterogeneity of benchmark items. Item Response Theory (IRT) offers a principled framework for modeling latent model abilities and item characteristics, but conventional methods are computationally expensive and numerically unstable, limiting large-scale implementations. To address these challenges, we propose an interpretable and scalable framework for LLM evaluation based on the majorization-minimization principle. Our approach reformulates the problem as a sequence of constrained matrix factorization subproblems, enabling stable and efficient parameter estimation with theoretical guarantees for identifiability and convergence. Experiments on synthetic and real-world datasets, including MATH-500 and six Open LLM Leaderboard benchmarks, demonstrate that our method achieves superior scalability and interpretability. It delivers orders-of-magnitude speedups over competing methods while maintaining comparable or even higher estimation accuracy. Our results align with established scaling laws and offer insights into item difficulty and discrimination, informing more principled benchmark design.
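To make the IRT vocabulary concrete, here is a minimal sketch of the standard two-parameter logistic (2PL) model, in which each item has a discrimination `a` and a difficulty `b`, and each model has a latent ability `theta`. The abstract does not state which IRT variant the paper uses, so the choice of 2PL, the function names, and the example numbers are assumptions for illustration only.

```python
import math

def p_correct_2pl(theta, a, b):
    """2PL item response function: P(correct) = sigmoid(a * (theta - b)).

    theta: latent ability of the model being evaluated
    a:     item discrimination (how sharply the item separates abilities)
    b:     item difficulty (ability at which P(correct) = 0.5)
    """
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def expected_accuracy(theta, items):
    """Average success probability over a list of (a, b) items."""
    return sum(p_correct_2pl(theta, a, b) for a, b in items) / len(items)

# Hypothetical benchmarks: same model ability, different item mixes.
easy_items = [(1.0, -1.0), (1.5, -0.5)]
hard_items = [(1.0, 1.0), (1.5, 1.5)]
```

The same ability yields different average accuracies on `easy_items` and `hard_items`, which is one way to see why average accuracy alone conflates model ability with item characteristics.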

arxiv preprint arxiv, large language model, machine learning, (17 more...)

arXiv.org Machine Learning

2605.07046

Country:

North America > Mexico (0.28)
Europe > Austria (0.28)

Genre:

Overview (0.92)
Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Ratio-based Loss Functions

Helgerth, Lena, Christmann, Andreas

arXiv.org Machine Learning | May-8-2026

Algorithms in machine learning and AI depend critically on at least three key components: (i) the risk function, which is the expectation of the loss function, (ii) the function space, which is often called the hypothesis space, and (iii) the set of probability measures that are allowed for the specified algorithm. This paper gives a survey of a certain class of loss functions, which we call ratio-based. In supervised learning, two families are common: margin-based loss functions for classification, which depend on the product of the output values $y_i$ and the predictions $f(x_i)$, and distance-based loss functions for regression, which depend on the difference of $y_i$ and $f(x_i)$. Distance-based loss functions are particularly useful if an additive model assumption seems plausible, i.e., the common signal plus noise assumption. However, several loss functions proposed in the literature for regression purposes have a multiplicative error structure in mind and pay attention to relative errors, i.e., to the ratio of $y_i$ and $f(x_i)$. In this survey article, we systematically investigate such ratio-based loss functions and propose a few new losses, which may be interesting for future research. We concentrate on investigating general properties of ratio-based loss functions like continuity, Lipschitz-continuity, convexity, and differentiability, because these properties play a central role in most machine learning algorithms. Therefore, we do not focus on some specific machine learning algorithm to derive universal consistency, learning rates, or stability results. Instead, we want to enable future research in this direction.
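The three families can be contrasted with one representative each. This is a minimal sketch: the hinge and squared losses are textbook examples of margin-based and distance-based losses, while the squared log-ratio loss is chosen here only as a simple illustration of a loss that depends on $y_i/f(x_i)$. It is an assumption and not necessarily one of the losses surveyed or proposed in the paper.

```python
import math

def hinge_loss(y, fx):
    """Margin-based: depends on the product y * f(x), with y in {-1, +1}."""
    return max(0.0, 1.0 - y * fx)

def squared_loss(y, fx):
    """Distance-based: depends on the difference y - f(x)."""
    return (y - fx) ** 2

def squared_log_ratio_loss(y, fx):
    """Ratio-based (illustrative): depends only on y / f(x); needs y, f(x) > 0."""
    if y <= 0 or fx <= 0:
        raise ValueError("y and f(x) must be strictly positive")
    return math.log(y / fx) ** 2

# The same relative error (prediction is half the target) at two scales.
small_scale = (10.0, 5.0)
large_scale = (1000.0, 500.0)
```

The squared loss grows with the scale of the data (25 versus 250000 for the two pairs), whereas the ratio-based loss assigns both pairs the same value, which matches the multiplicative error structure described above.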

artificial intelligence, machine learning, survey article, (19 more...)

arXiv.org Machine Learning

2605.05808

Country: Europe (0.28)

Genre: Overview (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.67)

Microsoft, Google, xAI give US access to AI models for security testing

Al Jazeera | May-5-2026, 16:53:09 GMT

Tech giants Microsoft, Google and xAI say they will allow the United States federal government access to their new artificial intelligence models for national security testing. The Center for AI Standards and Innovation (CAISI) at the Department of Commerce announced the agreement on Tuesday amid increasing concerns about the capabilities that Anthropic's newly unveiled Mythos model could give hackers. The agreement fulfils a pledge the administration of US President Donald Trump made in July to partner with technology companies to vet their AI models for "national security risks". Microsoft will work with US government scientists to test AI systems "in ways that probe unexpected behaviors", the company said in a statement. Together they will develop shared data sets and workflows for testing the company's models, the company said.

artificial intelligence, social media, survey article, (11 more...)

Al Jazeera

Country: North America > United States (1.00)

Genre: Overview > Growing Problem (0.36)

Industry:

Information Technology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Government > Military (1.00)

Technology:

Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (0.41)
Information Technology > Communications > Social Media (0.31)

Extrapolation in Statistical Learning with Extreme Value Theory

Engelke, Sebastian, Gnecco, Nicola, Sabourin, Anne

arXiv.org Machine Learning | May-5-2026

Extreme value theory provides rigorous theory and statistical tools for extrapolation in machine learning, particularly in settings where traditional methods struggle due to data scarcity in the tails. A broad range of tasks benefit from these advances, including regression and classification beyond the training data, extreme quantile regression, supervised and unsupervised dimension reduction, generative artificial intelligence and anomaly detection. This review synthesizes recent developments in these fields at the intersection of statistical learning and extreme value theory, with a focus on principled methods based on asymptotically motivated representations of the tail of univariate and multivariate distributions. We consider different theoretical frameworks for both asymptotically dependent and independent data and discuss how they translate into efficient statistical methods for extrapolation to extreme regions. By addressing both theoretical and practical aspects, we offer a comprehensive overview of the state-of-the-art in this quickly evolving field, and identify promising directions for future research.

artificial intelligence, data mining, machine learning, (20 more...)

arXiv.org Machine Learning

2605.01909

Country: Europe (1.00)

Genre: Overview (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning > Generative AI (0.34)

The White House is considering tighter regulation of new AI models

Engadget | May-4-2026, 21:23:33 GMT

A federal review of new AI models ahead of their public release is being considered as a possible power for that committee, according to the publication's sources. No clear approach has been decided, but the publication suggested it could mimic what's currently happening within the UK government, where multiple layers of oversight confirm that AI models meet safety standards. There's also still a chance the entire concept fizzles and comes to nothing. If an oversight group is created, it would mark quite a reversal from the hands-off attitude presented in the White House's previously introduced AI Action Plan. The plan appeared willing to offer the AI companies most of the concessions they wanted, although it did leave a lot of potential to create plenty of new problems.

artificial intelligence, social media, survey article, (8 more...)

Engadget

Country: North America > United States (0.72)

Genre: Overview (0.59)

Industry:

Leisure & Entertainment > Games > Computer Games (0.81)
Government > Regional Government > North America Government > United States Government (0.72)

Technology:

Information Technology > Artificial Intelligence (1.00)
Information Technology > Communications > Mobile (0.58)
Information Technology > Communications > Social Media (0.46)

0dc91de822b71c66a7f54fa121d8cbb9-Paper-Datasets_and_Benchmarks.pdf

Neural Information Processing Systems | May-1-2026, 06:26:46 GMT

artificial intelligence, machine learning, natural language, (20 more...)

Neural Information Processing Systems

Country:

Europe (1.00)
North America (0.93)
Asia > India (0.72)

Genre:

Questionnaire & Opinion Survey (0.68)
Overview (0.46)

Industry: Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Communications (0.68)
Information Technology > Artificial Intelligence > Natural Language > Generation (0.68)

Improving Diffusion-Based Image Synthesis with Context Prediction

Neural Information Processing Systems | May-1-2026, 03:57:25 GMT

Diffusion models are a new class of generative models that have dramatically advanced image generation, delivering unprecedented quality and diversity. Existing diffusion models mainly try to reconstruct the input image from a corrupted one with a pixel-wise or feature-wise constraint along spatial axes. However, such point-based reconstruction may fail to make each predicted pixel/feature fully preserve its neighborhood context, impairing diffusion-based image synthesis. As a powerful source of automatic supervisory signal, context has been well studied for learning representations. Inspired by this, we propose, for the first time, CONPREDIFF to improve diffusion-based image synthesis with context prediction.

artificial intelligence, diffusion model, machine learning, (14 more...)

Neural Information Processing Systems

Genre: Overview (0.46)

Industry: Media (0.46)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Probabilistic Attention for Interactive Segmentation

Neural Information Processing Systems | May-1-2026, 01:53:12 GMT

We provide a probabilistic interpretation of attention and show that the standard dot-product attention in transformers is a special case of Maximum A Posteriori (MAP) inference. The proposed approach suggests the use of Expectation Maximization algorithms for online adaptation of key and value model parameters. This approach is useful for cases in which external agents, e.g., annotators, provide inference-time information about the correct values of some tokens, e.g., the semantic category of some pixels, and we need this new information to propagate to other tokens in a principled manner. We illustrate the approach on an interactive semantic segmentation task in which annotators and models collaborate online to improve annotation efficiency. Using standard benchmarks, we observe that key adaptation boosts model performance (~10% mIoU) in the low feedback regime and value propagation improves model responsiveness in the high feedback regime.
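One well-known way to see a link between dot-product attention and probabilistic inference is through a Gaussian mixture: if the keys are the means of an isotropic Gaussian mixture with uniform priors and all keys have the same norm, the posterior over mixture components given a query equals the softmax of scaled dot products. The sketch below checks this identity numerically. It is an illustration under these stated assumptions and is not claimed to be the paper's derivation; all names are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def attention_weights(query, keys, scale=1.0):
    """Dot-product attention weights: softmax(scale * <query, key_j>)."""
    return softmax([scale * dot(query, k) for k in keys])

def gaussian_mixture_posterior(query, keys, sigma2=1.0):
    """Posterior over components of a uniform-prior isotropic Gaussian
    mixture whose component means are the keys (variance sigma2)."""
    log_lik = [
        -sum((q - k) ** 2 for q, k in zip(query, key)) / (2.0 * sigma2)
        for key in keys
    ]
    return softmax(log_lik)

# Unit-norm keys, so the ||key||^2 terms cancel inside the softmax.
keys = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
query = [0.3, -1.2]
```

With `sigma2 = 0.5`, `gaussian_mixture_posterior(query, keys, 0.5)` matches `attention_weights(query, keys, scale=2.0)`, because expanding the squared distance leaves only the dot-product term dependent on the component index.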

computer vision, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Genre: Overview (0.46)

Technology: