AITopics | Search

Search

"Search is a problem-solving technique that systematically explores a space of problem states, i.e., successive and alternative stages in the problem-solving process. Examples of problem states might include the different board configurations in a game or intermediate steps in a reasoning process. This space of alternative solutions is then searched to find an answer. Newell and Simon (1976) have argued that this is the essential basis of human problem solving. Indeed, when a chess player examines the effects of different moves or a doctor considers a number of alternative diagnoses, they are searching among alternatives."
– from Section 1.2 of Chapter One of George F. Luger's textbook, Artificial Intelligence: Structures and Strategies for Complex Problem Solving, 5th Edition (Addison-Wesley; 2005).
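As a minimal illustration of the state-space view in the quotation, the sketch below runs breadth-first search over the states of a toy puzzle. The two-jug puzzle (a 4-litre jug A and a 3-litre jug B), the goal of measuring 2 litres, and the helper names are illustrative assumptions, not taken from Luger's text.

```python
from collections import deque
from typing import Callable, Dict, Hashable, Iterable, List, Optional, Tuple

def breadth_first_search(
    start: Hashable,
    is_goal: Callable[[Hashable], bool],
    successors: Callable[[Hashable], Iterable[Hashable]],
) -> Optional[List[Hashable]]:
    """Return a shortest sequence of states from start to a goal state, or None."""
    parent: Dict[Hashable, Optional[Hashable]] = {start: None}
    frontier = deque([start])
    while frontier:
        state = frontier.popleft()
        if is_goal(state):
            # Follow parent links back to the start to recover the path.
            path = [state]
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
        for nxt in successors(state):
            if nxt not in parent:  # each state is generated at most once
                parent[nxt] = state
                frontier.append(nxt)
    return None  # goal unreachable from start


# Toy problem (hypothetical): a 4-litre jug A and a 3-litre jug B; a state is (a, b).
CAP_A, CAP_B = 4, 3

def jug_moves(state: Tuple[int, int]) -> List[Tuple[int, int]]:
    a, b = state
    pour_ab = min(a, CAP_B - b)  # amount moved when pouring A into B
    pour_ba = min(b, CAP_A - a)  # amount moved when pouring B into A
    return [
        (CAP_A, b), (a, CAP_B),        # fill either jug
        (0, b), (a, 0),                # empty either jug
        (a - pour_ab, b + pour_ab),    # pour A into B
        (a + pour_ba, b - pour_ba),    # pour B into A
    ]

path = breadth_first_search((0, 0), lambda s: s[0] == 2, jug_moves)
print(path)  # 7 states (6 moves), ending with 2 litres in jug A
```

Because the frontier is a FIFO queue, states are expanded in order of depth, so the first goal state dequeued is reached by a path with the fewest moves.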

RandAugment: Practical Automated Data Augmentation with a Reduced Search Space

Neural Information Processing Systems, Feb-5-2026, 14:42:00 GMT

Recent work on automated data augmentation strategies has led to state-of-the-art results in image classification and object detection. An obstacle to a large-scale adoption of these methods is that they require a separate and expensive search phase. A common way to overcome the expense of the search phase was to use a smaller proxy task. However, it was not clear if the optimized hyperparameters found on the proxy task are also optimal for the actual task. In this work, we rethink the process of designing automated data augmentation strategies. We find that while previous work required searching for many augmentation parameters (e.g.

artificial intelligence, practical automated data augmentation, randaugment

Technology:

Information Technology > Artificial Intelligence > Vision (0.62)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.48)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.48)

FasterRisk: Fast and Accurate Interpretable Risk Scores

Neural Information Processing Systems, Feb-5-2026, 07:14:16 GMT

Over the last century, risk scores have been the most popular form of predictive model used in healthcare and criminal justice. Risk scores are sparse linear models with integer coefficients; often these models can be memorized or placed on an index card. Typically, risk scores have been created either without data or by rounding logistic regression coefficients, but these methods do not reliably produce high-quality risk scores. Recent work used mathematical programming, which is computationally slow. We introduce an approach for efficiently producing a collection of high-quality risk scores learned from data. Specifically, our approach produces a pool of almost-optimal sparse continuous solutions, each with a different support set, using a beam-search algorithm. Each of these continuous solutions is transformed into a separate risk score through a star ray search, where a range of multipliers are considered before rounding the coefficients sequentially to maintain low logistic loss. Our algorithm returns all of these high-quality risk scores for the user to consider. This method completes within minutes and can be valuable in a broad variety of applications.
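The multiplier-then-round idea can be illustrated with a deliberately simplified sketch: take one continuous sparse solution, try several multipliers m, round m·w to integer points, and keep the candidate whose rescaled integer score has the lowest logistic loss. This is an assumption-laden simplification, not FasterRisk itself: the star ray search described above also rounds coefficients sequentially, while here all coefficients are rounded at once and the intercept stays continuous. The data, multiplier grid and function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

def logistic_loss(w: Sequence[float], b: float,
                  X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """Mean logistic loss log(1 + exp(-y * (b + w.x))) with labels y in {-1, +1}."""
    total = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (b + sum(wj * xj for wj, xj in zip(w, xi)))
        # Numerically stable evaluation of log(1 + exp(-margin)).
        if margin > 0:
            total += math.log1p(math.exp(-margin))
        else:
            total += -margin + math.log1p(math.exp(margin))
    return total / len(y)

def round_with_multipliers(w: Sequence[float], b: float,
                           X: Sequence[Sequence[float]], y: Sequence[int],
                           multipliers: Sequence[float]) -> Tuple[float, float, List[int]]:
    """Try each multiplier m, round m*w to integer points, keep the lowest-loss score.

    The integer points are divided by m again so that they live on the logit scale;
    the intercept b is left continuous in this simplified sketch.
    """
    if not multipliers:
        raise ValueError("need at least one multiplier")
    best = None
    for m in multipliers:
        points = [round(m * wj) for wj in w]  # integer coefficients of the risk score
        loss = logistic_loss([p / m for p in points], b, X, y)
        if best is None or loss < best[0]:
            best = (loss, m, points)
    return best

# Hypothetical toy data and continuous solution.
X = [[1, 0], [0, 1], [1, 1], [0, 0], [1, 0], [0, 1]]
y = [1, -1, 1, -1, 1, -1]
w_continuous, intercept = [1.3, -0.7], -0.2
loss, m, points = round_with_multipliers(w_continuous, intercept, X, y, [1.0, 1.5, 2.0, 3.0])
print(m, points, loss)
```

Larger multipliers give a finer integer grid (and larger point values), so the search trades index-card simplicity against how closely the rounded score tracks the continuous solution.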

artificial intelligence, machine learning, risk score

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.60)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.60)

Score-based Metropolis-Hastings for Fractional Langevin Algorithms

Aloui, Ahmed, Liao, Junyi, Hasan, Ali, Blanchet, Jose, Tarokh, Vahid

arXiv.org Machine Learning, Feb-3-2026

Sampling from heavy-tailed and multimodal distributions is challenging when neither the target density nor the proposal density can be evaluated, as in $α$-stable Lévy-driven fractional Langevin algorithms. While the target distribution can be estimated from data via score-based or energy-based models, the $α$-stable proposal density and its score are generally unavailable, rendering classical density-based Metropolis–Hastings (MH) corrections impractical. Consequently, existing fractional Langevin methods operate in an unadjusted regime and can exhibit substantial finite-time errors and poor empirical control of tail behavior. We introduce the Metropolis-Adjusted Fractional Langevin Algorithm (MAFLA), an MH-inspired, fully score-based correction mechanism. MAFLA employs designed proxies for fractional proposal score gradients under isotropic symmetric $α$-stable noise and learns an acceptance function via Score Balance Matching. We empirically illustrate the strong performance of MAFLA on a series of tasks including combinatorial optimization problems where the method significantly improves finite-time sampling accuracy over unadjusted fractional Langevin dynamics.
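For contrast with the score-based correction described above, here is the classical density-based Metropolis–Hastings accept/reject step that the abstract says becomes impractical when densities are unavailable. It is a minimal sketch with a symmetric Gaussian random-walk proposal and a standard-normal target, both chosen for illustration; it is not MAFLA, which replaces this density ratio with a learned acceptance function.

```python
import math
import random
from typing import Callable, List

def metropolis_hastings(log_density: Callable[[float], float], x0: float,
                        n_steps: int, step: float = 1.0, seed: int = 0) -> List[float]:
    """Random-walk Metropolis-Hastings on the real line.

    The Gaussian proposal is symmetric, so the proposal densities cancel in the
    Hastings ratio and only the (unnormalised) target density is evaluated.
    """
    rng = random.Random(seed)
    x, log_p = x0, log_density(x0)
    samples = []
    for _ in range(n_steps):
        proposal = x + rng.gauss(0.0, step)
        log_p_new = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        if rng.random() < math.exp(min(0.0, log_p_new - log_p)):
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected move repeats the current state
    return samples

# Standard normal target, known only up to a constant (illustrative choice).
chain = metropolis_hastings(lambda x: -0.5 * x * x, x0=3.0, n_steps=50000)
```

The accept/reject line is exactly the step that needs the target density p; with a non-symmetric proposal the general Hastings ratio would also need the proposal density q, which is the other quantity the abstract describes as unavailable in the α-stable setting.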

acceptance function, artificial intelligence, machine learning

2602.00835

Country:

North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.34)

Provably Data-driven Multiple Hyper-parameter Tuning with Structured Loss Function

Le, Tung Quoc, Nguyen, Anh Tuan, Nguyen, Viet Anh

arXiv.org Machine Learning, Feb-3-2026

Data-driven algorithm design automates hyperparameter tuning, but its statistical foundations remain limited because model performance can depend on hyperparameters in implicit and highly non-smooth ways. Existing guarantees focus on the simple case of a one-dimensional (scalar) hyperparameter. This leaves the practically important, multi-dimensional hyperparameter tuning setting unresolved. We address this open question by establishing the first general framework for deriving generalization guarantees for tuning multi-dimensional hyperparameters in data-driven settings. Our approach strengthens the generalization guarantee framework for semi-algebraic function classes by exploiting tools from real algebraic geometry, yielding sharper, more broadly applicable guarantees. We then extend the analysis to hyperparameter tuning using the validation loss under minimal assumptions, and derive improved bounds when additional structure is available. Finally, we demonstrate the scope of the framework with new learnability results, including data-driven weighted group lasso and weighted fused lasso.
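Data-driven hyperparameter tuning in this sense picks the hyperparameter vector that minimizes the average validation loss over a sample of problem instances. The sketch below does this by grid search for a two-dimensional hyperparameter, the per-feature penalties (λ1, λ2) of a weighted ridge regression on two features. The weighted ridge model, the synthetic task generator and the grid are illustrative assumptions; the paper's weighted group lasso and fused lasso results are not reproduced here.

```python
import itertools
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Task = Tuple[List[Point], List[float], List[Point], List[float]]

def weighted_ridge(X: Sequence[Point], y: Sequence[float], lam1: float, lam2: float) -> Point:
    """Solve (X^T X + diag(lam1, lam2)) w = X^T y for two features (2x2 system)."""
    a = sum(x1 * x1 for x1, _ in X) + lam1
    b = sum(x1 * x2 for x1, x2 in X)
    d = sum(x2 * x2 for _, x2 in X) + lam2
    r1 = sum(x1 * t for (x1, _), t in zip(X, y))
    r2 = sum(x2 * t for (_, x2), t in zip(X, y))
    det = a * d - b * b  # positive whenever lam1, lam2 > 0
    return (d * r1 - b * r2) / det, (a * r2 - b * r1) / det

def validation_loss(task: Task, lam1: float, lam2: float) -> float:
    X_tr, y_tr, X_va, y_va = task
    w1, w2 = weighted_ridge(X_tr, y_tr, lam1, lam2)
    return sum((t - w1 * x1 - w2 * x2) ** 2 for (x1, x2), t in zip(X_va, y_va)) / len(y_va)

def make_task(rng: random.Random, n_train: int = 8, n_val: int = 20) -> Task:
    """Hypothetical task family: feature 1 is predictive, feature 2 is pure noise."""
    w1_true = rng.gauss(2.0, 0.5)
    def draw(n: int):
        X = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
        y = [w1_true * x1 + rng.gauss(0.0, 1.0) for x1, _ in X]
        return X, y
    X_tr, y_tr = draw(n_train)
    X_va, y_va = draw(n_val)
    return X_tr, y_tr, X_va, y_va

def tune(tasks: Sequence[Task], grid: Sequence[float]) -> Tuple[float, float]:
    """Empirical risk minimisation over the 2-D hyperparameter grid."""
    return min(itertools.product(grid, grid),
               key=lambda lams: sum(validation_loss(t, *lams) for t in tasks) / len(tasks))

tasks = [make_task(random.Random(0)) for _ in range(1)]  # replaced below with a shared stream
rng = random.Random(0)
tasks = [make_task(rng) for _ in range(200)]
print(tune(tasks, [0.01, 0.1, 1.0, 10.0, 100.0]))
```

The learned pair should penalize the uninformative second feature much more heavily than the first; the generalization question the paper studies is how many sampled tasks are needed before such an empirically chosen vector is near-optimal on new tasks.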

artificial intelligence, machine learning, optimization problem

2602.02406

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
Asia > China > Hong Kong (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Computational Learning Theory (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)

An Efficient Algorithm for Thresholding Monte Carlo Tree Search

Nameki, Shoma, Nakamura, Atsuyoshi, Komiyama, Junpei, Tabata, Koji

arXiv.org Machine Learning, Feb-2-2026

We introduce the Thresholding Monte Carlo Tree Search problem, in which, given a tree $\mathcal{T}$ and a threshold $θ$, a player must answer whether or not the root node value of $\mathcal{T}$ is at least $θ$. In the given tree, each internal node is labeled 'MAX' or 'MIN', and the value of a 'MAX'-labeled ('MIN'-labeled) internal node is the maximum (minimum) of its child values. The value of a leaf node is the mean reward of an unknown distribution from which the player can sample rewards. For this problem, we develop a $δ$-correct sequential sampling algorithm based on the Track-and-Stop strategy that has asymptotically optimal sample complexity. We show that a ratio-based modification of the D-Tracking arm-pulling strategy leads to a substantial improvement in empirical sample complexity, as well as reducing the per-round computational cost from linear to logarithmic in the number of arms.
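The MAX/MIN tree value defined above can be computed recursively once leaf values are available. The sketch below plugs in empirical means from a fixed number of samples per leaf and compares the resulting root value with θ. This uniform-sampling plug-in rule is only an illustrative baseline under assumed Bernoulli leaf rewards; it is not the δ-correct Track-and-Stop algorithm of the paper and carries no correctness guarantee. The tree and its leaf means are hypothetical.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List

@dataclass
class Node:
    kind: str  # "MAX", "MIN" or "LEAF"
    children: List["Node"] = field(default_factory=list)
    mean: float = 0.0  # leaf reward mean; hidden from the player, used only to simulate

def tree_value(node: Node, leaf_value: Callable[[Node], float]) -> float:
    """Value of a MAX/MIN tree given a value for each leaf."""
    if node.kind == "LEAF":
        return leaf_value(node)
    values = [tree_value(child, leaf_value) for child in node.children]
    return max(values) if node.kind == "MAX" else min(values)

def plug_in_threshold_test(root: Node, theta: float, samples_per_leaf: int,
                           rng: random.Random) -> bool:
    """Naive baseline: sample every leaf equally, then compare the plug-in root value with theta."""
    def empirical_mean(leaf: Node) -> float:
        # Bernoulli rewards with success probability leaf.mean (an assumption of this sketch).
        wins = sum(1 for _ in range(samples_per_leaf) if rng.random() < leaf.mean)
        return wins / samples_per_leaf
    return tree_value(root, empirical_mean) >= theta

def make_leaf(mu: float) -> Node:
    return Node("LEAF", mean=mu)

root = Node("MAX", [Node("MIN", [make_leaf(0.3), make_leaf(0.6)]),
                    Node("MIN", [make_leaf(0.7), make_leaf(0.8)])])
print(tree_value(root, lambda n: n.mean))  # 0.7 = max(min(0.3, 0.6), min(0.7, 0.8))
print(plug_in_threshold_test(root, theta=0.5, samples_per_leaf=2000, rng=random.Random(1)))
```

Uniform sampling wastes samples on leaves that cannot change the answer (here the 0.3 and 0.6 leaves, once the right subtree is clearly above θ); adaptive allocation such as the paper's tracking strategy is aimed at exactly that inefficiency.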

algorithm, artificial intelligence, node

2601.226

Country:

North America > United States > New York > New York County > New York City (0.14)
Europe > Austria > Vienna (0.14)
Asia > Japan > Hokkaidō (0.04)

Genre: Research Report (0.50)

Industry: Leisure & Entertainment > Sports > Motorsports (0.34)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Minimax Rates for Hyperbolic Hierarchical Learning

Rawal, Divit, Vishwanath, Sriram

arXiv.org Machine Learning, Jan-29-2026

We prove an exponential separation in sample complexity between Euclidean and hyperbolic representations for learning on hierarchical data under standard Lipschitz regularization. For depth-$R$ hierarchies with branching factor $m$, we first establish a geometric obstruction for Euclidean space: any bounded-radius embedding forces volumetric collapse, mapping exponentially many tree-distant points to nearby locations. This necessitates Lipschitz constants scaling as $\exp(Ω(R))$ to realize even simple hierarchical targets, yielding exponential sample complexity under capacity control. We then show this obstruction vanishes in hyperbolic space: constant-distortion hyperbolic embeddings admit $O(1)$-Lipschitz realizability, enabling learning with $n = O(mR \log m)$ samples. A matching $Ω(mR \log m)$ lower bound via Fano's inequality establishes that hyperbolic representations achieve the information-theoretic optimum. We also show a geometry-independent bottleneck: any rank-$k$ prediction space captures only $O(k)$ canonical hierarchical contrasts.
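To make the Euclidean-versus-hyperbolic contrast concrete, the sketch below uses the Poincaré ball model, one standard model of hyperbolic space (the abstract does not say which model the authors use), with distance d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). Points near the boundary of the ball that are only moderately far apart in Euclidean terms are very far apart hyperbolically, which is the extra room a bounded-radius Euclidean embedding lacks.

```python
import math
from typing import Sequence

def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Hyperbolic distance between two points strictly inside the unit ball (Poincare model)."""
    def sq_norm(p: Sequence[float]) -> float:
        return sum(c * c for c in p)
    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

print(poincare_distance((0.0, 0.0), (0.5, 0.0)))    # ln 3, about 1.0986
print(poincare_distance((0.5, 0.0), (0.0, 0.5)))    # about 1.68
print(poincare_distance((0.99, 0.0), (0.0, 0.99)))  # about 9.9, with a Euclidean gap of only about 1.4
```

The blow-up of the denominator as ‖u‖ approaches 1 is what lets exponentially many leaves of a deep tree sit near the boundary while staying pairwise far apart.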

artificial intelligence, machine learning, representation

2601.20047

Country: North America (0.28)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.50)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.41)

Personalizing black-box models for nonparametric regression with minimax optimality

Li, Sai, Zhang, Linjun

arXiv.org Machine Learning, Jan-6-2026

Recent advances in large-scale models, including deep neural networks and large language models, have substantially improved performance across a wide range of learning tasks. The widespread availability of such pre-trained models creates new opportunities for data-efficient statistical learning, provided they can be effectively integrated into downstream tasks. Motivated by this setting, we study few-shot personalization, where a pre-trained black-box model is adapted to a target domain using a limited number of samples. We develop a theoretical framework for few-shot personalization in nonparametric regression and propose algorithms that can incorporate a black-box pre-trained model into the regression procedure. We establish the minimax optimal rate for the personalization problem and show that the proposed method attains this rate. Our results clarify the statistical benefits of leveraging pre-trained models under sample scarcity and provide robustness guarantees when the pre-trained model is not informative. We illustrate the finite-sample performance of the methods through simulations and an application to the California housing dataset with several pre-trained models.
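One generic way to incorporate a black-box pre-trained model into nonparametric regression is residual correction: smooth the target-domain residuals y − f̂(x) with a kernel estimator and add that fit back to f̂. The sketch below does this with a Gaussian-kernel Nadaraya–Watson smoother. It is an illustrative assumption, not necessarily the estimator the authors propose, and the pre-trained model (here simply sin), bandwidth and data are hypothetical.

```python
import math
from typing import Callable, Sequence

def nadaraya_watson(x: float, xs: Sequence[float], ys: Sequence[float], bandwidth: float) -> float:
    """Gaussian-kernel weighted average of ys around x."""
    weights = [math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        return 0.0  # no sample within reach of the kernel: apply no correction
    return sum(w * yi for w, yi in zip(weights, ys)) / total

def personalize(pretrained: Callable[[float], float], xs: Sequence[float],
                ys: Sequence[float], bandwidth: float) -> Callable[[float], float]:
    """Pre-trained prediction plus a kernel fit of the target-domain residuals."""
    residuals = [yi - pretrained(xi) for xi, yi in zip(xs, ys)]
    def predict(x: float) -> float:
        return pretrained(x) + nadaraya_watson(x, xs, residuals, bandwidth)
    return predict

# Hypothetical setting: the black box is sin, the target domain is sin shifted up by 0.5.
xs = [i / 10 for i in range(30)]
ys = [math.sin(x) + 0.5 for x in xs]
model = personalize(math.sin, xs, ys, bandwidth=0.3)
print(model(1.45) - (math.sin(1.45) + 0.5))  # close to 0; the black box alone is off by 0.5
```

The smoother only has to learn the discrepancy between domains, which can be much simpler than the full regression function when the pre-trained model is informative.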

large language model, machine learning, pre-trained model

2601.01432

Country: North America > United States > California (0.25)

Genre: Research Report > New Finding (0.66)

Industry: Transportation > Air (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.71)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

A first-order method for nonconvex-strongly-concave constrained minimax optimization

Lu, Zhaosong, Mei, Sanyou

arXiv.org Machine Learning, Jan-6-2026

Dated May 12, 2024 (revised October 23, 2025).

Abstract: In this paper we study a nonconvex-strongly-concave constrained minimax problem. Specifically, we propose a first-order augmented Lagrangian method for solving it, whose subproblems are nonconvex-strongly-concave unconstrained minimax problems that are suitably solved by a first-order method, developed in this paper, that leverages the strong concavity structure. Under suitable assumptions, the proposed method achieves an operation complexity of $O(\varepsilon^{-3.5}\log\varepsilon^{-1})$, measured in terms of its fundamental operations, for finding an $\varepsilon$-KKT solution of the constrained minimax problem, which improves the previous best-known operation complexity by a factor of $\varepsilon^{-0.5}$.

Keywords: minimax optimization, augmented Lagrangian method, first-order method, operation complexity

Mathematics Subject Classification: 90C26, 90C30, 90C47, 90C99, 65K05

1 Introduction

In this paper, we consider a nonconvex-strongly-concave constrained minimax problem

$F^* = \min_{c(x) \le 0} \max_{d(x,y) \le 0} \{F(x,y) := f(x,y) + p(x) - q(y)\}$   (1)

Assume that problem (1) has at least one optimal solution and the following additional assumptions hold.
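The augmented Lagrangian idea behind the method described above can be seen on a much simpler problem. The sketch below applies a standard augmented Lagrangian method to a one-variable inequality-constrained minimization, min (x − 2)² subject to x − 1 ≤ 0, solving each subproblem approximately by gradient descent. This is a single-player toy chosen for illustration: it has no maximizing variable y, and the penalty parameter ρ, step size and iteration counts are assumptions rather than the paper's settings.

```python
from typing import Callable, Tuple

def augmented_lagrangian(f_grad: Callable[[float], float], c: Callable[[float], float],
                         c_grad: Callable[[float], float], x0: float, rho: float = 10.0,
                         outer: int = 30, inner: int = 200, lr: float = 0.05) -> Tuple[float, float]:
    """Augmented Lagrangian method for min f(x) subject to c(x) <= 0 (one variable).

    Each subproblem minimises
        L(x, lam) = f(x) + (max(0, lam + rho * c(x))**2 - lam**2) / (2 * rho)
    approximately by plain gradient descent.
    """
    x, lam = x0, 0.0
    for _ in range(outer):
        for _ in range(inner):
            shifted = max(0.0, lam + rho * c(x))
            x -= lr * (f_grad(x) + shifted * c_grad(x))  # gradient step on L(., lam)
        lam = max(0.0, lam + rho * c(x))  # multiplier update for an inequality constraint
    return x, lam

# Toy problem (illustrative): minimise (x - 2)^2 subject to x - 1 <= 0; solution x* = 1, lam* = 2.
x_star, lam_star = augmented_lagrangian(lambda x: 2.0 * (x - 2.0), lambda x: x - 1.0,
                                        lambda x: 1.0, x0=0.0)
print(x_star, lam_star)  # approximately 1.0 and 2.0
```

At the solution the KKT condition 2(x − 2) + λ = 0 holds with x = 1 and λ = 2, so the multiplier iterates converge to the value that makes the active constraint stationary.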

algorithm 1, algorithm 2, artificial intelligence

2512.22909

Country: North America > United States (0.67)

Genre: Research Report (0.64)

Industry: Government (0.67)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)

Reddit overtakes TikTok in UK thanks to search algorithms and gen Z

The Guardian, Jan-3-2026, 07:00:33 GMT

Reddit is being touted as an antidote to AI-generated content.

Platform is now Britain's fourth most visited social media site as users seek out human-generated content.

Reddit, the online discussion platform, has overtaken TikTok as Britain's fourth most visited social media service, as search algorithms and gen Z have dramatically transformed its prominence. The platform has undergone huge growth over the last two years, with an 88% increase in the proportion of UK internet users it reaches. Three in five Brits online now encounter the site, up from a third in 2023, according to Ofcom.

reddit, search algorithm, subreddit

Country:

Europe > United Kingdom (1.00)
North America > United States (0.32)
Europe > Ukraine (0.07)
Oceania > Australia (0.05)

Industry:

Media > News (1.00)
Government > Regional Government > Europe Government > United Kingdom Government (0.51)
Leisure & Entertainment > Sports > Soccer (0.32)
Government > Regional Government > North America Government > United States Government (0.32)

Technology:

Information Technology > Communications > Social Media (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.63)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.49)

Completed Hyperparameter Transfer across Modules, Width, Depth, Batch and Duration

Mlodozeniec, Bruno, Ablin, Pierre, Béthune, Louis, Busbridge, Dan, Klein, Michal, Ramapuram, Jason, Cuturi, Marco

arXiv.org Machine Learning, Dec-30-2025

Hyperparameter tuning can dramatically impact training stability and final performance of large-scale models. Recent works on neural network parameterisations, such as $μ$P, have enabled transfer of optimal global hyperparameters across model sizes. These works propose an empirical practice of searching for optimal global base hyperparameters at a small model size and transferring them to a large size. We extend these works in two key ways. First, to handle scaling along the most important scaling axes, we propose the Complete$^{(d)}$ Parameterisation, which unifies scaling in width and depth (using an adaptation of CompleteP) as well as in batch size and training duration. Second, with our parameterisation, we investigate per-module hyperparameter optimisation and transfer. We characterise the empirical challenges of navigating the high-dimensional hyperparameter landscape, and propose practical guidelines for tackling this optimisation problem. We demonstrate that, with the right parameterisation, hyperparameter transfer holds even in the per-module hyperparameter regime. Our study covers an extensive range of optimisation hyperparameters of modern models: learning rates, AdamW parameters, weight decay, initialisation scales, and residual block multipliers. Our experiments demonstrate significant training speed improvements in Large Language Models with the transferred per-module hyperparameters.
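As a rough illustration of per-module hyperparameter transfer across width, the sketch below applies the width-scaling rule commonly associated with $μ$P for Adam-type optimisers: the learning rate of hidden weight matrices is scaled by base_width / width, while vector-like parameters keep their base rate. This is an assumption about plain width-only $μ$P, not the Complete$^{(d)}$ Parameterisation described above, which also covers depth, batch size and training duration; the module names and base learning rates are hypothetical.

```python
from typing import Dict

def transfer_learning_rates(base_lrs: Dict[str, float], module_kinds: Dict[str, str],
                            base_width: int, width: int) -> Dict[str, float]:
    """Rescale per-module learning rates tuned at base_width for a model of the given width.

    Assumed rule (width only): Adam learning rates of hidden weight matrices scale like
    1/width, while vector-like parameters (biases, normalisation gains) keep the base rate.
    """
    ratio = base_width / width
    return {
        name: lr * ratio if module_kinds[name] == "hidden_matrix" else lr
        for name, lr in base_lrs.items()
    }

# Hypothetical per-module learning rates found by search on a width-256 proxy model.
base = {"attn.qkv": 3e-3, "mlp.up": 2e-3, "ln.gain": 1e-2}
kinds = {"attn.qkv": "hidden_matrix", "mlp.up": "hidden_matrix", "ln.gain": "vector"}
print(transfer_learning_rates(base, kinds, base_width=256, width=1024))
```

The expensive search happens once at the small width; the large model only receives the rescaled values, which is the practical payoff that makes transfer attractive.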

large language model, machine learning, natural language

2512.22382

Country: Europe (0.28)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.67)
