Goto

Collaborating Authors

 Search


Reasoning Language Models: A Blueprint

arXiv.org Artificial Intelligence

Reasoning language models (RLMs), also known as Large Reasoning Models (LRMs), such as OpenAI's o1 and o3, DeepSeek-V3, and Alibaba's QwQ, have redefined AI's problem-solving capabilities by extending LLMs with advanced reasoning mechanisms. Yet, their high costs, proprietary nature, and complex architectures - uniquely combining Reinforcement Learning (RL), search heuristics, and LLMs - present accessibility and scalability challenges. To address these, we propose a comprehensive blueprint that organizes RLM components into a modular framework, based on a survey and analysis of all RLM works. This blueprint incorporates diverse reasoning structures (chains, trees, graphs, and nested forms), reasoning strategies (e.g., Monte Carlo Tree Search, Beam Search), RL concepts (policy, value models and others), supervision schemes (Outcome-Based and Process-Based Supervision), and other related concepts (e.g., Test-Time Compute, Retrieval-Augmented Generation, agent tools). We also provide detailed mathematical formulations and algorithmic specifications to simplify RLM implementation. By showing how schemes like LLaMA-Berry, QwQ, Journey Learning, and Graph of Thoughts fit as special cases, we demonstrate the blueprint's versatility and unifying potential. To illustrate its utility, we introduce x1, a modular implementation for rapid RLM prototyping and experimentation. Using x1 and a literature review, we provide key insights, such as multi-phase training for policy and value models, and the importance of familiar training distributions. Finally, we discuss scalable RLM cloud deployments and we outline how RLMs can integrate with a broader LLM ecosystem. Our work demystifies RLM construction, democratizes advanced reasoning capabilities, and fosters innovation, aiming to mitigate the gap between "rich AI" and "poor AI" by lowering barriers to RLM design and experimentation.


GreedyPixel: Fine-Grained Black-Box Adversarial Attack Via Greedy Algorithm

arXiv.org Artificial Intelligence

A critical requirement for deep learning models is ensuring their robustness against adversarial attacks. These attacks commonly introduce noticeable perturbations, compromising the visual fidelity of adversarial examples. Another key challenge is that while white-box algorithms can generate effective adversarial perturbations, they require access to the model gradients, limiting their practicality in many real-world scenarios. Existing attack mechanisms struggle to achieve similar efficacy without access to these gradients. In this paper, we introduce GreedyPixel, a novel pixel-wise greedy algorithm designed to generate high-quality adversarial examples using only query-based feedback from the target model. GreedyPixel improves computational efficiency in what is typically a brute-force process by perturbing individual pixels in sequence, guided by a pixel-wise priority map. This priority map is constructed by ranking gradients obtained from a surrogate model, providing a structured path for perturbation. Our results demonstrate that GreedyPixel achieves attack success rates comparable to white-box methods without the need for gradient information, and surpasses existing algorithms in black-box settings, offering higher success rates, reduced computational time, and imperceptible perturbations. These findings underscore the advantages of GreedyPixel in terms of attack efficacy, time efficiency, and visual quality.


Billion-scale Similarity Search Using a Hybrid Indexing Approach with Advanced Filtering

arXiv.org Artificial Intelligence

Similarity search, the task of finding similar vectors, has become a fundamental operation in machine learning, with applications in recommendation engines, semantic search systems, and more [1-3]. As datasets grow to billions of entries, the challenge of performing efficient searches on high-dimensional vectors becomes increasingly complex [4]. This is further compounded by the well-known curse of dimensionality [5], which affects the performance and accuracy of search algorithms as the number of dimensions increases. Approximate Nearest Neighbor (ANN) algorithms, such as Inverted File Index (IVF) [6] and Hierarchical Navigable Small World (HNSW) [7], have been developed to address scalability and performance issues. IVF segments the search space into smaller areas, called Voronoi cells [8], while HNSW constructs a navigable graph structure for efficient search space traversal. Despite their advancements, these methods often struggle to support complex, multi-dimensional filtering efficiently. This is crucial in practical scenarios where additional criteria beyond vector similarity are required to refine search results [6]. Examples of such scenarios include e-commerce product search and semantic search with filtering and recommendation systems.


Selecting Critical Scenarios of DER Adoption in Distribution Grids Using Bayesian Optimization

arXiv.org Machine Learning

We develop a new methodology to select scenarios of DER adoption most critical for distribution grids. Anticipating risks of future voltage and line flow violations due to additional PV adopters is central for utility investment planning but continues to rely on deterministic or ad hoc scenario selection. We propose a highly efficient search framework based on multi-objective Bayesian Optimization. We treat underlying grid stress metrics as computationally expensive black-box functions, approximated via Gaussian Process surrogates and design an acquisition function based on probability of scenarios being Pareto-critical across a collection of line- and bus-based violation objectives. Our approach provides a statistical guarantee and offers an order of magnitude speed-up relative to a conservative exhaustive search. Case studies on realistic feeders with 200-400 buses demonstrate the effectiveness and accuracy of our approach.


Dual-Branch HNSW Approach with Skip Bridges and LID-Driven Optimization

arXiv.org Artificial Intelligence

The Hierarchical Navigable Small World (HNSW) algorithm is widely used for approximate nearest neighbor (ANN) search, leveraging the principles of navigable small-world graphs. However, it faces some limitations. The first is the local optima problem, which arises from the algorithm's greedy search strategy, selecting neighbors based solely on proximity at each step. This often leads to cluster disconnections. The second limitation is that HNSW frequently fails to achieve logarithmic complexity, particularly in high-dimensional datasets, due to the exhaustive traversal through each layer. To address these limitations, we propose a novel algorithm that mitigates local optima and cluster disconnections while enhancing the construction speed, maintaining inference speed. The first component is a dual-branch HNSW structure with LID-based insertion mechanisms, enabling traversal from multiple directions. This improves outlier node capture, enhances cluster connectivity, accelerates construction speed and reduces the risk of local minima. The second component incorporates a bridge-building technique that bypasses redundant intermediate layers, maintaining inference and making up the additional computational overhead introduced by the dual-branch structure. Experiments on various benchmarks and datasets showed that our algorithm outperforms the original HNSW in both accuracy and speed. We evaluated six datasets across Computer Vision (CV), and Natural Language Processing (NLP), showing recall improvements of 18\% in NLP, and up to 30\% in CV tasks while reducing the construction time by up to 20\% and maintaining the inference speed. We did not observe any trade-offs in our algorithm. Ablation studies revealed that LID-based insertion had the greatest impact on performance, followed by the dual-branch structure and bridge-building components.


Safety in safe Bayesian optimization and its ramifications for control

arXiv.org Machine Learning

A recurring and important task in control engineering is parameter tuning under constraints, which conceptually amounts to optimization of a blackbox function accessible only through noisy evaluations. For example, in control practice parameters of a pre-designed controller are often tuned online in feedback with a plant, and only safe parameter values should be tried, avoiding for example instability. Recently, machine learning methods have been deployed for this important problem, in particular, Bayesian optimization (BO). To handle safety constraints, algorithms from safe BO have been utilized, especially SafeOpt-type algorithms, which enjoy considerable popularity in learning-based control, robotics, and adjacent fields. However, we identify two significant obstacles to practical safety. First, SafeOpt-type algorithms rely on quantitative uncertainty bounds, and most implementations replace these by theoretically unsupported heuristics. Second, the theoretically valid uncertainty bounds crucially depend on a quantity - the reproducing kernel Hilbert space norm of the target function - that at present is impossible to reliably bound using established prior engineering knowledge. By careful numerical experiments we show that these issues can indeed cause safety violations. To overcome these problems, we propose Lipschitz-only Safe Bayesian Optimization (LoSBO), a safe BO algorithm that relies only on a known Lipschitz bound for its safety. Furthermore, we propose a variant (LoS-GP-UCB) that avoids gridding of the search space and is therefore applicable even for moderately high-dimensional problems.


Review for NeurIPS paper: Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

Neural Information Processing Systems

Additional Feedback: Can you explain more what is the reason for the disappearance of the phase transition compared to the discrete game? In the discrete game without a switching bound, you would typically get a total cost of (1 epsilon)*OPT (log n)/epsilon, and then set epsilon sqrt((log n)/T) to balance the losses. This means that even without a switching bound, against an oblivious adversary, you're switching between actions only about 1/sqrt(T) of the time anyway. On the other hand, you are constantly modifying your probability distribution every round (just that this change is only occasionally reflected in a switch of actions). In your game, there isn't this distinction between the hidden probability distribution and the action played. Is that partly the difference?


Review for NeurIPS paper: Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

Neural Information Processing Systems

The initial four reviews recommended accepting, based on the strength of the results and the interesting ideas. However, some issues were pointed out mainly related to the novelty value of the work and relation to several earlier results in the field. The authors in their reply provided an informative discussion on these issues. This satisfied the reviewers and strengthened their confidence for accepting the paper.


Review for NeurIPS paper: Community detection using fast low-cardinality semidefinite programming

Neural Information Processing Systems

The authors introduce a new local search algorithm for community detection in social networks. The main strength of the paper is the strong empirical performances of the introduced algorithm. Its main limitation is the lack of theoretical guarantees for the new model.


Reviews: Combinatorial Bayesian Optimization using the Graph Cartesian Product

Neural Information Processing Systems

This manuscript proposes a system for combinatorial Bayesian optimization called COMBO, aimed at problems with large numbers of categorical and/or ordinal features. The main contribution is an effective kernel for this setting based on applying a graph kernel to the graph Cartesian product of each of the features, which can be computed efficiently by exploiting structure. This kernel can be further enhanced using an ARD extension and a horseshoe prior to encourage sparse feature selection. The COMBO system then creates a GP with this kernel and does random local search to maximize an acquisition function such as EI in the combinatorial space. A series of experiments demonstrate COMBO performing better on real and synthetic tasks than alternatives such as systems using one-hot encodings.