Windows won't boot? Safe Mode is the lifeline you need

PCWorld

PCWorld explains how Safe Mode serves as a critical troubleshooting tool when Windows fails to boot. In Safe Mode, Windows loads only the most essential drivers and services, skips third-party autostart programs, and uses a simple graphical interface; this lets users disable faulty drivers, software, or malware, since these do not run in Safe Mode. From there, users can identify problematic drivers, uninstall recently added programs, run system repairs such as SFC and DISM, and access System Restore. Key diagnostic tools include boot logging to identify crash-causing drivers, Device Manager for driver rollbacks, and startup management through Task Manager.


More Bang for the Buck: Improving the Inference of Large Language Models at a Fixed Budget using Reset and Discard (ReD)

Meir, Sagi, Keidar, Tommer D., Levi, Noam, Reuveni, Shlomi, Hirshberg, Barak

arXiv.org Machine Learning

The performance of large language models (LLMs) on verifiable tasks is usually measured by pass@k, the probability of answering a question correctly at least once in k trials. At a fixed budget, a more suitable metric is coverage@cost, the average number of unique questions answered as a function of the total number of attempts. We connect the two metrics and show that the empirically observed power-law behavior in pass@k leads to sublinear growth of coverage@cost (diminishing returns). To solve this problem, we propose Reset-and-Discard (ReD), a method for querying LLMs that increases coverage@cost for any given budget, regardless of the pass@k form. Moreover, given a pass@k, we can quantitatively predict the savings in the total number of attempts using ReD. If pass@k is not available for the model, ReD can infer its power-law exponent. Experiments on three LLMs using HumanEval demonstrate that ReD substantially reduces the required attempts, tokens, and USD cost to reach a desired coverage, while also offering an efficient way to measure inference power-laws.
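The diminishing returns the abstract describes can be illustrated with a small sketch (not the paper's code; the per-question success probabilities below are invented) that computes expected coverage when a fixed budget is split evenly across questions:

```python
def pass_at_k(p, k):
    """Probability of at least one success in k independent attempts,
    each succeeding with probability p."""
    return 1.0 - (1.0 - p) ** k

def coverage_at_cost(probs, total_attempts):
    """Expected number of unique questions solved when the total budget is
    split evenly across questions (a naive baseline, not ReD's allocation)."""
    k = total_attempts // len(probs)
    return sum(pass_at_k(p, k) for p in probs)

# Invented per-question success probabilities; a heavy-tailed mix like this
# produces the roughly power-law pass@k behavior the paper discusses.
probs = [0.5, 0.1, 0.02, 0.004]
for budget in (4, 40, 400):
    # Each 10x budget increase buys less and less additional coverage.
    print(budget, round(coverage_at_cost(probs, budget), 3))
```

Under this even-split baseline, hard questions soak up attempts that rarely pay off, which is exactly the sublinear coverage@cost growth the paper targets.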


An Efficient Variant of One-Class SVM with Lifelong Online Learning Guarantees

Suk, Joe, Kpotufe, Samory

arXiv.org Machine Learning

We study outlier (a.k.a., anomaly) detection for single-pass non-stationary streaming data. In the well-studied offline or batch outlier detection problem, traditional methods such as kernel One-Class SVM (OCSVM) are both computationally heavy and prone to large false-negative (Type II) errors under non-stationarity. To remedy this, we introduce SONAR, an efficient SGD-based OCSVM solver with strongly convex regularization. We show novel theoretical guarantees on the Type I/II errors of SONAR, superior to those known for OCSVM, and further prove that SONAR ensures favorable lifelong learning guarantees under benign distribution shifts. In the more challenging problem of adversarial non-stationary data, we show that SONAR can be used within an ensemble method and equipped with changepoint detection to achieve adaptive guarantees, ensuring small Type I/II errors on each phase of data. We validate our theoretical findings on synthetic and real-world datasets.
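The core ingredient, an SGD solver for a strongly convex OCSVM objective, can be sketched for the linear-kernel case as follows. This is a generic single-pass sketch with invented hyperparameters, not SONAR's actual update rule:

```python
import numpy as np

def ocsvm_sgd(stream, dim, lam=0.1, nu=0.1, lr=0.01):
    """Single-pass SGD on a strongly convex, linear-kernel OCSVM objective:
        (lam/2) * ||w||^2 - rho + (1/nu) * E[max(0, rho - <w, x>)].
    Hyperparameters and update rule are illustrative, not SONAR's exact ones.
    """
    w, rho = np.zeros(dim), 0.0
    for t, x in enumerate(stream, start=1):
        eta = lr / np.sqrt(t)                 # decaying step size
        violated = rho - w @ x > 0            # is the hinge active?
        gw = lam * w - (x if violated else 0.0)
        grho = -1.0 + (1.0 / nu if violated else 0.0)
        w -= eta * gw
        rho -= eta * grho
    return w, rho

def is_outlier(w, rho, x):
    """Points scoring below the learned offset are flagged."""
    return w @ x < rho

rng = np.random.default_rng(0)
inliers = rng.normal(loc=[2.0, 2.0], scale=0.3, size=(2000, 2))
w, rho = ocsvm_sgd(inliers, dim=2)
print(is_outlier(w, rho, np.array([3.0, 3.0])))    # on the data's side: should be False
print(is_outlier(w, rho, np.array([-3.0, -3.0])))  # far below it: should be True
```

Each sample is touched once, so the cost per point is O(dim), which is what makes this style of solver attractive for single-pass streams compared to batch kernel OCSVM.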


Batch Acquisition Function Evaluations and Decouple Optimizer Updates for Faster Bayesian Optimization

Irie, Kaichi, Watanabe, Shuhei, Onishi, Masaki

arXiv.org Artificial Intelligence

Bayesian optimization (BO) efficiently finds high-performing parameters by maximizing an acquisition function, which models the promise of parameters. A major computational bottleneck arises in acquisition function optimization, where multi-start optimization (MSO) with quasi-Newton (QN) methods is required due to the non-convexity of the acquisition function. BoTorch, a widely used BO library, currently optimizes the summed acquisition function over multiple points, speeding up MSO via PyTorch batching. Nevertheless, this paper empirically demonstrates the suboptimality of this approach in terms of off-diagonal approximation errors in the inverse Hessian of a QN method, slowing down its convergence. To address this problem, we propose to decouple QN updates using a coroutine while batching the acquisition function calls. Our approach not only yields convergence theoretically identical to sequential MSO but also drastically reduces the wall-clock time compared to the previous approaches. Our approach is available in GPSampler in Optuna, effectively reducing its computational overhead.
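The decoupling idea can be illustrated with a toy multi-start minimization in which the objective is evaluated for all restarts in one batched call while each restart's update uses only its own row. Plain gradient descent stands in for the quasi-Newton method, and the objective is invented:

```python
import numpy as np

def batched_multistart_gd(f_batch, x0, lr=0.1, n_steps=200):
    """Multi-start minimization: objective values and gradients are computed
    for all restarts in one batched call, while each restart's update uses
    only its own row, so no optimizer state is shared across restarts."""
    X = x0.copy()
    for _ in range(n_steps):
        _, grads = f_batch(X)
        X = X - lr * grads       # row-wise, independent updates
    vals, _ = f_batch(X)
    best = int(np.argmin(vals))
    return X[best], vals[best]

# Invented non-convex stand-in for an acquisition function (1-D, multi-modal).
def f_batch(X):
    x = X[:, 0]
    vals = np.sin(3.0 * x) + 0.1 * x ** 2
    grads = (3.0 * np.cos(3.0 * x) + 0.2 * x)[:, None]
    return vals, grads

starts = np.array([[-2.0], [0.0], [2.0]])   # multi-start initial points
x_best, v_best = batched_multistart_gd(f_batch, starts)
print(x_best, v_best)
```

Because each row keeps its own trajectory, the result matches running the restarts sequentially, while the batched `f_batch` call captures the wall-clock benefit; the paper's contribution is preserving this independence for stateful quasi-Newton updates via a coroutine.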


Solving Diffusion Inverse Problems with Restart Posterior Sampling

Ahmed, Bilal, Makin, Joseph G.

arXiv.org Machine Learning

Inverse problems are fundamental to science and engineering, where the goal is to infer an underlying signal or state from incomplete or noisy measurements. Recent approaches employ diffusion models as powerful implicit priors for such problems, owing to their ability to capture complex data distributions. However, existing diffusion-based methods for inverse problems often rely on strong approximations of the posterior distribution, require computationally expensive gradient backpropagation through the score network, or are restricted to linear measurement models. In this work, we propose Restart for Posterior Sampling (RePS), a general and efficient framework for solving both linear and non-linear inverse problems using pre-trained diffusion models. RePS builds on the idea of restart-based sampling, previously shown to improve sample quality in unconditional diffusion, and extends it to posterior inference. Our method employs a conditioned ODE applicable to any differentiable measurement model and introduces a simplified restart strategy that contracts accumulated approximation errors during sampling. Unlike some prior approaches, RePS avoids backpropagation through the score network, substantially reducing computational cost. We demonstrate that RePS achieves faster convergence and superior reconstruction quality compared to existing diffusion-based baselines across a range of inverse problems, including both linear and non-linear settings.
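The restart mechanism can be sketched on a toy linear-Gaussian inverse problem where the posterior score is available in closed form, so no score network (and hence no backpropagation) is needed. The schedule below is illustrative, not the paper's exact RePS procedure:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy inverse problem: prior x0 ~ N(0, 1), measurement y = x0 + n, n ~ N(0, s2).
y, s2 = 1.5, 0.25
var_post = 1.0 / (1.0 + 1.0 / s2)      # Var[x0 | y] = 0.2
mu_post = var_post * y / s2            # E[x0 | y]   = 1.2

def posterior_score(x, sigma):
    """Score of the posterior noised to level sigma: N(mu_post, var_post + sigma^2)."""
    return -(x - mu_post) / (var_post + sigma ** 2)

def ode_down(x, sig_hi, sig_lo, n_steps=50):
    """Euler integration of the probability-flow ODE dx/dsigma = -sigma * score."""
    sigmas = np.linspace(sig_hi, sig_lo, n_steps + 1)
    for s0, s1 in zip(sigmas[:-1], sigmas[1:]):
        x = x + (s1 - s0) * (-s0 * posterior_score(x, s0))
    return x

def restart_sample(n, sig_max=10.0, sig_restart=2.0, n_restarts=4):
    """Restart-style sampling: run the ODE down to zero, then repeatedly
    re-noise up to sig_restart and run the ODE down again, which contracts
    accumulated integration error toward the posterior."""
    x = mu_post + np.sqrt(var_post + sig_max ** 2) * rng.standard_normal(n)
    x = ode_down(x, sig_max, 0.0)
    for _ in range(n_restarts):
        x = ode_down(x + sig_restart * rng.standard_normal(n), sig_restart, 0.0)
    return x

samples = restart_sample(20000)
print(samples.mean(), samples.var())   # should approach 1.2 and 0.2
```

Each restart replaces whatever error the deterministic descent accumulated with fresh, correctly scaled forward noise, so errors shrink rather than compound; this is the contraction property the abstract refers to.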




A Proofs

Neural Information Processing Systems

A.1 Proof of Theorem 3.1

First we set up some notation. All algorithms we are considering, if not discrete, induce a density w.r.t. the Lebesgue measure. The only difference between Theorem 3.1 and this theorem is that a privacy filter halts at a random time. The same argument can be used to bound the other direction of the divergence. Since we run batch gradient descent and not SGD as in the library example, we tune all hyperparameters from scratch. We take the minimum of an empty set to be infinite.


Heavy Ball Momentum for Conditional Gradient

Neural Information Processing Systems

Unlike for projection-based methods, momentum cannot, in general, improve the convergence rate of FW. This limitation motivates the present work, which deals with heavy ball momentum and its impact on FW.
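One common way to combine heavy ball momentum with FW is to feed the linear minimization oracle an exponentially averaged gradient instead of the raw one. The sketch below illustrates this variant on an invented toy problem; it is not necessarily the paper's exact scheme:

```python
import numpy as np

def heavy_ball_fw(grad, lmo, x0, beta=0.9, n_iters=2000):
    """Frank-Wolfe where the linear minimization oracle (LMO) sees a heavy
    ball style exponentially averaged gradient rather than the raw one."""
    x, m = x0.copy(), np.zeros_like(x0)
    for t in range(1, n_iters + 1):
        m = beta * m + (1.0 - beta) * grad(x)   # momentum-averaged gradient
        s = lmo(m)                              # linear subproblem over the set
        gamma = 2.0 / (t + 2.0)                 # standard FW step size
        x = x + gamma * (s - x)                 # convex combo stays feasible
    return x

# Toy problem: Euclidean projection of b onto the probability simplex,
# i.e. minimize f(x) = 0.5 * ||x - b||^2 subject to x >= 0, sum(x) = 1.
b = np.array([0.6, 0.3, -0.2])
grad = lambda x: x - b
lmo = lambda g: np.eye(len(g))[np.argmin(g)]    # best simplex vertex for <g, s>
x_star = heavy_ball_fw(grad, lmo, x0=np.ones(3) / 3)
print(x_star)   # the true projection is (0.65, 0.35, 0)
```

Note that the LMO here is a single `argmin` over coordinates, the projection-free cheapness that makes FW attractive; the momentum only changes which vertex the oracle selects, not the cost of selecting it.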