Goto

Collaborating Authors

 Asia


Without-Replacement Sampling for Stochastic Gradient Methods Ohad Shamir Department of Computer Science and Applied Mathematics Weizmann Institute of Science Rehovot, Israel ohad.shamir@weizmann.ac.il

Neural Information Processing Systems

Stochastic gradient methods for machine learning and optimization problems are usually analyzed assuming data points are sampled with replacement. In contrast, sampling without replacement is far less understood, yet in practice it is very common, often easier to implement, and usually performs better. In this paper, we provide competitive convergence guarantees for without-replacement sampling under several scenarios, focusing on the natural regime of few passes over the data. Moreover, we describe a useful application of these results in the context of distributed optimization with randomly-partitioned data, yielding a nearly-optimal algorithm for regularized least squares (in terms of both communication complexity and runtime complexity) under broad parameter regimes. Our proof techniques combine ideas from stochastic optimization, adversarial online learning and transductive learning theory, and can potentially be applied to other stochastic optimization and learning problems.


Young Chinese use AI to launch one-person firms over job anxiety

The Japan Times

One-person company SoloNest sounder Karen Dai preparing for a coffee chat at a conference room in Shanghai on April 12. | AFP-JIJI Shanghai - Young Chinese, many who fear age discrimination in their workplace after turning 35, are increasingly starting one-person companies that have artificial intelligence do most of the work. Smaller startups are already in vogue in Silicon Valley and elsewhere, with rapidly advancing AI tools seen as a welcome teammate even as they threaten layoffs at existing firms. More young people in China are subscribing to the model, as cities pledge millions of dollars in funding and rent subsidies for such ventures, in alignment with Beijing's political goal of technological self-reliance. In a time of both misinformation and too much information, quality journalism is more crucial than ever. By subscribing, you can help us get the story right.


Pentagon seeks 75 billion for drones in record budget ask

The Japan Times

A soldier carries a drone during a military parade in Washington on June 14, 2025. The Pentagon's largest-ever budget request earmarks $75 billion for drones and technologies to counter them, mainly for a massive increase for a little-known office working with U.S. commandos to test and evaluate various systems, according to defense officials. The drone-funding proposal includes $54.6 billion for the Defense Autonomous Working Group, or DAWG, from just $225.9 million this year. That would appear to be the largest single year-over-year boost of any defense program or office, meaning it's likely to draw particular congressional and public scrutiny in an already eye-catching $1.5 trillion request that's 42% larger than this year's budget. The big boost for the Pentagon's little-known drone unit comes as the U.S. and Israeli war against Iran illustrates how drones can help level the playing field against even the world's most well-funded armed forces.


Meta to capture U.S. employee mouse movements and keystrokes to train AI

The Japan Times

Meta to capture U.S. employee mouse movements and keystrokes to train AI NEW YORK - Meta is installing new tracking software on U.S.-based employees' computers to capture mouse movements, clicks and keystrokes for use in training its artificial intelligence models, part of a broad initiative to build AI agents that can perform work tasks autonomously, the company told staffers in internal memos. The tool, called Model Capability Initiative (MCI), will run on work-related apps and websites and will also take occasional snapshots of the content on employees' screens, according to one of the memos, posted by a staff AI research scientist on Tuesday in a channel for the company's model-building Meta SuperIntelligence Labs team. The purpose, according to the memo, was to improve the company's AI models in areas where they struggle to replicate how humans interact with computers, like choosing from dropdown menus and using keyboard shortcuts. In a time of both misinformation and too much information, quality journalism is more crucial than ever. By subscribing, you can help us get the story right.



A Non-generative Framework and Convex Relaxations for Unsupervised Learning

Neural Information Processing Systems

We give a novel formal theoretical framework for unsupervised learning with two distinctive characteristics. First, it does not assume any generative model and based on a worst-case performance metric. Second, it is comparative, namely performance is measured with respect to a given hypothesis class. This allows to avoid known computational hardness results and improper algorithms based on convex relaxations. We show how several families of unsupervised learning models, which were previously only analyzed under probabilistic assumptions and are otherwise provably intractable, can be efficiently learned in our framework by convex optimization.


Theoretical Comparisons of Positive-Unlabeled Learning against Positive-Negative Learning

Neural Information Processing Systems

In PU learning, a binary classifier is trained from positive (P) and unlabeled (U) data without negative (N) data. Although N data is missing, it sometimes outperforms PN learning (i.e., ordinary supervised learning). Hitherto, neither theoretical nor experimental analysis has been given to explain this phenomenon. In this paper, we theoretically compare PU (and NU) learning against PN learning based on the upper bounds on estimation errors. We find simple conditions when PU and NU learning are likely to outperform PN learning, and we prove that, in terms of the upper bounds, either PU or NU learning (depending on the class-prior probability and the sizes of P and N data) given infinite U data will improve on PN learning. Our theoretical findings well agree with the experimental results on artificial and benchmark data even when the experimental setup does not match the theoretical assumptions exactly.


Last-Iterate Guarantees for Learning in Co-coercive Games

arXiv.org Machine Learning

We establish finite-time last-iterate guarantees for vanilla stochastic gradient descent in co-coercive games under noisy feedback. This is a broad class of games that is more general than strongly monotone games, allows for multiple Nash equilibria, and includes examples such as quadratic games with negative semidefinite interaction matrices and potential games with smooth concave potentials. Prior work in this setting has relied on relative noise models, where the noise vanishes as iterates approach equilibrium, an assumption that is often unrealistic in practice. We work instead under a substantially more general noise model in which the second moment of the noise is allowed to scale affinely with the squared norm of the iterates, an assumption natural in learning with unbounded action spaces. Under this model, we prove a last-iterate bound of order $O(\log(t)/t^{1/3})$, the first such bound for co-coercive games under non-vanishing noise. We additionally establish almost sure convergence of the iterates to the set of Nash equilibria and derive time-average convergence guarantees.


S2MAM: Semi-supervised Meta Additive Model for Robust Estimation and Variable Selection

arXiv.org Machine Learning

Semi-supervised learning with manifold regularization is a classical framework for jointly learning from both labeled and unlabeled data, where the key requirement is that the support of the unknown marginal distribution has the geometric structure of a Riemannian manifold. Typically, the Laplace-Beltrami operator-based manifold regularization can be approximated empirically by the Laplacian regularization associated with the entire training data and its corresponding graph Laplacian matrix. However, the graph Laplacian matrix depends heavily on the prespecified similarity metric and may lead to inappropriate penalties when dealing with redundant or noisy input variables. To address the above issues, this paper proposes a new \textit{Semi-Supervised Meta Additive Model (S$^2$MAM) based on a bilevel optimization scheme that automatically identifies informative variables, updates the similarity matrix, and simultaneously achieves interpretable predictions. Theoretical guarantees are provided for S$^2$MAM, including the computing convergence and the statistical generalization bound. Experimental assessments across 4 synthetic and 12 real-world datasets, with varying levels and categories of corruption, validate the robustness and interpretability of the proposed approach.


Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

arXiv.org Machine Learning

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it reduces to a tractable per-step form: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this baseline online with a learned critic co-trained alongside the world model; regressing a single scalar, the critic converges well before the world model saturates, redirecting exploration toward learnable transitions without oracle knowledge of the noise floor. The reward is higher for learnable transitions and collapses toward the baseline for stochastic ones, effectively separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error and visitation-count baselines in convergence speed and final world model accuracy.