CLion: Efficient Cautious Lion Optimizer with Enhanced Generalization

arXiv.org Machine Learning

The Lion optimizer is a popular learning-based optimization algorithm in machine learning that shows impressive performance in training many deep learning models. Although the convergence properties of the Lion optimizer have been studied, its generalization analysis is still missing. To fill this gap, we study the generalization properties of Lion via algorithmic stability, using mathematical induction. Specifically, we prove that Lion has a generalization error of $O(\frac{1}{N\tau^T})$, where $N$ is the training sample size, $\tau>0$ denotes the smallest absolute value of the non-zero elements of the gradient estimator, and $T$ is the total number of iterations. In addition, we obtain an interesting byproduct: the SignSGD algorithm has the same generalization error as Lion. To enhance the generalization of Lion, we design a novel, efficient Cautious Lion (i.e., CLion) optimizer that uses the sign function cautiously. Moreover, we prove that CLion has a generalization error of $O(\frac{1}{N})$, which is lower than the $O(\frac{1}{N\tau^T})$ of Lion, since the parameter $\tau$ is generally very small. We also study the convergence properties of CLion and prove that it has a fast convergence rate of $O(\frac{\sqrt{d}}{T^{1/4}})$ under the $\ell_1$-norm of the gradient for nonconvex stochastic optimization, where $d$ denotes the model dimension. Extensive numerical experiments demonstrate the effectiveness of the CLion optimizer.
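
For orientation, the baseline Lion update as it is commonly stated (take the sign of an interpolated momentum, then refresh the momentum) can be sketched as follows. This is a minimal illustration of plain Lion only; the abstract does not spell out the cautious rule used by CLion, so none is attempted here. The function name `lion_step`, the default hyperparameters, and the list-of-floats parameter layout are assumptions made for the example.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def lion_step(params, grads, momentum, lr=1e-4, beta1=0.9, beta2=0.99,
              weight_decay=0.0):
    """One step of the standard Lion update on plain Python lists.

    Assumption: this follows the commonly stated Lion rule, not CLion.
    """
    new_params, new_momentum = [], []
    for p, g, m in zip(params, grads, momentum):
        # Interpolate momentum and gradient; only the sign is used in the step.
        c = beta1 * m + (1.0 - beta1) * g
        new_params.append(p - lr * (sign(c) + weight_decay * p))
        # The momentum buffer is refreshed with a second coefficient.
        new_momentum.append(beta2 * m + (1.0 - beta2) * g)
    return new_params, new_momentum
```

Because only the sign of the interpolated direction enters the step, every coordinate moves by exactly `lr` (plus weight decay) regardless of gradient magnitude, which is the property the stability analysis above has to handle.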


Path-Sampled Integrated Gradients

arXiv.org Machine Learning

We introduce path-sampled integrated gradients (PS-IG), a framework that generalizes feature attribution by computing the expected value over baselines sampled along the linear interpolation path. We prove that PS-IG is mathematically equivalent to path-weighted integrated gradients, provided the weighting function matches the cumulative distribution function of the sampling density. This equivalence allows the stochastic expectation to be evaluated via a deterministic Riemann sum, improving the error convergence rate from $O(m^{-1/2})$ to $O(m^{-1})$ for smooth models. Furthermore, we demonstrate analytically that PS-IG functions as a variance-reducing filter against gradient noise - strictly lowering attribution variance by a factor of 1/3 under uniform sampling - while preserving key axiomatic properties such as linearity and implementation invariance.
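
The stated equivalence can be checked numerically on a toy model. The sketch below is one reading of the abstract, not the authors' code: it assumes baselines are drawn uniformly along the straight path from a reference baseline to the input, so the matching weight is the uniform CDF, $w(\alpha) = \alpha$. The model `grad_f`, the helper names, and the sample sizes are all hypothetical.

```python
import random


def grad_f(x):
    """Gradient of the toy model f(x) = x0^2 + 3*x0*x1 (an assumption)."""
    return [2.0 * x[0] + 3.0 * x[1], 3.0 * x[0]]


def integrated_gradients(x, baseline, m=200, weight=lambda a: 1.0):
    """Midpoint Riemann sum for (optionally path-weighted) integrated gradients."""
    acc = [0.0] * len(x)
    for k in range(m):
        a = (k + 0.5) / m  # midpoint of the k-th sub-interval of [0, 1]
        point = [b + a * (xi - b) for xi, b in zip(x, baseline)]
        g = grad_f(point)
        w = weight(a)
        for i in range(len(x)):
            acc[i] += w * g[i] / m
    return [(xi - b) * s for xi, b, s in zip(x, baseline, acc)]


def sampled_baseline_ig(x, baseline, n_samples=2000, m=20, seed=0):
    """Monte Carlo average of plain IG over baselines sampled on the path."""
    rng = random.Random(seed)
    total = [0.0] * len(x)
    for _ in range(n_samples):
        t = rng.random()  # uniform position along the interpolation path
        b_t = [b + t * (xi - b) for xi, b in zip(x, baseline)]
        attr = integrated_gradients(x, b_t, m=m)
        for i in range(len(x)):
            total[i] += attr[i] / n_samples
    return total


x, x_ref = [1.0, 2.0], [0.0, 0.0]
deterministic = integrated_gradients(x, x_ref, weight=lambda a: a)
stochastic = sampled_baseline_ig(x, x_ref)
```

For this toy model both estimates should land near (8/3, 2); the deterministic weighted sum gets there with 200 gradient evaluations, while the sampled version needs many more to reach comparable accuracy.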


The Allbirds Pivot Is a Terrible Idea … Right?

The Atlantic - Technology

Its turn to AI could be an escape hatch for a company with nothing to lose. This is an edition of The Daily, a newsletter that guides you through the biggest stories of the day, helps you discover new ideas, and recommends the best in culture. Walk into any Silicon Valley office in the late 2010s, and you'd probably see at least one pair of Allbirds. Woolly and eco-friendly, the sneakers once epitomized a certain kind of corporate culture (even Barack Obama was a fan), and the company behind them was valued at roughly $4 billion at its peak, in 2021.


Litter of 5 bear cubs spotted in Connecticut for the first time

Popular Science

About 1,000 to 1,200 black bears call the Nutmeg State home. The state of Connecticut is probably not the first place that comes to mind when you think of bears. However, the Nutmeg State is home to about 1,000 to 1,200 black bears.


Your dreams decoded: Scientists reveal what your nighttime visions say about you - and why night terrors might actually be GOOD for you

Daily Mail - Science & tech

It's never nice waking up and remembering a scary dream - but having night terrors might actually be a good thing, experts say. Researchers have found that feeling fear during your nighttime visions could indicate you're better at handling your emotions.


Cost-optimal Sequential Testing via Doubly Robust Q-learning

arXiv.org Machine Learning

Clinical decision-making often involves selecting tests that are costly, invasive, or time-consuming, motivating individualized, sequential strategies for what to measure and when to stop ascertaining. We study the problem of learning cost-optimal sequential decision policies from retrospective data, where test availability depends on prior results, inducing informative missingness. Under a sequential missing-at-random mechanism, we develop a doubly robust Q-learning framework for estimating optimal policies. The method introduces path-specific inverse probability weights that account for heterogeneous test trajectories and satisfy a normalization property conditional on the observed history. By combining these weights with auxiliary contrast models, we construct orthogonal pseudo-outcomes that enable unbiased policy learning when either the acquisition model or the contrast model is correctly specified. We establish oracle inequalities for the stage-wise contrast estimators, along with convergence rates, regret bounds, and misclassification rates for the learned policy. Simulations demonstrate improved cost-adjusted performance over weighted and complete-case baselines, and an application to a prostate cancer cohort study illustrates how the method reduces testing cost without compromising predictive accuracy.


Joint Representation Learning and Clustering via Gradient-Based Manifold Optimization

arXiv.org Machine Learning

Clustering and dimensionality reduction have been crucial topics in machine learning and computer vision. Clustering high-dimensional data has long been challenging due to the curse of dimensionality. For that reason, a more promising direction is the joint learning of dimension reduction and clustering. In this work, we propose a Manifold Learning Framework that learns dimensionality reduction and clustering simultaneously. The proposed framework is able to jointly learn the parameters of a dimension reduction technique (e.g., a linear projection or a neural network) and cluster the data based on the resulting features (e.g., under a Gaussian Mixture Model framework). The framework searches for the dimension reduction parameters and the optimal clusters by traversing a manifold, using Gradient Manifold Optimization. The proposed framework is exemplified with a Gaussian Mixture Model as one simple but efficient example, in a process that is somewhat similar to unsupervised Linear Discriminant Analysis (LDA). We apply the proposed method to the unsupervised training of simulated data as well as a benchmark image dataset (i.e., MNIST). The experimental results indicate that our algorithm has better performance than popular clustering algorithms from the literature.


Universality of Gaussian-Mixture Reverse Kernels in Conditional Diffusion

arXiv.org Machine Learning

We prove that conditional diffusion models whose reverse kernels are finite Gaussian mixtures with ReLU-network logits can approximate suitably regular target distributions arbitrarily well in context-averaged conditional KL divergence, up to an irreducible terminal mismatch that typically vanishes with increasing diffusion horizon. A path-space decomposition reduces the output error to this mismatch plus per-step reverse-kernel errors; assuming each reverse kernel factors through a finite-dimensional feature map, each step becomes a static conditional density approximation problem, solved by composing Norets' Gaussian-mixture theory with quantitative ReLU bounds. Under exact terminal matching the resulting neural reverse-kernel class is dense in conditional KL.


Forecasting Multivariate Time Series under Predictive Heterogeneity: A Validation-Driven Clustering Framework

arXiv.org Machine Learning

We study adaptive pooling under predictive heterogeneity in high-dimensional multivariate time series forecasting, where global models improve statistical efficiency but may fail to capture heterogeneous predictive structure, while naive specialization can induce negative transfer. We formulate adaptive pooling as a statistical decision problem and propose a validation-driven framework that determines when and how specialization should be applied. Rather than grouping series based on representation similarity, we define partitions through out-of-sample predictive performance, thereby aligning data organization with predictive risk, defined as expected out-of-sample loss and approximated via validation error. Cluster assignments are iteratively updated using validation losses for both point (Huber) and probabilistic (pinball) forecasting, improving robustness to heavy-tailed errors and local anomalies. To ensure reliability, we introduce a leakage-free fallback mechanism that reverts to a global model whenever specialization fails to improve validation performance, providing a safeguard against performance degradation under a strict training-validation-test protocol. Experiments on large-scale traffic datasets demonstrate consistent improvements over strong baselines while avoiding degradation when heterogeneity is weak. Overall, the proposed framework provides a principled and practically reliable approach to adaptive pooling in high-dimensional forecasting problems.
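
The two validation losses named above, and a fallback of the kind described, can be sketched in a few lines. This is an illustrative reading only: the abstract does not give the exact decision rule, so the strict-improvement comparison in `choose_model` is an assumption, as are all function names and default parameters.

```python
def huber_loss(y, y_hat, delta=1.0):
    """Huber loss: quadratic for small residuals, linear for large ones."""
    r = abs(y - y_hat)
    return 0.5 * r * r if r <= delta else delta * (r - 0.5 * delta)


def pinball_loss(y, q_hat, tau=0.5):
    """Pinball (quantile) loss for a predicted tau-quantile q_hat."""
    diff = y - q_hat
    return tau * diff if diff >= 0 else (tau - 1.0) * diff


def mean_loss(loss, ys, preds):
    """Average a pointwise loss over a validation set."""
    return sum(loss(y, p) for y, p in zip(ys, preds)) / len(ys)


def choose_model(val_y, global_pred, specialized_pred, loss=huber_loss):
    """Hypothetical fallback: keep the specialized model only if it is
    strictly better than the global model on validation data."""
    global_err = mean_loss(loss, val_y, global_pred)
    special_err = mean_loss(loss, val_y, specialized_pred)
    return "specialized" if special_err < global_err else "global"
```

Because the comparison uses only validation data, the test set never influences the choice, which is what keeps a selection step of this kind leakage-free.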


Multistage Conditional Compositional Optimization

arXiv.org Machine Learning

We introduce Multistage Conditional Compositional Optimization (MCCO) as a new paradigm for decision-making under uncertainty that combines aspects of multistage stochastic programming and conditional stochastic optimization. MCCO minimizes a nest of conditional expectations and nonlinear cost functions. It has numerous applications and arises, for example, in optimal stopping, linear-quadratic regulator problems, distributionally robust contextual bandits, as well as in problems involving dynamic risk measures. The naïve nested sampling approach for MCCO suffers from the curse of dimensionality familiar from scenario tree-based multistage stochastic programming, that is, its scenario complexity grows exponentially with the number of nests. We develop new multilevel Monte Carlo techniques for MCCO whose scenario complexity grows only polynomially with the desired accuracy.
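
The exponential growth of naive nested sampling is easy to see on a toy recursion. The sketch below is not the paper's method or problem class: it assumes a hypothetical random-walk state, a squared cost applied to each conditional mean, and the same number of inner samples at every nest, purely to count how many scenarios the naive estimator draws.

```python
import random


def nested_value(stage, n_stages, state, n_inner, rng, counter,
                 cost=lambda v: v * v):
    """Naive nested Monte Carlo: a nonlinear cost applied to a conditional
    mean that is itself estimated by recursive sampling (toy assumption)."""
    if stage == n_stages:
        return state  # terminal value is just the final state
    total = 0.0
    for _ in range(n_inner):
        counter[0] += 1  # one more scenario drawn
        next_state = state + rng.gauss(0.0, 1.0)
        total += nested_value(stage + 1, n_stages, next_state, n_inner,
                              rng, counter, cost)
    return cost(total / n_inner)


def scenario_count(n_stages, n_inner):
    """Number of random draws used by the naive estimator."""
    counter = [0]
    nested_value(0, n_stages, 0.0, n_inner, random.Random(0), counter)
    return counter[0]
```

With `n_inner` samples per nest and `n_stages` nests, the draw count is `n_inner + n_inner**2 + ... + n_inner**n_stages`, so each additional nest multiplies the work by roughly `n_inner`.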