Goto

Collaborating Authors

 mapping


The General Theory of Localization Methods

arXiv.org Machine Learning

This paper proposes a general machine learning framework called the localization method, which is fundamentally built on two core concepts: localization kernels and local means -- key components that underpin the self-attention mechanism. To establish a rigorous theoretical foundation, the framework is formally defined through two essential pillars: the formulation of the local(-ized) model and the localization trick. We systematically investigate the connections between the localization method and a wide range of existing machine learning models/methods, including (but not limited to) kernel methods, lazy learning, the MeanShift algorithm, relaxation labeling, Hopfield networks, local linear embedding (LLE), fuzzy inference, and denoising autoencoders (DAEs). By dissecting these relationships, we clarify the broader theoretical significance of the localization method and demonstrate its practical applicability across diverse machine learning tasks. Furthermore, we explore advanced extensions of the framework, such as adaptive kernels, hierarchical local models, and non-local models. Notably, we show that the Transformer -- a cornerstone of modern sequence modeling -- can be constructed using hierarchical local models, revealing the ability of the localization method to unify and generalize state-of-the-art architectures. This work not only provides a unified theoretical lens to reinterpret existing models but also offers new methodological tools for designing flexible, data-adaptive learning systems.


Structured Analytic Coherent Point Drift for Non-Rigid Point Set Registration

arXiv.org Machine Learning

Coherent Point Drift (CPD) is a representative probabilistic framework for unsupervised non-rigid point set registration. Its standard non-rigid M-step, however, relies on a point-indexed Gaussian-kernel system whose size grows with the number of moving points, making deformation estimation computationally heavy for large point sets and difficult to control in complexity during registration. To address these limitations, we propose Analytic-CPD, a new unsupervised non-rigid registration framework that gives CPD a structured analytic reformulation. Analytic-CPD preserves the CPD posterior correspondence layer, but lifts the M-step from point-indexed kernel displacement estimation to structured analytic mapping estimation. By coupling the Gaussian-mixture posterior mechanism of CPD with Structured Analytic Mappings (SAM), the method obtains a deformation model whose coefficient dimension is governed by the ambient dimension and analytic order rather than by the number of moving points. More importantly, deformation estimation is organized over an interpretable hierarchy of analytic function spaces, so the analytic order can be increased progressively as posterior correspondences become more reliable. We implement this idea through an increasing-degree continuation strategy with decreasing stage lengths: low-order analytic maps first stabilize the posterior correspondence structure, while higher-order modes later refine nonlinear residual deformation. Experiments on controlled model-matched, smooth model-mismatch, and registered human-shape data demonstrate the effectiveness and favorable accuracy--efficiency performance of Analytic-CPD.


Why Does Agentic Safety Fail to Generalize Across Tasks?

arXiv.org Machine Learning

AI agents are increasingly deployed in multi-task settings, where the task to perform is specified at test time, and the agent must generalize to unseen tasks. A major concern in such settings is safety: often, an agent must not only execute unseen tasks, but do so while avoiding risks and handling ones that materialize. Empirical evidence suggests that even when the ability to execute generalizes to unseen tasks, the ability to do so safely frequently does not. This paper provides theory and experiments indicating that failures of agentic safety to generalize across tasks are not merely due to limitations of training methods, but reflect an inherent property of safety itself: the relationship between a task and its safe execution is more complex than the relationship between a task and its execution alone. Theoretically, we analyze linear-quadratic control with $H_{\infty}$-robustness, and prove that the mapping from task specification to an optimal controller has higher Lipschitz constant with safety requirements than without, yielding a Lipschitz bound of independent interest. Empirically, we demonstrate our conclusions in simulated quadcopter navigation with a neural network agent and in CRM with an LLM agent. Our findings suggest that current efforts to enhance agentic safety may be insufficient, and point to a need for fundamentally different approaches.


Why Model Selection Fails in Time Series Forecasting: An Empirical Study of Instability Across Data Regimes

arXiv.org Machine Learning

Time series forecasting models often exhibit inconsistent performance across datasets with varying statistical and structural properties. Despite the wide range of available forecasting techniques, it remains unclear whether model selection can be reliably guided by simple data characteristics. This paper investigates why rule-based model selection fails in time series forecasting by analyzing the relationship between data-regime descriptors and model performance. A descriptor-based framework is introduced to characterize time series using measurable properties, including trend strength, seasonality, noise level, and temporal dependence. Based on these descriptors, a rule-based selection mechanism is formulated to map data regimes to candidate forecasting models. The approach is evaluated on multiple real-world datasets across different domains and forecasting horizons. The results show that rule-based model selection achieves low accuracy, with correct model identification occurring in only a small fraction of cases. Significant discrepancies are observed between recommended and empirically optimal models, particularly in noisy and mixed regimes. Further analysis reveals that model performance is highly sensitive to both dataset characteristics and forecasting horizon, resulting in substantial ranking instability across scenarios. These findings explain why simple heuristic rules fail to generalize and demonstrate that forecasting performance cannot be reliably predicted using static, descriptor-based approaches. This study provides empirical evidence that model selection in time series forecasting is inherently context-dependent and highlights the need for more adaptive, data-driven strategies.


Generalized Distributional Alignment Games for Unbiased Answer-Level Fine-Tuning

arXiv.org Machine Learning

The Distributional Alignment Game framework provides a powerful variational perspective on Answer-Level Fine-Tuning (ALFT). However, standard algorithms for these games rely on estimating logarithmic rewards from small batches, introducing a systematic bias due to Jensen's inequality that can destabilize training. In this paper, we systematically resolve this structural estimation bias. First, we generalize the alignment game to arbitrary Bregman divergences, showing that for a family of geometries inducing polynomial rewards, we can construct provably exact and unbiased estimators using U-statistics. Second, for the canonical KL divergence game where an exact solution is impossible, we derive a globally robust minimax polynomial estimator that is provably optimal, achieving the fundamental statistical error limit of $ฮ˜(1/K^2)$, which we establish via the Ditzian-Totik theorem. Finally, we synthesize these two approaches to propose a novel Variance-Optimal Augmented Polynomial Optimization Program (AQP) Estimator, proving that by systematically reducing variance, our method achieves not only optimal bias but also provably accelerated game convergence, leading to more efficient and stable training with zero online computational overhead.





Convolutional Monge Mapping Normalization for learning on sleep data

Neural Information Processing Systems

In many machine learning applications on signals and biomedical data, especially electroencephalogram (EEG), one major challenge is the variability of the data across subjects, sessions, and hardware devices. In this work, we propose a new method called Convolutional Monge Mapping Normalization (CMMN), which consists in filtering the signals in order to adapt their power spectrum density (PSD) to a Wasserstein barycenter estimated on training data. CMMN relies on novel closed-form solutions for optimal transport mappings and barycenters and provides individual test time adaptation to new data without needing to retrain a prediction model. Numerical experiments on sleep EEG data show that CMMN leads to significant and consistent performance gains independent from the neural network architecture when adapting between subjects, sessions, and even datasets collected with different hardware. Notably our performance gain is on par with much more numerically intensive Domain Adaptation (DA) methods and can be used in conjunction with those for even better performances.


L2ight: Enabling On-Chip Learning for Optical Neural Networks via Efficient in-situ Subspace Optimization

Neural Information Processing Systems

Silicon-photonics-based optical neural network (ONN) is a promising hardware platform that could represent a paradigm shift in efficient AI with its CMOScompatibility, flexibility, ultra-low execution latency, and high energy efficiency. In-situ training on the online programmable photonic chips is appealing but still encounters challenging issues in on-chip implementability, scalability, and efficiency. In this work, we propose a closed-loop ONN on-chip learning framework L2ight to enable scalable ONN mapping and efficient in-situ learning. L2ightadopts a three-stage learning flow that first calibrates the complicated photonic circuit states under challenging physical constraints, then performs photonic core mapping via combined analytical solving and zeroth-order optimization. A subspace learning procedure with multi-level sparsity is integrated into L2ightto enable in-situ gradient evaluation and fast adaptation, unleashing the power of optics for real on-chip intelligence. Extensive experiments demonstrate our proposed L2ightoutperforms prior ONN training protocols with 3-order-of-magnitude higher scalability and over 30 better efficiency, when benchmarked on various models and learning tasks. This synergistic framework is the first scalable on-chip learning solution that pushes this emerging field from intractable to scalable and further to efficient for next-generation self-learnable photonic neural chips. From a co-design perspective, L2ightalso provides essential insights for hardware-restricted unitary subspace optimization and efficient sparse training.