 lai


Taiwan president cancels trip after African countries close airspace

BBC News

Taiwan President Lai Ching-te has cancelled a presidential trip to the African nation of Eswatini, accusing Beijing of putting pressure on its neighbours to bar his aircraft from flying over their territories. Seychelles, Mauritius and Madagascar revoked Lai's overflight permits after intense pressure and economic coercion from China, said a Taiwan official. China denied coercion while praising the three African countries, saying it had high appreciation for them. This is the first publicly known instance where a Taiwanese leader has had to cancel a foreign trip due to revoked flight permits. Eswatini, formerly known as Swaziland, is Taiwan's only diplomatic ally in Africa.


Physics informed Transformer-VAE for biophysical parameter estimation: PROSAIL model inversion in Sentinel-2 imagery

Mensah, Prince, Aderinto, Pelumi Victor, Yusuf, Ibrahim Salihu, Pretorius, Arnu

arXiv.org Artificial Intelligence

Accurate retrieval of vegetation biophysical variables from satellite imagery is crucial for ecosystem monitoring and agricultural management. In this work, we propose a physics-informed Transformer-VAE architecture to invert the PROSAIL radiative transfer model for simultaneous estimation of key canopy parameters from Sentinel-2 data. Unlike previous hybrid approaches that require real satellite images for self-supervised training, our model is trained exclusively on simulated data, yet achieves performance on par with state-of-the-art methods that utilize real imagery. The Transformer-VAE incorporates the PROSAIL model as a differentiable physical decoder, ensuring that inferred latent variables correspond to physically plausible leaf and canopy properties. We demonstrate retrieval of leaf area index (LAI) and canopy chlorophyll content (CCC) on real-world field datasets (FRM4Veg and BelSAR) with accuracy comparable to models trained with real Sentinel-2 data. Our method requires no in-situ labels or calibration on real images, offering a cost-effective and self-supervised solution for global vegetation monitoring. The proposed approach illustrates how integrating physical models with advanced deep networks can improve the inversion of RTMs, opening new prospects for large-scale, physically-constrained remote sensing of vegetation traits.


Layer-Aware Influence for Online Data Valuation Estimation

Yang, Ziao, Huang, Longbo, Liu, Hongfu

arXiv.org Artificial Intelligence

Data-centric learning emphasizes curating high-quality training samples to boost performance rather than designing new architectures. A central problem is to estimate the influence of each training sample efficiently. Prior studies largely focus on static influence measured on a converged model, overlooking how sample influence changes during optimization, especially in deep models. To address the computational burden of frequent influence estimation, we develop a layer-aware online estimator that requires only loss-to-output gradients. This design avoids parameter-level and full-network gradients while preserving ranking fidelity. Extensive experiments across LLM pretraining, fine-tuning, and image classification show our method improves accuracy with substantially lower time and memory cost, making dynamic data curation efficient and scalable in practice.


Unsupervised learning for anticipating critical transitions

Panahi, Shirin, Kong, Ling-Wei, Glaz, Bryan, Haile, Mulugeta, Lai, Ying-Cheng

arXiv.org Artificial Intelligence

For anticipating critical transitions in complex dynamical systems, the recent approach of parameter-driven reservoir computing requires explicit knowledge of the bifurcation parameter. We articulate a framework combining a variational autoencoder (VAE) and reservoir computing to address this challenge. In particular, the driving factor is detected from time series using the VAE in an unsupervised-learning fashion and the extracted information is then used as the parameter input to the reservoir computer for anticipating the critical transition. We demonstrate the power of the unsupervised learning scheme using prototypical dynamical systems including the spatiotemporal Kuramoto-Sivashinsky system. The scheme can also be extended to scenarios where the target system is driven by several independent parameters or with partial state observations.


On Lai's Upper Confidence Bound in Multi-Armed Bandits

Ren, Huachen, Zhang, Cun-Hui

arXiv.org Machine Learning

In this memorial paper, we honor Tze Leung Lai's seminal contributions to the topic of multi-armed bandits, with a specific focus on his pioneering work on the upper confidence bound. We establish sharp non-asymptotic regret bounds for an upper confidence bound index with a constant level of exploration for Gaussian rewards. Furthermore, we establish a non-asymptotic regret bound for the upper confidence bound index of Lai (1987) which employs an exploration function that decreases with the sample size of the corresponding arm. The regret bounds have leading constants that match the Lai-Robbins lower bound. Our results highlight an aspect of Lai's seminal works that deserves more attention in the machine learning literature.
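
To make the idea of an upper confidence bound index concrete, here is a minimal sketch of a textbook UCB policy on Gaussian arms. It is not the index of Lai (1987) analysed in the paper: the bonus `sqrt(2 log t / n)` is the common UCB1-style choice, and the function names `ucb_bandit` and `pseudo_regret` are hypothetical, introduced only for illustration.

```python
import math
import random


def ucb_bandit(means, horizon, sigma=1.0, seed=0):
    """Run a textbook UCB index policy on Gaussian arms; return pull counts."""
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # number of pulls per arm
    sums = [0.0] * k   # cumulative reward per arm
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once to initialise the estimates
        else:
            # index = empirical mean + exploration bonus (assumed UCB1-style form)
            arm = max(
                range(k),
                key=lambda a: sums[a] / counts[a]
                + sigma * math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = rng.gauss(means[arm], sigma)
        counts[arm] += 1
        sums[arm] += reward
    return counts


def pseudo_regret(means, counts):
    """Sum over arms of (gap to the best mean) times (number of pulls)."""
    best = max(means)
    return sum((best - m) * n for m, n in zip(means, counts))


if __name__ == "__main__":
    pulls = ucb_bandit([0.0, 1.0], horizon=2000)
    print(pulls, pseudo_regret([0.0, 1.0], pulls))
```

The bonus shrinks as an arm is pulled more often, so suboptimal arms are sampled only about logarithmically often in the horizon, which is the regime in which the Lai-Robbins lower bound applies.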


Data-driven model discovery with Kolmogorov-Arnold networks

Moradi, Mohammadamin, Panahi, Shirin, Bollt, Erik M., Lai, Ying-Cheng

arXiv.org Artificial Intelligence

Department of Physics, Arizona State University, Tempe, Arizona 85287, USA (Dated: September 24, 2024). Data-driven model discovery of complex dynamical systems is typically done using sparse optimization, but this approach has a fundamental limitation: it assumes sparsity, in that the underlying governing equations of the system contain only a small number of elementary mathematical terms. Examples where sparse optimization fails abound, such as the classic Ikeda or optical-cavity map in nonlinear dynamics and a large variety of ecosystems. Exploiting the recently articulated Kolmogorov-Arnold networks, we develop a general model-discovery framework for any dynamical system, including those that do not satisfy the sparsity condition. In particular, we demonstrate non-uniqueness in that a large number of approximate models of the system can be found which generate the same invariant set with the correct statistics such as the Lyapunov exponents and Kullback-Leibler divergence. An analogy to shadowing of numerical trajectories in chaotic systems is pointed out.


Convergence Analysis of Flow Matching in Latent Space with Transformers

Jiao, Yuling, Lai, Yanming, Wang, Yang, Yan, Bokai

arXiv.org Machine Learning

We use a pre-trained autoencoder network to map high-dimensional original inputs to a low-dimensional latent space, where a transformer network is trained to predict the velocity field of the transformation from a standard normal distribution to the target latent distribution. Our error analysis demonstrates the effectiveness of this approach, showing that the distribution of samples generated via estimated ODE flow converges to the target distribution in the Wasserstein-2 distance under mild and practical assumptions. Furthermore, we show that arbitrary smooth functions can be effectively approximated by transformer networks with Lipschitz continuity, which may be of independent interest.
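
The sampling step described above (integrating an ODE whose velocity field carries a standard normal to the target distribution) can be sketched in one dimension. This is only an illustration under strong assumptions: the paper estimates the velocity with a transformer in a learned latent space, whereas here the target is assumed to be a 1-D Gaussian and the velocity of the linear interpolation path is written in closed form. All function names are hypothetical.

```python
import random


def gaussian_velocity(x, t, mean, std):
    """Closed-form velocity E[x1 - x0 | x_t = x] for x_t = (1 - t) * x0 + t * x1,
    with x0 ~ N(0, 1) and x1 ~ N(mean, std**2) independent (assumed toy setting)."""
    var_t = (1.0 - t) ** 2 + (t * std) ** 2      # variance of x_t
    slope = (t * std ** 2 - (1.0 - t)) / var_t   # Cov(x1 - x0, x_t) / Var(x_t)
    return mean + slope * (x - t * mean)


def euler_flow(x0, velocity, n_steps=1000):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with forward Euler."""
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        x += dt * velocity(x, i * dt)
    return x


def sample_target(n, mean, std, seed=0):
    """Draw n standard normal points and push each one through the ODE flow."""
    rng = random.Random(seed)

    def velocity(x, t):
        return gaussian_velocity(x, t, mean, std)

    return [euler_flow(rng.gauss(0.0, 1.0), velocity) for _ in range(n)]


if __name__ == "__main__":
    samples = sample_target(200, mean=3.0, std=0.5)
    print(sum(samples) / len(samples))
```

In this toy setting the exact flow map is `x0 -> mean + std * x0`, so the Euler output can be checked against it. In the setting of the paper the velocity is learned, and the error analysis bounds how far the generated distribution is from the target in Wasserstein-2 distance.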


Anytime-valid t-tests and confidence sequences for Gaussian means with unknown variance

Wang, Hongjian, Ramdas, Aaditya

arXiv.org Machine Learning

In 1976, Lai constructed a nontrivial confidence sequence for the mean $\mu$ of a Gaussian distribution with unknown variance $\sigma^2$. Curiously, he employed both an improper (right Haar) mixture over $\sigma$ and an improper (flat) mixture over $\mu$. Here, we elaborate carefully on the details of his construction, which use generalized nonintegrable martingales and an extended Ville's inequality. While this does yield a sequential t-test, it does not yield an "e-process" (due to the nonintegrability of his martingale). In this paper, we develop two new e-processes and confidence sequences for the same setting: one is a test martingale in a reduced filtration, while the other is an e-process in the canonical data filtration. These are respectively obtained by swapping Lai's flat mixture for a Gaussian mixture, and swapping the right Haar mixture over $\sigma$ with the maximum likelihood estimate under the null, as done in universal inference. We also analyze the width of resulting confidence sequences, which have a curious dependence on the error probability $\alpha$. Numerical experiments are provided along the way to compare and contrast the various approaches.


Federated Learning Uses The Data Right on Our Devices

#artificialintelligence

An approach called federated learning trains machine learning models on devices like smartphones and laptops, rather than requiring the transfer of private data to central servers. The biggest benchmarking data set to date for a machine learning technique designed with data privacy in mind is now available open source. "By training in-situ on data where it is generated, we can train on larger real-world data," explains Fan Lai, a doctoral student in computer science and engineering at the University of Michigan, who presents the FedScale training environment at the International Conference on Machine Learning this week. A paper on the work is available on arXiv. "This also allows us to mitigate privacy risks and high communication and storage costs associated with collecting the raw data from end-user devices into the cloud," Lai says.
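
As a rough illustration of training on devices without moving raw data, the sketch below uses federated averaging on a one-parameter linear model. This is a common aggregation rule chosen here as an assumption; it is not taken from FedScale, and the names `local_update` and `federated_round` as well as the toy data are hypothetical.

```python
def local_update(w, data, lr=0.05, epochs=5):
    """Gradient descent on one client's own data for the model y = w * x."""
    for _ in range(epochs):
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(w_global, clients):
    """Each client trains locally; the server only sees and averages the weights,
    weighting every client by the size of its local data set."""
    total = sum(len(data) for data in clients)
    return sum(local_update(w_global, data) * len(data) for data in clients) / total


# Toy data held on two separate "devices"; both follow y = 2 * x.
clients = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(3.0, 6.0)],
]

w = 0.0
for _ in range(20):
    w = federated_round(w, clients)
```

Only the scalar weight travels between client and server in this sketch; the `(x, y)` pairs never leave the list that stands in for each device.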


sbp-env: Sampling-based Motion Planners' Testing Environment

Lai, Tin

arXiv.org Artificial Intelligence

Sampling-based motion planners' testing environment (sbp-env) is a full-featured framework for quickly testing different sampling-based algorithms for motion planning. sbp-env focuses on the flexibility of tinkering with different aspects of the framework, and divides the main planning components into two categories: (i) samplers and (ii) planners. The focus of motion planning research has mainly been on (i) improving the sampling efficiency (with methods such as heuristic or learned distributions) and (ii) the algorithmic aspect of the planner, using different routines to build a connected graph. Therefore, by separating the two components, one can quickly swap out different components to test novel ideas.
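
The sampler/planner separation can be sketched as a planner that accepts any sampler as a plain function argument. This does not use the sbp-env API: the function names, the obstacle-free unit square, and the RRT-style tree growth are all assumptions made for illustration.

```python
import math
import random


def uniform_sampler(rng, goal):
    """Sample uniformly from the unit square (the goal is ignored)."""
    return (rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0))


def goal_biased_sampler(rng, goal, bias=0.2):
    """Return the goal with probability `bias`, otherwise a uniform sample."""
    return goal if rng.random() < bias else uniform_sampler(rng, goal)


def rrt_plan(start, goal, sampler, step=0.05, max_iters=5000, seed=0):
    """Grow an RRT-style tree in an obstacle-free unit square.
    The planner only calls sampler(rng, goal), so samplers are swappable."""
    rng = random.Random(seed)
    parent = {start: None}  # tree stored as child -> parent
    for _ in range(max_iters):
        target = sampler(rng, goal)
        nearest = min(parent, key=lambda node: math.dist(node, target))
        d = math.dist(nearest, target)
        if d == 0.0:
            continue
        t = min(1.0, step / d)  # move at most `step` towards the sample
        new = (
            nearest[0] + t * (target[0] - nearest[0]),
            nearest[1] + t * (target[1] - nearest[1]),
        )
        if new in parent:
            continue
        parent[new] = nearest
        if math.dist(new, goal) <= step:
            path = [new]  # walk back to the root to recover the path
            while parent[path[-1]] is not None:
                path.append(parent[path[-1]])
            return path[::-1]
    return None


if __name__ == "__main__":
    for sampler in (uniform_sampler, goal_biased_sampler):
        path = rrt_plan((0.1, 0.1), (0.9, 0.9), sampler)
        print(sampler.__name__, None if path is None else len(path))
```

Swapping `uniform_sampler` for `goal_biased_sampler` changes how the tree is driven without touching the planner, which is the kind of experiment the separation is meant to make cheap.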