
Learning Chaotic Dynamics in Dissipative Systems

Neural Information Processing Systems

Chaotic systems are notoriously challenging to predict because of their sensitivity to perturbations and to errors introduced by time stepping. Despite this unpredictable behavior, for many dissipative systems the statistics of long-term trajectories are governed by an invariant measure supported on a set known as the global attractor; for many problems this set is finite dimensional, even if the state space is infinite dimensional. For Markovian systems, the statistical properties of long-term trajectories are uniquely determined by the solution operator that maps the system state forward over arbitrary positive time increments. In this work, we propose a machine learning framework to learn the underlying solution operator for dissipative chaotic systems, showing that the resulting learned operator accurately captures short-time trajectories and long-time statistical behavior. Using this framework, we are able to predict various statistics of the invariant measure for turbulent Kolmogorov flow dynamics with Reynolds numbers up to $5000$.
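As a hedged illustration of this setup (the abstract does not specify an architecture, so the residual MLP, training loop, and statistic below are all our assumptions, not the paper's method), one can learn a one-step solution operator from trajectory snapshots and roll it out to estimate statistics under the invariant measure:

```python
# Minimal sketch: learn a one-step solution operator G_dt: u(t) -> u(t+dt)
# from trajectory data, then roll it out to estimate long-time statistics.
# Architecture and hyperparameters are illustrative assumptions.
import torch
import torch.nn as nn

class SolutionOperator(nn.Module):
    """Maps the state at time t to the state at time t + dt."""
    def __init__(self, dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim, hidden), nn.GELU(),
            nn.Linear(hidden, hidden), nn.GELU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        # Residual form: predict the increment u(t+dt) - u(t).
        return u + self.net(u)

def train_one_step(model, pairs, epochs=100, lr=1e-3):
    # `pairs` is a list of (u_t, u_t_plus_dt) snapshots from a long trajectory.
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for u_t, u_next in pairs:
            loss = ((model(u_t) - u_next) ** 2).mean()
            opt.zero_grad(); loss.backward(); opt.step()
    return model

@torch.no_grad()
def long_time_statistic(model, u0, n_steps, stat=lambda u: (u ** 2).mean()):
    # Roll the learned operator forward and average a statistic along the
    # trajectory; for an ergodic dissipative system this estimates the
    # expectation of `stat` under the invariant measure.
    u, acc = u0, 0.0
    for _ in range(n_steps):
        u = model(u)
        acc += stat(u).item()
    return acc / n_steps
```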


On the Saturation Effects of Spectral Algorithms in Large Dimensions

Neural Information Processing Systems

The saturation effect, which originally refers to the fact that kernel ridge regression (KRR) fails to achieve the information-theoretic lower bound when the regression function is over-smooth, has been observed for almost 20 years and was rigorously proved recently for kernel ridge regression and some other spectral algorithms over a fixed dimensional domain. The main focus of this paper is to explore the saturation effects of a large class of spectral algorithms (including KRR, gradient descent, etc.) in large dimensional settings where $n \asymp d^{\gamma}$. More precisely, we first propose an improved minimax lower bound for the kernel regression problem in large dimensional settings and show that gradient flow with an early stopping strategy yields an estimator achieving this lower bound (up to a logarithmic factor). Similar to the results for KRR, we can further determine the exact convergence rates (both upper and lower bounds) of a large class of (optimally tuned) spectral algorithms with different qualifications $\tau$. In particular, we find that these exact rate curves (varying along $\gamma$) exhibit periodic plateau behavior and a polynomial approximation barrier. Consequently, we can fully depict the saturation effects of the spectral algorithms and reveal a new phenomenon in large dimensional settings: the saturation effect occurs in the large dimensional setting as long as the source condition satisfies $s > \tau$, while it occurs in the fixed dimensional setting only when $s > 2\tau$.
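For intuition, here is a minimal sketch of two members of the spectral-algorithm family discussed above: KRR (qualification $\tau = 1$) and gradient descent with early stopping, whose stopping time plays the role of the inverse regularization parameter. The RBF kernel, data, and hyperparameters are illustrative assumptions, not the paper's setup:

```python
# Sketch of two spectral algorithms for kernel regression: KRR and
# gradient descent with early stopping. All choices below are assumptions.
import numpy as np

def rbf_kernel(X, Z, gamma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(K, y, lam):
    # alpha = (K + n*lam*I)^{-1} y, so f_hat(x) = k(x, X) @ alpha.
    n = len(y)
    return np.linalg.solve(K + n * lam * np.eye(n), y)

def gd_fit(K, y, lr=0.1, steps=200):
    # Gradient descent on the empirical squared loss; the stopping time
    # `steps` acts as the (inverse) regularization parameter.
    n = len(y)
    alpha = np.zeros(n)
    for _ in range(steps):
        alpha += lr * (y - K @ alpha) / n
    return alpha

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))   # loosely in the n ~ d^gamma spirit
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
K = rbf_kernel(X, X)
alpha_krr, alpha_gd = krr_fit(K, y, lam=1e-2), gd_fit(K, y)
```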


Kernel similarity matching with Hebbian networks

Neural Information Processing Systems

Recent works have derived neural networks with online correlation-based learning rules to perform \textit{kernel similarity matching}. These works applied existing linear similarity matching algorithms to nonlinear features generated with random Fourier methods. In this paper, we attempt to perform kernel similarity matching by directly learning the nonlinear features. Our algorithm proceeds by deriving and then minimizing an upper bound on the sum of squared errors between output and input kernel similarities. The construction of our upper bound leads to online correlation-based learning rules which can be implemented with a one-layer recurrent neural network. In addition to generating high-dimensional linearly separable representations, we show that our upper bound naturally yields representations which are sparse and selective for specific input patterns. We compare the approximation quality of our method to the neural random Fourier method and to variants of the popular but non-biological Nystr\"{o}m method for approximating the kernel matrix. Our method appears to be comparable or better than randomly sampled Nystr\"{o}m methods when the outputs are relatively low dimensional (although still potentially higher dimensional than the inputs) but less faithful when the outputs are very high dimensional.
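To make the comparison concrete, the following sketch implements the two baselines named above, random Fourier features and a randomly sampled Nystr\"{o}m approximation, and measures how well each feature map $\Phi$ reproduces the kernel matrix $K$ via $\Phi\Phi^\top$, which is the quantity the similarity-matching objective $\sum_{ij}(K_{ij} - y_i^\top y_j)^2$ penalizes. All names and parameters are ours, not the paper's:

```python
# Baselines for kernel matrix approximation: random Fourier features (RFF)
# and a randomly sampled Nystrom factorization of a Gaussian kernel.
import numpy as np

def gaussian_kernel(X, Z, sigma=1.0):
    d2 = ((X[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

def rff_features(X, n_features, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    W = rng.standard_normal((X.shape[1], n_features)) / sigma
    b = rng.uniform(0, 2 * np.pi, n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def nystrom_features(X, n_landmarks, sigma=1.0, rng=None):
    rng = rng or np.random.default_rng(0)
    idx = rng.choice(len(X), n_landmarks, replace=False)
    K_nm = gaussian_kernel(X, X[idx], sigma)
    K_mm = gaussian_kernel(X[idx], X[idx], sigma)
    # Feature map K_nm @ K_mm^{-1/2}, via eigendecomposition of K_mm.
    w, V = np.linalg.eigh(K_mm + 1e-8 * np.eye(n_landmarks))
    return K_nm @ V / np.sqrt(w)

X = np.random.default_rng(1).standard_normal((300, 10))
K = gaussian_kernel(X, X)
for name, Phi in [("RFF", rff_features(X, 64)),
                  ("Nystrom", nystrom_features(X, 64))]:
    err = np.linalg.norm(K - Phi @ Phi.T) / np.linalg.norm(K)
    print(name, "relative Frobenius error:", round(err, 3))
```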


Evaluating Embedding Models and Pipeline Optimization for AI Search Quality

Zhong, Philip, Chen, Kent, Wang, Don

arXiv.org Artificial Intelligence

We evaluate the performance of various text embedding models and pipeline configurations for AI-driven search systems. We compare sentence-transformer and generative embedding models (e.g., All-MPNet, BGE, GTE, and Qwen) at different dimensions, indexing methods (Milvus HNSW/IVF), and chunking strategies. A custom evaluation dataset of 11,975 query-chunk pairs was synthesized from US City Council meeting transcripts using a local large language model (LLM). The data pipeline includes preprocessing, automated question generation per chunk, manual validation, and continuous integration/continuous deployment (CI/CD) integration. We measure retrieval accuracy using reference-based metrics: Top-K Accuracy and Normalized Discounted Cumulative Gain (NDCG). Our results demonstrate that higher-dimensional embeddings significantly boost search quality (e.g., Qwen3-Embedding-8B/4096 achieves a Top-3 accuracy of about 0.571 versus 0.412 for GTE-large/1024), and that neural re-rankers (e.g., a BGE cross-encoder) further improve ranking accuracy (Top-3 accuracy up to 0.527). Finer-grained chunking (512 characters versus 2000 characters) also improves accuracy. We discuss the impact of these factors and outline future directions for pipeline automation and evaluation.
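The two reference-based metrics have a compact form in the synthetic setting where each query has exactly one relevant chunk. The sketch below (function names are ours, not the paper's pipeline) shows how Top-K Accuracy and binary-relevance NDCG would be computed and averaged over query-chunk pairs:

```python
# Hedged sketch of the two metrics named in the abstract, for the
# single-relevant-chunk case. Names and data are illustrative assumptions.
import math

def top_k_accuracy(ranked_ids, relevant_id, k=3):
    """1 if the relevant chunk appears among the top-k retrieved chunks."""
    return float(relevant_id in ranked_ids[:k])

def ndcg_at_k(ranked_ids, relevant_id, k=10):
    """Binary-relevance NDCG: the ideal DCG is 1 (relevant chunk at rank 1)."""
    for rank, chunk_id in enumerate(ranked_ids[:k], start=1):
        if chunk_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

# Averaging over all query-chunk pairs gives the reported scores:
pairs = [(["c2", "c7", "c1"], "c7"), (["c5", "c9", "c3"], "c3")]
print(sum(top_k_accuracy(r, g, k=3) for r, g in pairs) / len(pairs))  # 1.0
print(sum(ndcg_at_k(r, g) for r, g in pairs) / len(pairs))            # ~0.57
```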


Supplementary Material

Neural Information Processing Systems

Graphical representation of the DNN-based multi-fidelity surrogate model. In the experiments, we used three synthetic benchmark tasks to evaluate our method; for one of them, the global maximum is -0.3979 at $(\pi, 12.275)$. The materials are parameterized by three properties, including Young's modulus. To compute the frequency, we discretize the plate with quadratic tetrahedral elements (see Figure 1; the maximum mesh edge length is 1).



Figure 1: Projecting 50-dimensional embeddings obtained by training a simple neural network without SSE (left) and ...

Neural Information Processing Systems

We thank the reviewers for their insightful feedback. In the following, we address their concerns and questions. It is indeed a great suggestion to examine concrete examples beyond the quantitative evaluation to get an intuition. That is likely due to the use of the item graph. As shown in Theorem 1, SSE can 'smooth' the Rademacher complexity; perhaps we can further study how SSE-SE is related to dropout in theory.



A Position Encodings: Suppose we sample a sequence of positions x ...

Neural Information Processing Systems

We used the encodings below as different regression targets. What qualifies as a grid cell? The standard metric is the "grid score", which is computed by binning neural activity into rate maps using spatial position, applying an adaptive smoother, and then taking a circular sample of the autocorrelation centered on the central peak. What score is sufficient to qualify as a grid cell? Experimentalists have used thresholds of 0.3. The first step in computing grid scores is determining the number of bins used to compute the rate maps. We considered three grid score thresholds: 0.3 (used by some experimentalists) and 0.8 (low ...). In this section, we derive the form of the place cell correlation function.
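A hedged sketch of the grid-score recipe described above: bin activity into an occupancy-normalized rate map, smooth it (a fixed Gaussian blur stands in for the adaptive smoother mentioned in the text), compute the spatial autocorrelogram, and score the six-fold symmetry of an annulus around the central peak by comparing rotated copies. Bin counts, radii, and names are illustrative assumptions:

```python
# Sketch of a grid-score computation; thresholds (e.g. 0.3) are applied
# to the resulting score. A Gaussian blur replaces the adaptive smoother.
import numpy as np
from scipy.ndimage import gaussian_filter, rotate

def rate_map(pos, activity, n_bins=20):
    # Mean activity per spatial bin (occupancy-normalized rate map).
    occ, xe, ye = np.histogram2d(pos[:, 0], pos[:, 1], bins=n_bins)
    act, _, _ = np.histogram2d(pos[:, 0], pos[:, 1], bins=[xe, ye],
                               weights=activity)
    return gaussian_filter(act / np.maximum(occ, 1), sigma=1.0)

def autocorrelogram(rm):
    # Spatial autocorrelation via FFT, normalized to a central peak of 1.
    rm = rm - rm.mean()
    f = np.fft.fft2(rm, s=(2 * rm.shape[0] - 1, 2 * rm.shape[1] - 1))
    ac = np.fft.fftshift(np.fft.ifft2(f * np.conj(f)).real)
    return ac / ac.max()

def grid_score(ac, r_in=0.2, r_out=0.8):
    # Six-fold symmetry gives high correlation at 60/120 degrees and low
    # correlation at 30/90/150 degrees on an annulus around the peak.
    h, w = ac.shape
    yy, xx = np.mgrid[:h, :w]
    r = np.hypot(yy - h // 2, xx - w // 2) / (min(h, w) / 2)
    mask = (r > r_in) & (r < r_out)
    def corr_at(angle):
        rot = rotate(ac, angle, reshape=False, order=1)
        return np.corrcoef(ac[mask], rot[mask])[0, 1]
    return (min(corr_at(60), corr_at(120))
            - max(corr_at(30), corr_at(90), corr_at(150)))
```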