Goto

Collaborating Authors

 oom


Theoretical Analysis

Neural Information Processing Systems

A.1 Graph Spectrum Consider an undirected graph G =( V,E)whose adjacency matrix is symmetric. Following notations in Section 2, we denote the eigendecomposition of the normalized graph adjacency and Laplacian matrices respectively as A = UMU > and L = VNV >, where M = diag(ยต1,,ยตn), |ยต1| |ยต2| | ยตn|, N = diag( 1,, n), 0= 1 < 2 n, and U,V are the matrices of corresponding eigenvectors. We also immediately have A2 = UM 2U> = U U>, f = |ยตf |2. Intuitively, since L = I A, the leading eigenvalues ยต1,ยต2, of A correspond to the smallest of those 1, 2, of L. These eigenvalues are known as the low-frequency spectrum of the graph that correlates to graph connectivity. Specially, 2 > 0if and only if the graph is connected, which is our case. Similarly, small values of ยตf and large values of i represent the high frequency part of the graph. Graph spectrum is a graph invariant despite the status of node labels. We plot the spectrum computed by A2Prop of 7 heterophilous graphs and 2 homophilous graphs in Figure 4, which shows the leading k eigenvalues ยต1,ยต2,,ยตk. The figure indicates the importance of high-frequency information in graphs under heterophily. For appropriately large heterophilous graphs penn94, arxiv-year, genius, and twitch-gamers, the spectrum converges slowly to 0. In other words, even high-frequency parts with large f still exhibit relatively large eigenvalues compared to the dominant ยต1. In contrast, the homophilous protein and reddit are significant in low frequency of a few eigenvalues. Hence, during the iterative propagation, the topology information carried by these high-frequency components can be preserved to enhance performance.


Spectral Learning of Dynamic Systems from Nonequilibrium Data

Neural Information Processing Systems

Observable operator models (OOMs) and related models are one of the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption due to the requirements of analyzing large-time scale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. In comparison with the other continuous-valued spectral algorithms, the binless algorithm can achieve consistent estimation of equilibrium dynamics with only linear complexity.






This Time is Different: An Observability Perspective on Time Series Foundation Models

arXiv.org Artificial Intelligence

We introduce Toto, a time series forecasting foundation model with 151 million parameters. Toto uses a modern decoder-only architecture coupled with architectural innovations designed to account for specific challenges found in multivariate observability time series data. Toto's pre-training corpus is a mixture of observability data, open datasets, and synthetic data, and is 4-10$\times$ larger than those of leading time series foundation models. Additionally, we introduce BOOM, a large-scale benchmark consisting of 350 million observations across 2,807 real-world time series. For both Toto and BOOM, we source observability data exclusively from Datadog's own telemetry and internal observability metrics. Extensive evaluations demonstrate that Toto achieves state-of-the-art performance on both BOOM and on established general purpose time series forecasting benchmarks. Toto's model weights, inference code, and evaluation scripts, as well as BOOM's data and evaluation code, are all available as open source under the Apache 2.0 License available at https://huggingface.co/Datadog/Toto-Open-Base-1.0 and https://github.com/DataDog/toto.



Trends in Frontier AI Model Count: A Forecast to 2028

arXiv.org Artificial Intelligence

Governments are starting to impose requirements on AI models based on how much compute was used to train them. For example, the EU AI Act imposes requirements on providers of general-purpose AI with systemic risk, which includes systems trained using greater than $10^{25}$ floating point operations (FLOP). In the United States' AI Diffusion Framework, a training compute threshold of $10^{26}$ FLOP is used to identify "controlled models" which face a number of requirements. We explore how many models such training compute thresholds will capture over time. We estimate that by the end of 2028, there will be between 103-306 foundation models exceeding the $10^{25}$ FLOP threshold put forward in the EU AI Act (90% CI), and 45-148 models exceeding the $10^{26}$ FLOP threshold that defines controlled models in the AI Diffusion Framework (90% CI). We also find that the number of models exceeding these absolute compute thresholds each year will increase superlinearly -- that is, each successive year will see more new models captured within the threshold than the year before. Thresholds that are defined with respect to the largest training run to date (for example, such that all models within one order of magnitude of the largest training run to date are captured by the threshold) see a more stable trend, with a median forecast of 14-16 models being captured by this definition annually from 2025-2028.


Rethinking Structure Learning For Graph Neural Networks

arXiv.org Artificial Intelligence

To improve the performance of Graph Neural Networks (GNNs), Graph Structure Learning (GSL) has been extensively applied to reconstruct or refine original graph structures, effectively addressing issues like heterophily, over-squashing, and noisy structures. While GSL is generally thought to improve GNN performance, it often leads to longer training times and more hyperparameter tuning. Besides, the distinctions among current GSL methods remain ambiguous from the perspective of GNN training, and there is a lack of theoretical analysis to quantify their effectiveness. Recent studies further suggest that, under fair comparisons with the same hyperparameter tuning, GSL does not consistently outperform baseline GNNs. This motivates us to ask a critical question: is GSL really useful for GNNs? To address this question, this paper makes two key contributions. First, we propose a new GSL framework, which includes three steps: GSL base (the representation used for GSL) construction, new structure construction, and view fusion, to better understand the effectiveness of GSL in GNNs. Second, after graph convolution, we analyze the differences in mutual information (MI) between node representations derived from the original topology and those from the newly constructed topology. Surprisingly, our empirical observations and theoretical analysis show that no matter which type of graph structure construction methods are used, after feeding the same GSL bases to the newly constructed graph, there is no MI gain compared to the original GSL bases. To fairly reassess the effectiveness of GSL, we conduct ablation experiments and find that it is the pretrained GSL bases that enhance GNN performance, and in most cases, GSL cannot improve GNN performance. This finding encourages us to rethink the essential components in GNNs, such as self-training and structural encoding, in GNN design rather than GSL.