TRACE: Learning to Compute on Graphs
Zheng, Ziyang, Zhu, Jiaying, Zhou, Jingyi, Xu, Qiang
Learning to compute, the ability to model the functional behavior of a computational graph, is a fundamental challenge for graph representation learning. Yet the dominant paradigm is architecturally mismatched for this task: its reliance on permutation-invariant aggregation, central to mainstream message passing neural networks (MPNNs) and their conventional Transformer-based counterparts, prevents models from capturing the position-aware, hierarchical nature of computation. To resolve this, we introduce \textbf{TRACE}, a new paradigm built on an architecturally sound backbone and a principled learning objective. First, TRACE employs a Hierarchical Transformer that mirrors the step-by-step flow of computation, providing a faithful architectural backbone that replaces flawed permutation-invariant aggregation. Second, we introduce \textbf{function shift learning}, a novel objective that decouples the learning problem: instead of predicting the complex global function directly, the model is trained to predict only the \textit{function shift}, the discrepancy between the true global function and a simple local approximation that assumes input independence. We validate this paradigm on electronic circuits, one of the most complex and economically critical classes of computational graphs. Across a comprehensive suite of benchmarks, TRACE substantially outperforms all prior architectures. These results demonstrate that an architecturally aligned backbone and a decoupled learning objective form a more robust paradigm for the fundamental challenge of learning to compute on graphs.
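The decoupled objective described above can be sketched numerically. The toy AND-gate setup and every function name below are illustrative assumptions, not the paper's implementation; the only idea carried over from the abstract is that the training target is the shift between the true global function and an input-independence approximation:

```python
import numpy as np

def local_approximation(inputs):
    """Local estimate that (wrongly) assumes input independence:
    combine per-input signal probabilities as if independent (soft AND)."""
    return np.prod(inputs, axis=-1)

def true_global_function(inputs):
    """Stand-in ground truth for the graph's behavior: a toy AND gate
    whose inputs are in fact correlated, so independence fails."""
    return np.min(inputs, axis=-1)

# The training target is the *shift*, not the global function itself.
x = np.random.rand(1000, 2)
shift_target = true_global_function(x) - local_approximation(x)

# A model (omitted here) would regress shift_target from x; at inference
# the final prediction recombines the two parts:
#   prediction = local_approximation(x) + model(x)
```

The appeal of the decomposition is that the model only has to correct a residual: the cheap local approximation carries the bulk of the signal, and the learned shift accounts for the dependence it ignores.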
Reviews: A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
This paper studies conditions for the absence of spurious optimality. In particular, the authors introduce 'global functions' to define the set of continuous functions that admit no spurious local optima (in the sense of sets), and develop corresponding definitions and propositions for an extended characterization of continuous functions that admit no spurious strict local optima. The authors also apply their theory to l1-norm minimization in tensor decomposition. Pros: In my opinion, the main contribution of this paper is to establish a general mathematical result and apply it to study the absence of spurious optimality for a specific problem. I also find some of the mathematical discoveries on global functions interesting, which include: -- In Section 2, the paper provides two examples to show that: (i).
Distributed Optimization via Kernelized Multi-armed Bandits
Multi-armed bandit algorithms provide solutions for sequential decision-making where learning takes place by interacting with the environment. In this work, we model a distributed optimization problem as a multi-agent kernelized multi-armed bandit problem with a heterogeneous reward setting. In this setup, the agents collaboratively aim to maximize a global objective function which is an average of local objective functions. The agents can access only bandit feedback (noisy rewards) obtained from the associated unknown local function, which has a small norm in a reproducing kernel Hilbert space (RKHS). We present a fully decentralized algorithm, Multi-agent IGP-UCB (MA-IGP-UCB), which achieves a sub-linear regret bound for popular classes of kernels while preserving privacy. It does not necessitate the agents to share their actions, rewards, or estimates of their local function. In the proposed approach, the agents sample their individual local functions in a way that benefits the whole network by utilizing a running consensus to estimate the upper confidence bound on the global function. Furthermore, we propose an extension, the Multi-agent Delayed IGP-UCB (MAD-IGP-UCB) algorithm, which reduces the dependence of the regret bound on the number of agents in the network. It provides improved performance by utilizing a delay in the estimation update step at the cost of more communication. The problem of distributed optimization deals with the optimization of a function over a network of agents in which the whole function is not completely known to any single agent [1], [2]. In fact, the "global" function can be expressed as an average of "local" functions associated with each agent, which are independent of one another. In particular, our interest lies in the case when these local functions are non-convex, unknown, and expensive to compute or record.
To form a feasible problem, we assume that these local functions belong to a reproducing kernel Hilbert space (RKHS), which is a very common assumption in the literature [3]- [5]. When dealing with unknown functions, the problem for each agent can be broken down into two segments: sampling and optimization.
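The running-consensus step described above can be illustrated with a minimal sketch: agents repeatedly average their neighbors' estimates, so each one tracks the network-wide mean of the local quantities without ever sharing raw actions or rewards. The ring topology, the mixing matrix `W`, and the use of scalar per-arm UCB values in place of full GP-UCB estimates are all simplifying assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n_agents, n_arms = 4, 5

# Each agent's private local UCB estimate per arm (a scalar stand-in
# for the GP-based upper confidence bound it would compute locally).
local_ucb = rng.random((n_agents, n_arms))

# Doubly stochastic mixing matrix for a ring network: each agent keeps
# half its own estimate and takes a quarter from each ring neighbor.
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i + 1) % n_agents] = 0.25
    W[i, (i - 1) % n_agents] = 0.25

estimates = local_ucb.copy()
for _ in range(50):            # consensus iterations
    estimates = W @ estimates  # each agent averages over its neighborhood

# Every agent converges to the network average of the local estimates,
# i.e. an estimate of the UCB on the "global" (average) function.
global_ucb = local_ucb.mean(axis=0)
print(np.allclose(estimates, global_ucb, atol=1e-6))  # True
```

Because `W` is doubly stochastic with spectral gap bounded away from zero, the disagreement between agents shrinks geometrically per iteration; the delayed variant in the abstract trades extra communication rounds for a tighter consensus before each update.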
LS-CAT: A Large-Scale CUDA AutoTuning Dataset
Bjertnes, Lars, Tørring, Jacob O., Elster, Anne C.
The effectiveness of Machine Learning (ML) methods depends on access to large, suitable datasets. In this article, we present how we built the LS-CAT (Large-Scale CUDA AutoTuning) dataset, sourced from GitHub for the purpose of training NLP-based ML models. Our dataset includes 19 683 CUDA kernels focused on linear algebra. In addition to the CUDA code, our LS-CAT dataset contains 5 028 536 associated runtimes, covering different combinations of kernels, block sizes, and matrix sizes. The runtimes are from GPU benchmarks on both Nvidia GTX 980 and Nvidia T4 systems. This information creates a foundation upon which NLP-based models can find correlations between source-code features and the optimal choice of thread block size. Several results can be drawn from our LS-CAT database. E.g., our experimental results show that an optimal choice of thread block size yields an average performance gain of 6%. We also analyze how much performance increase can be achieved in general, finding that in 10% of the cases, a performance increase of more than 20% can be achieved by using the optimal block size. A description of current and future work is also included.
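A runtimes table of the kind described above lends itself to a simple query: for each kernel, which block size minimizes measured runtime? The toy rows and helper name below are hypothetical, not part of the released dataset or its tooling; they only sketch the lookup a model's predictions would be compared against:

```python
# Each row: (kernel name, thread block size, measured runtime in ms).
# Toy data for illustration; LS-CAT's real table is far larger.
runtimes = [
    ("matmul",  64, 1.30),
    ("matmul", 128, 1.10),
    ("matmul", 256, 1.25),
    ("axpy",    64, 0.42),
    ("axpy",   128, 0.40),
    ("axpy",   256, 0.55),
]

def best_block_sizes(rows):
    """Return {kernel: (block_size, runtime)} with the lowest runtime."""
    best = {}
    for kernel, block, t in rows:
        if kernel not in best or t < best[kernel][1]:
            best[kernel] = (block, t)
    return best

best = best_block_sizes(runtimes)
print(best)  # {'matmul': (128, 1.1), 'axpy': (128, 0.4)}
```

Comparing the best runtime against the runtime of a default or median block size per kernel is one straightforward way to reproduce aggregate gain figures like the 6% average reported above.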
A theory on the absence of spurious solutions for nonconvex and nonsmooth optimization
Josz, Cedric, Ouyang, Yi, Zhang, Richard, Lavaei, Javad, Sojoudi, Somayeh
We study the set of continuous functions that admit no spurious local optima (i.e. local minima that are not global minima) which we term global functions. They satisfy various powerful properties for analyzing nonconvex and nonsmooth optimization problems. For instance, they satisfy a theorem akin to the fundamental uniform limit theorem in the analysis regarding continuous functions. Global functions are also endowed with useful properties regarding the composition of functions and change of variables. Using these new results, we show that a class of non-differentiable nonconvex optimization problems arising in tensor decomposition applications are global functions. This is the first result concerning nonconvex methods for nonsmooth objective functions. Our result provides a theoretical guarantee for the widely-used $\ell_1$ norm to avoid outliers in nonconvex optimization.
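The defining property of a global function, every local minimum is global, can be illustrated with a crude numerical check: on a fine grid, flag grid-local minima whose value exceeds the global minimum. This heuristic sketch is purely illustrative (the paper's definition is set-valued and exact, not grid-based), and the two test functions are standard textbook examples, not drawn from the paper:

```python
import numpy as np

def spurious_local_minima(f, xs, tol=1e-9):
    """Grid points that are local minima but lie above the global minimum."""
    ys = f(xs)
    global_min = ys.min()
    spurious = []
    for i in range(1, len(xs) - 1):
        if ys[i] <= ys[i - 1] and ys[i] <= ys[i + 1] and ys[i] > global_min + tol:
            spurious.append(xs[i])
    return spurious

xs = np.linspace(-2, 2, 4001)

# |x| is nonsmooth but convex, hence global: no spurious local minima.
print(len(spurious_local_minima(np.abs, xs)))  # 0

# x**4 - x**2 + 0.3*x has two local minima at different heights,
# so the shallower one is spurious and the function is not global.
g = lambda x: x**4 - x**2 + 0.3 * x
print(len(spurious_local_minima(g, xs)) > 0)  # True
```

The |x| case mirrors the abstract's theme: nonsmoothness alone does not create spurious local optima, which is what makes the l1-norm guarantee for the tensor decomposition problems plausible.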