Gilani, Ammar
Fundamental Limits of Prompt Tuning Transformers: Universality, Capacity and Efficiency
Hu, Jerry Yao-Chieh, Wang, Wei-Po, Gilani, Ammar, Li, Chenyang, Song, Zhao, Liu, Han
We investigate the statistical and computational limits of prompt tuning for transformer-based foundation models. Our key contributions are showing that prompt tuning on \textit{single-head} transformers with only a \textit{single} self-attention layer (i) is universal, and (ii) supports efficient (even almost-linear time) algorithms under the Strong Exponential Time Hypothesis (SETH). Statistically, we prove that prompt tuning on such simplest-possible transformers yields universal approximators for sequence-to-sequence Lipschitz functions. In addition, we provide an exponential-in-$dL$ and -in-$(1/\epsilon)$ lower bound on the number of soft-prompt tokens required for prompt tuning to memorize any dataset with 1-layer, 1-head transformers. Computationally, we identify a phase transition in the efficiency of prompt tuning, determined by the norm of the \textit{soft-prompt-induced} keys and queries, and provide an upper-bound criterion. Beyond this criterion, no sub-quadratic (efficient) algorithm for prompt tuning exists under SETH. Within this criterion, we showcase our theory by proving the existence of almost-linear-time prompt tuning inference algorithms. These fundamental limits provide important necessary conditions for designing expressive and efficient prompt tuning methods for practitioners.
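As a concrete, purely illustrative sketch of this setting (the names, dimensions, and weights below are hypothetical, not taken from the paper), prompt tuning on a 1-layer, 1-head transformer amounts to prepending trainable soft-prompt tokens to the input of a self-attention layer whose weights stay frozen:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def prompted_attention(X, P, Wq, Wk, Wv):
    """Single-head self-attention over [P; X].

    The projection weights Wq, Wk, Wv are frozen; only the
    soft-prompt tokens P would be tuned during training.
    """
    Z = np.concatenate([P, X], axis=0)          # (m + n, d): prompt then input
    Q, K, V = Z @ Wq, Z @ Wk, Z @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1])) # attention weights, rows sum to 1
    out = A @ V
    return out[P.shape[0]:]                     # keep outputs for the original tokens

rng = np.random.default_rng(0)
d, n, m = 8, 5, 3                               # embed dim, seq len, prompt len
X = rng.normal(size=(n, d))                     # input token embeddings
P = rng.normal(size=(m, d))                     # trainable soft-prompt tokens
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
Y = prompted_attention(X, P, Wq, Wk, Wv)
print(Y.shape)                                  # (5, 8)
```

In this picture, `P @ Wq` and `P @ Wk` are the soft-prompt-induced queries and keys; the efficiency criterion above is stated in terms of the norm of exactly these quantities.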
BiSHop: Bi-Directional Cellular Learning for Tabular Data with Generalized Sparse Modern Hopfield Model
Xu, Chenwei, Huang, Yu-Chao, Hu, Jerry Yao-Chieh, Li, Weijian, Gilani, Ammar, Goan, Hsi-Sheng, Liu, Han
The field of deep learning architectures for tabular data has recently experienced rapid advancements [Arik and Pfister, 2021, Gorishniy et al., 2021, Huang et al., 2020, Somepalli et al., 2021]. The primary driving force behind this trend is a limitation of the current dominant methods for tabular data, tree-based methods: while they excel in tabular learning, they lack the capability to integrate with deep learning architectures. Therefore, the pursuit of deep tabular learning is not just a matter of enhancing performance but is also crucial to bridging this gap. However, a recent tabular benchmark study [Grinsztajn et al., 2022] reveals that tree-based methods still surpass deep learning models, underscoring two main challenges for deep tabular learning, as highlighted by Grinsztajn et al. [2022, Sections 5.3 & 5.4]: (C1) Non-Rotationally Invariant Data Structure: the non-rotationally invariant structure of tabular data weakens the effectiveness of deep learning models with rotationally invariant learning procedures.
CoRMF: Criticality-Ordered Recurrent Mean Field Ising Solver
Pan, Zhenyu, Gilani, Ammar, Kuo, En-Jui, Liu, Zhuo
We propose an RNN-based efficient Ising model solver, the Criticality-ordered Recurrent Mean Field (CoRMF), for forward Ising problems. At its core, a criticality-ordered spin sequence of an N-spin Ising model is introduced by sorting mission-critical edges with a greedy algorithm, such that an autoregressive mean-field factorization can be utilized and optimized with Recurrent Neural Networks (RNNs). Our method has two notable characteristics: (i) by leveraging the approximated tree structure of the underlying Ising graph, the newly obtained criticality order enables the unification between variational mean-field and RNNs, allowing the generally intractable Ising model to be efficiently probed with probabilistic inference; (ii) it is well-modularized and model-independent while at the same time expressive enough, and hence fully applicable to any forward Ising inference problem with minimal effort. Computationally, by using a variance-reduced Monte Carlo gradient estimator, CoRMF solves Ising problems in a self-training fashion without data/evidence, and inference tasks can be executed by directly sampling from the RNN. Theoretically, we establish a provably tighter error bound than naive mean-field.

On one hand, the connection between NP problems and Ising models has resulted in strong physics intuitions [Kirkpatrick et al., 1983] that the hardness of these problems emerges through the lens of complex energy landscapes over discrete random variables with multiple local minima [Barahona, 1982, Chowdhury, 2014]. On the other hand, the computational difficulty on the Ising side resonates with the difficulties of numerous significant scientific problems, including many other combinatorial decision-making and optimization problems [Benati and Rizzi, 2007, Ngo et al., 1994, Garey and Johnson, 1979]. As the opposite of conventional inverse Ising problems [Nguyen et al., 2017, Reneau et al., 2023] that reconstruct graphical structure from data, we refer to these problems, which have pre-specified graphical structures, as forward Ising problems (combinatorial inference and optimization problems in Ising formulations [De las Cuevas and Cubitt, 2016, Lucas, 2014, Pan et al., 2023]), and any efficient computational method or hardware solver [Mohseni et al., 2022] for Ising models can potentially benefit them. To describe the Ising model, we first introduce some notation. We consider an Ising model of N spins as an exponential family model for binary N-spin data with up to quadratic sufficient statistics, taking the Boltzmann form.
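The Boltzmann form referenced here is, in standard Ising-model notation (couplings $J_{ij}$, external fields $h_i$, inverse temperature $\beta$; these symbols follow convention rather than any notation fixed by the paper):

```latex
p(\sigma) \;=\; \frac{1}{Z}\,
\exp\!\Big(\beta \sum_{i<j} J_{ij}\,\sigma_i \sigma_j \;+\; \beta \sum_{i} h_i\,\sigma_i\Big),
\qquad \sigma \in \{-1,+1\}^{N},
```

where the partition function $Z$ sums the same exponential over all $2^N$ spin configurations; computing $Z$ exactly is what makes the forward problem generally intractable.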
Feature Programming for Multivariate Time Series Prediction
Reneau, Alex, Hu, Jerry Yao-Chieh, Xu, Chenwei, Li, Weijian, Gilani, Ammar, Liu, Han
We introduce the concept of programmable feature engineering for time series modeling and propose a feature programming framework. This framework generates large amounts of predictive features for noisy multivariate time series while allowing users to incorporate their inductive bias with minimal effort. The key motivation of our framework is to view any multivariate time series as a cumulative sum of fine-grained trajectory increments, with each increment governed by a novel spin-gas dynamical Ising model.

Our key motivation comes from a novel dynamical Ising-like model, the spin-gas Glauber dynamics, originating from a newly debuted gas-like interaction that includes momentum and acceleration information. By using spin-gas Glauber dynamics as the fundamental model for time series generating processes at the smallest time scale, we explore the potential of treating time series as the path-sum of infinitesimal increments generated by a series of Markovian coin tosses following the spin-gas Glauber dynamics. From such a fine-grained perspective, a set of operators is motivated.
Unsupervised Hypergraph Feature Selection via a Novel Point-Weighting Framework and Low-Rank Representation
Gilani, Ammar, Amirmazlaghani, Maryam
Feature selection methods are widely used to alleviate the 'curse of dimensionality' problem. Many proposed feature selection frameworks treat all data points equally, neglecting their different representation power and importance. In this paper, we propose an unsupervised hypergraph feature selection method via a novel point-weighting framework and low-rank representation that captures the importance of different data points. We introduce a novel soft hypergraph with low complexity to model data. Then, we formulate feature selection as an optimization problem that preserves both the local relationships and the global structure of the data. Our approach to global structure preservation helps the framework overcome the unavailability of data labels in unsupervised learning. The proposed method treats data points differently according to their importance in defining the data structure and their representation power. Moreover, since robustness against noise and outliers is of great importance for feature selection, we adopt low-rank representation in our model. We also provide an efficient algorithm to solve the proposed optimization problem; its computational cost is lower than that of many state-of-the-art methods, which matters greatly in feature selection tasks. We conducted comprehensive experiments with various evaluation methods on different benchmark data sets, and these experiments indicate significant improvement over state-of-the-art feature selection methods.