- Research Report > New Finding (1.00)
- Research Report > Promising Solution (0.67)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.05)
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- (10 more...)
- Law (0.67)
- Health & Medicine > Therapeutic Area (0.46)
- Government > Military (0.46)
On the Provable Generalization of Recurrent Neural Networks
The Recurrent Neural Network (RNN) is a fundamental architecture in deep learning. Recent works study the training process of over-parameterized neural networks and show that such networks can learn functions in certain notable concept classes with a provable generalization error bound. In this paper, we analyze training and generalization for RNNs with random initialization, and provide the following improvements over recent works: (1) For an RNN with input sequence $x=(X_1,X_2,...,X_L)$, previous works study learning functions that are summations of $f(\beta^T_lX_l)$ and require normalization conditions $||X_l||\leq\epsilon$ for some very small $\epsilon$ depending on the complexity of $f$. In this paper, using a detailed analysis of the neural tangent kernel matrix, we prove a generalization error bound for learning such functions without normalization conditions, and show that some notable concept classes are learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$. (2) Moreover, we prove a novel result on learning N-variable functions of the input sequence of the form $f(\beta^T[X_{l_1},...,X_{l_N}])$, which do not belong to the ``additive'' concept class, i.e., summations of functions $f(X_l)$. We show that when either $N$ or $l_0=\max(l_1,...,l_N)-\min(l_1,...,l_N)$ is small, $f(\beta^T[X_{l_1},...,X_{l_N}])$ is learnable with the numbers of iterations and samples scaling almost-polynomially in the input length $L$.
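To make the two target classes concrete, a minimal sketch (hypothetical code, not from the paper) can express both the "additive" concept class and the N-variable concept class as plain functions of the input sequence:

```python
import numpy as np

def additive_target(X, betas, f):
    """Additive concept class: sum over positions l of f(beta_l^T X_l).

    X: (L, d) input sequence; betas: (L, d) per-position weight vectors;
    f: scalar link function (e.g., np.tanh or a polynomial).
    """
    return sum(f(betas[l] @ X[l]) for l in range(len(X)))

def nvariable_target(X, beta, f, positions):
    """N-variable concept class: f(beta^T [X_{l_1}, ..., X_{l_N}]),
    a single function of N concatenated sequence positions."""
    concat = np.concatenate([X[l] for l in positions])
    return f(beta @ concat)
```

In this sketch, `positions` plays the role of $(l_1,...,l_N)$, and the spread $l_0=\max(l_1,...,l_N)-\min(l_1,...,l_N)$ is what the paper shows must be small (or $N$ small) for efficient learnability.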
SlimInfer: Accelerating Long-Context LLM Inference via Dynamic Token Pruning
Long, Lingkun, Yang, Rubing, Huang, Yushi, Hui, Desheng, Zhou, Ao, Yang, Jianlei
Long-context inference for Large Language Models (LLMs) is heavily limited by high computational demands. While several existing methods optimize attention computation, they still process the full set of hidden states at each layer, limiting overall efficiency. In this work, we propose SlimInfer, an innovative framework that accelerates inference by directly pruning less critical prompt tokens during the forward pass. Our key insight is an information diffusion phenomenon: as information from critical tokens propagates through layers, it becomes distributed across the entire sequence. This diffusion suggests that LLMs can maintain their semantic integrity even when many tokens, including some of these critical ones, are pruned from the hidden states. Motivated by this, SlimInfer introduces a dynamic fine-grained pruning mechanism that accurately removes redundant hidden-state tokens at intermediate layers. This layer-wise pruning naturally enables an asynchronous KV cache manager that prefetches required token blocks without complex predictors, reducing both memory usage and I/O costs. Extensive experiments show that SlimInfer achieves up to 2.53× time-to-first-token (TTFT) speedup and 1.88× end-to-end latency reduction for LLaMA3.1-8B-Instruct on a single RTX 4090, without sacrificing performance on LongBench.
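The layer-wise pruning idea can be sketched in a few lines. This is an illustrative stand-in, not SlimInfer's actual mechanism: the importance score here (mean attention each token receives) and the `keep_ratio` knob are assumptions for the sketch.

```python
import numpy as np

def prune_hidden_states(hidden, attn_weights, keep_ratio=0.5):
    """Drop low-importance prompt tokens from the hidden states at an
    intermediate layer, so later layers run on a shorter sequence.

    hidden: (L, d) hidden states; attn_weights: (H, L, L) attention maps.
    Importance is the mean attention each token receives, averaged over
    heads and queries -- a stand-in scoring rule, not SlimInfer's criterion.
    Returns the pruned hidden states and the kept token indices.
    """
    importance = attn_weights.mean(axis=(0, 1))       # (L,) per-token score
    k = max(1, int(len(hidden) * keep_ratio))
    kept = np.sort(np.argsort(importance)[-k:])       # top-k, original order
    return hidden[kept], kept
```

Because the pruning decision is made per layer, the kept indices also tell a KV cache manager exactly which token blocks later layers will need, which is what makes prefetching without a separate predictor possible.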
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Asia > South Korea > Seoul > Seoul (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Vision (0.93)
- Information Technology > Data Science > Data Mining (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)
- Pacific Ocean > North Pacific Ocean > San Francisco Bay (0.04)
- North America > United States > California > San Francisco County > San Francisco (0.04)
- North America > United States > Alabama (0.04)
- (2 more...)
- Health & Medicine (0.68)
- Banking & Finance (0.67)
- Energy > Renewable (0.46)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Data Science > Data Mining (0.83)
- Information Technology > Artificial Intelligence > Natural Language (0.69)
Supplementary Materials: Autoformer: Decomposition Transformers with Auto-Correlation for Long-term Series Forecasting
Autoformer achieves a sharp improvement over the state of the art across various forecasting horizons, with a 60% average MSE reduction over the previous state-of-the-art. We fix the input length of Autoformer at 96. For the ILI dataset, which lacks obvious periodicity, a larger factor may introduce noise. We fix the forecasting horizon at 48 for ILI and 336 for the other datasets.
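The evaluation settings stated above can be captured in a small configuration sketch. The key names below are illustrative, not Autoformer's actual config schema:

```python
# Hypothetical experiment configuration mirroring the stated settings:
# input length fixed at 96; horizon 48 for ILI, 336 for the other datasets.
AUTOFORMER_EVAL_CONFIG = {
    "input_length": 96,
    "forecast_horizon": {"ILI": 48, "default": 336},
}

def horizon_for(dataset):
    """Return the forecasting horizon used for a given dataset."""
    horizons = AUTOFORMER_EVAL_CONFIG["forecast_horizon"]
    return horizons.get(dataset, horizons["default"])
```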
- North America > United States > California (0.14)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Europe > Germany (0.04)
AdaptDel: Adaptable Deletion Rate Randomized Smoothing for Certified Robustness
Huang, Zhuoqun, Marchant, Neil G., Ohrimenko, Olga, Rubinstein, Benjamin I. P.
We consider the problem of certified robustness for sequence classification against edit distance perturbations. Naturally occurring inputs of varying lengths (e.g., sentences in natural language processing tasks) present a challenge to current methods, which employ fixed-rate deletion mechanisms and consequently achieve suboptimal performance. To this end, we introduce AdaptDel methods with adaptable deletion rates that dynamically adjust based on input properties. We extend the theoretical framework of randomized smoothing to variable-rate deletion, ensuring sound certification with respect to edit distance. We achieve strong empirical results on natural language tasks, observing up to a 30-orders-of-magnitude improvement in the median cardinality of the certified region over state-of-the-art certifications.
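The prediction side of deletion-based randomized smoothing can be sketched as follows. This is a generic smoothing loop with an input-dependent deletion rate in the spirit of AdaptDel; the `rate_fn` policy is an assumption for the sketch, and the certification machinery itself is omitted.

```python
import random
from collections import Counter

def smoothed_classify(seq, base_classifier, rate_fn, n_samples=1000, seed=0):
    """Classify a sequence by majority vote over randomly-deleted copies.

    Each vote is cast by base_classifier on a copy of seq in which every
    element is independently deleted with probability rate_fn(seq) -- an
    input-dependent ("adaptable") deletion rate rather than a fixed one.
    """
    rng = random.Random(seed)
    p = rate_fn(seq)
    votes = Counter()
    for _ in range(n_samples):
        kept = [tok for tok in seq if rng.random() >= p]
        votes[base_classifier(kept)] += 1
    return votes.most_common(1)[0][0]
```

A length-adaptive policy might, for instance, use `rate_fn = lambda s: min(0.9, c / max(1, len(s)))` for some constant `c`, so long inputs are not over-deleted the way a fixed rate would delete them.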