lag
- North America > United States (0.29)
- North America > Canada (0.16)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Transportation > Infrastructure & Services (0.31)
- Transportation > Ground > Road (0.31)
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- (11 more...)
- North America > United States > California > Santa Clara County > Palo Alto (0.05)
- North America > Canada (0.05)
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients. The resultant gradient-based algorithms are termed Lazily Aggregated Gradient --- justifying our acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex cases; and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
LAG: Lazily Aggregated Gradient for Communication-Efficient Distributed Learning
Tianyi Chen, Georgios Giannakis, Tao Sun, Wotao Yin
This paper presents a new class of gradient methods for distributed machine learning that adaptively skip the gradient calculations to learn with reduced communication and computation. Simple rules are designed to detect slowly-varying gradients and, therefore, trigger the reuse of outdated gradients. The resultant gradient-based algorithms are termed Lazily A ggregated G radient -- justifying our acronym LAG used henceforth. Theoretically, the merits of this contribution are: i) the convergence rate is the same as batch gradient descent in strongly-convex, convex, and nonconvex cases; and, ii) if the distributed datasets are heterogeneous (quantified by certain measurable constants), the communication rounds needed to achieve a targeted accuracy are reduced thanks to the adaptive reuse of lagged gradients. Numerical experiments on both synthetic and real data corroborate a significant communication reduction compared to alternatives.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > Canada > Quebec > Montreal (0.05)
- (11 more...)
- North America > United States (0.29)
- North America > Canada (0.16)
- North America > Trinidad and Tobago > Trinidad > Arima > Arima (0.04)
- Asia > China > Liaoning Province > Shenyang (0.04)
- Transportation > Infrastructure & Services (0.31)
- Transportation > Ground > Road (0.31)
Forecasting the Ionosphere from Sparse GNSS Data with Temporal-Fusion Transformers
Acciarini, Giacomo, Mestici, Simone, Kelebek, Halil, Wolniewicz, Linnea, Vergalla, Michael, Guhathakurta, Madhulika, Rebbapragada, Umaa, Poduval, Bala, Baydin, Atılım Güneş, Soboczenski, Frank
The ionosphere critically influences Global Navigation Satellite Systems (GNSS), satellite communications, and Low Earth Orbit (LEO) operations, yet accurate prediction of its variability remains challenging due to nonlinear couplings between solar, geomagnetic, and thermospheric drivers. Total Electron Content (TEC), a key ionospheric parameter, is derived from GNSS observations, but its reliable forecasting is limited by the sparse nature of global measurements and the limited accuracy of empirical models, especially during strong space weather conditions. In this work, we present a machine learning framework for ionospheric TEC forecasting that leverages Temporal Fusion Transformers (TFT) to predict sparse ionosphere data. Our approach accommodates heterogeneous input sources, including solar irradiance, geomagnetic indices, and GNSS-derived vertical TEC, and applies preprocessing and temporal alignment strategies. Experiments spanning 2010-2025 demonstrate that the model achieves robust predictions up to 24 hours ahead, with root mean square errors as low as 3.33 TECU. Results highlight that solar EUV irradiance provides the strongest predictive signals. Beyond forecasting accuracy, the framework offers interpretability through attention-based analysis, supporting both operational applications and scientific discovery. To encourage reproducibility and community-driven development, we release the full implementation as the open-source toolkit \texttt{ionopy}.
- North America > United States (1.00)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)
- Government > Space Agency (0.69)
- Government > Regional Government > North America Government > United States Government (0.69)
Selective Induction Heads: How Transformers Select Causal Structures In Context
D'Angelo, Francesco, Croce, Francesco, Flammarion, Nicolas
Transformers have exhibited exceptional capabilities in sequence modeling tasks, leveraging self-attention and in-context learning. Critical to this success are induction heads, attention circuits that enable copying tokens based on their previous occurrences. In this work, we introduce a novel framework that showcases transformers' ability to dynamically handle causal structures. Existing works rely on Markov Chains to study the formation of induction heads, revealing how transformers capture causal dependencies and learn transition probabilities in-context. However, they rely on a fixed causal structure that fails to capture the complexity of natural languages, where the relationship between tokens dynamically changes with context. To this end, our framework varies the causal structure through interleaved Markov chains with different lags while keeping the transition probabilities fixed. This setting unveils the formation of Selective Induction Heads, a new circuit that endows transformers with the ability to select the correct causal structure in-context. We empirically demonstrate that transformers learn this mechanism to predict the next token by identifying the correct lag and copying the corresponding token from the past. We provide a detailed construction of a 3-layer transformer to implement the selective induction head, and a theoretical analysis proving that this mechanism asymptotically converges to the maximum likelihood solution. Our findings advance the understanding of how transformers select causal structures, providing new insights into their functioning and interpretability.
- Europe > Switzerland > Vaud > Lausanne (0.40)
- North America > Canada > Alberta > Census Division No. 15 > Improvement District No. 9 > Banff (0.04)
LoRA-Augmented Generation (LAG) for Knowledge-Intensive Language Tasks
Fleshman, William, Van Durme, Benjamin
The proliferation of fine-tuned language model experts for specific tasks and domains signals the need for efficient selection and combination methods. We propose LoRA-Augmented Generation (LAG) for leveraging large libraries of knowledge and task-specific LoRA adapters. LAG requires no additional training or access to data, and efficiently filters, retrieves, and applies experts on a per-token and layer basis. We evaluate LAG on various knowledge-intensive tasks, achieving superior performance over existing data-free methods. We explore scenarios where additional data is available, demonstrating LAG's compatibility with alternative solutions such as retrieval-augmented generation (RAG).
- North America > Dominican Republic (0.04)
- Europe > Croatia > Dubrovnik-Neretva County > Dubrovnik (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (8 more...)
Bayesian Models for Joint Selection of Features and Auto-Regressive Lags: Theory and Applications in Environmental and Financial Forecasting
Manna, Alokesh, Ghosh, Sujit K.
We develop a Bayesian framework for variable selection in linear regression with autocorrelated errors, accommodating lagged covariates and autoregressive structures. This setting occurs in time series applications where responses depend on contemporaneous or past explanatory variables and persistent stochastic shocks, including financial modeling, hydrological forecasting, and meteorological applications requiring temporal dependency capture. Our methodology uses hierarchical Bayesian models with spike-and-slab priors to simultaneously select relevant covariates and lagged error terms. We propose an efficient two-stage MCMC algorithm separating sampling of variable inclusion indicators and model parameters to address high-dimensional computational challenges. Theoretical analysis establishes posterior selection consistency under mild conditions, even when candidate predictors grow exponentially with sample size, common in modern time series with many potential lagged variables. Through simulations and real applications (groundwater depth prediction, S&P 500 log returns modeling), we demonstrate substantial gains in variable selection accuracy and predictive performance. Compared to existing methods, our framework achieves lower MSPE, improved true model component identification, and greater robustness with autocorrelated noise, underscoring practical utility for model interpretation and forecasting in autoregressive settings.
- North America > United States > Connecticut (0.04)
- North America > United States > Texas (0.04)
- North America > United States > South Carolina (0.04)
- North America > United States > North Carolina (0.04)
- Retail (1.00)
- Information Technology > Services (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- (4 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)