lagrangian
- Asia > South Korea (0.14)
- Asia > Middle East > Jordan (0.05)
- Europe > Sweden > Västerbotten County > Umeå (0.04)
- (4 more...)
Federated Learning With L0 Constraint Via Probabilistic Gates For Sparsity
Huthasana, Krishna Harsha Kovelakuntla, Olama, Alireza, Lundell, Andreas
Federated Learning (FL) is a distributed machine learning setting that requires multiple clients to collaborate on training a model while maintaining data privacy. The unaddressed inherent sparsity in data and models often results in overly dense models and poor generalizability under data and client participation heterogeneity. We propose FL with an L0 constraint on the density of non-zero parameters, achieved through a reparameterization using probabilistic gates and their continuous relaxation: originally proposed for sparsity in centralized machine learning. We show that the objective for L0 constrained stochastic minimization naturally arises from an entropy maximization problem of the stochastic gates and propose an algorithm based on federated stochastic gradient descent for distributed learning. We demonstrate that the target density (rho) of parameters can be achieved in FL, under data and client participation heterogeneity, with minimal loss in statistical performance for linear and non-linear models: Linear regression (LR), Logistic regression (LG), Softmax multi-class classification (MC), Multi-label classification with logistic units (MLC), Convolution Neural Network (CNN) for multi-class classification (MC). We compare the results with a magnitude pruning-based thresholding algorithm for sparsity in FL. Experiments on synthetic data with target density down to rho = 0.05 and publicly available RCV1, MNIST, and EMNIST datasets with target density down to rho = 0.005 demonstrate that our approach is communication-efficient and consistently better in statistical performance.
- North America > United States > Virginia (0.04)
- Europe > Finland (0.04)
- Asia (0.04)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Asia > Middle East > Jordan (0.04)
Lipschitz Bounds and Provably Robust Training by Laplacian Smoothing
In this work we propose a graph-based learning framework to train models with provable robustness to adversarial perturbations. In contrast to regularization-based approaches, we formulate the adversarially robust learning problem as one of loss minimization with a Lipschitz constraint, and show that the saddle point of the associated Lagrangian is characterized by a Poisson equation with weighted Laplace operator. Further, the weighting for the Laplace operator is given by the Lagrange multiplier for the Lipschitz constraint, which modulates the sensitivity of the minimizer to perturbations.
Semantic Faithfulness and Entropy Production Measures to Tame Your LLM Demons and Manage Hallucinations
Evaluating faithfulness of Large Language Models (LLMs) to a given task is a complex challenge. We propose two new unsupervised metrics for faithfulness evaluation using insights from information theory and thermodynamics. Our approach treats an LLM as a bipartite information engine where hidden layers act as a Maxwell demon controlling transformations of context $C $ into answer $A$ via prompt $Q$. We model Question-Context-Answer (QCA) triplets as probability distributions over shared topics. Topic transformations from $C$ to $Q$ and $A$ are modeled as transition matrices ${\bf Q}$ and ${\bf A}$ encoding the query goal and actual result, respectively. Our semantic faithfulness (SF) metric quantifies faithfulness for any given QCA triplet by the Kullback-Leibler (KL) divergence between these matrices. Both matrices are inferred simultaneously via convex optimization of this KL divergence, and the final SF metric is obtained by mapping the minimal divergence onto the unit interval [0,1], where higher scores indicate greater faithfulness. Furthermore, we propose a thermodynamics-based semantic entropy production (SEP) metric in answer generation, and show that high faithfulness generally implies low entropy production. The SF and SEP metrics can be used jointly or separately for LLM evaluation and hallucination control. We demonstrate our framework on LLM summarization of corporate SEC 10-K filings.
- Information Technology (0.68)
- Banking & Finance (0.50)
- Government (0.46)
A Theoretical Framework for Auxiliary-Loss-Free Load Balancing of Sparse Mixture-of-Experts in Large-Scale AI Models
In large-scale AI training, Sparse Mixture-of-Experts (s-MoE) layers enable scaling by activating only a small subset of experts per token. An operational challenge in this design is load balancing: routing tokens to minimize the number of idle experts, which is important for the efficient utilization of (costly) GPUs. We provide a theoretical framework for analyzing the Auxiliary-Loss-Free Load Balancing (ALF-LB) procedure -- proposed by DeepSeek's Wang et al. (2024) -- by casting it as a one-step-per-iteration primal-dual method for an assignment problem. First, in a stylized deterministic setting, our framework yields several insightful structural properties: (i) a monotonic improvement of a Lagrangian objective, (ii) a preference rule that moves tokens from overloaded to underloaded experts, and (iii) an approximate-balancing guarantee. Then, we incorporate the stochastic and dynamic nature of AI training using a generalized online optimization formulation. In the online setting, we derive a strong convexity property of the objective that leads to a logarithmic expected regret bound under certain step-size choices. Additionally, we present real experiments on 1B-parameter DeepSeekMoE models to complement our theoretical findings. Together, these results build a principled framework for analyzing the Auxiliary-Loss-Free Load Balancing of s-MoE in AI models.
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Jordan (0.04)
Robust Precoding for Resilient Cell-Free Networks
Mashdour, Saeed, Flores, André R., de Lamare, Rodrigo C.
Abstract--This paper presents a robust precoder design for resilient cell-free massive MIMO (CF-mMIMO) systems that minimizes the weighted sum of desired signal mean square error (MSE) and residual interference leakage power under a total transmit power constraint. The proposed robust preco der incorporates channel state information (CSI) error statis tics to enhance resilience against CSI imperfections. We employ an alternating optimization algorithm initialized with a min imum MSE-type solution, which iteratively refines the precoder w hile maintaining low computational complexity and ensuring fas t convergence. Numerical results show that the proposed meth od significantly outperforms conventional linear precoders, providing an effective balance between performance and computati onal efficiency. Cell-free massive multiple-input multiple-output (CF-mMIMO) networks have emerged as an extension of massive multiple-input multiple-output (MIMO) systems [1], [2] an d cornerstone of next-generation wireless systems by deploy ing a large number of distributed access points (APs) to jointly serve users without cell boundaries [3], [4], [5].
- North America > United States > Georgia > Fulton County > Atlanta (0.04)
- North America > United States > Connecticut > Hartford County > Hartford (0.04)
- North America > United States > Connecticut > Hartford County > East Hartford (0.04)
- (3 more...)
- Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
- North America > United States (0.04)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Hessians in Birkhoff-Theoretic Trajectory Optimization
This paper derives various Hessians associated with Birkhoff-theoretic methods for trajectory optimization. According to a theorem proved in this paper, approximately 80% of the eigenvalues are contained in the narrow interval [-2, 4] for all Birkhoff-discretized optimal control problems. A preliminary analysis of computational complexity is also presented with further discussions on the grand challenge of solving a million point trajectory optimization problem.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- (2 more...)
- Government > Regional Government > North America Government > United States Government (0.64)
- Government > Military > Navy (0.40)