
Gaussian Channel


Analyzing the Generalization Capability of SGLD Using Properties of Gaussian Channels

Neural Information Processing Systems

Optimization is a key component for training machine learning models and has a strong impact on their generalization. In this paper, we consider a particular optimization method---the stochastic gradient Langevin dynamics (SGLD) algorithm---and investigate the generalization of models trained by SGLD. We derive a new generalization bound by connecting SGLD with Gaussian channels found in information and communication theory. Our bound can be computed from the training data and incorporates the variance of gradients, quantifying a particular kind of "sharpness" of the loss landscape. We also consider an algorithm closely related to SGLD, namely differentially private SGD (DP-SGD). We prove that the generalization capability of DP-SGD can be amplified by iteration. Specifically, our bound can be sharpened by including a time-decaying factor if the DP-SGD algorithm outputs the last iterate while keeping the other iterates hidden. This decay factor allows the contribution of early iterations to our bound to diminish over time and is established by strong data processing inequalities---a fundamental tool in information theory. We demonstrate our bound through numerical experiments, showing that it can predict the behavior of the true generalization gap.
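To make the SGLD update concrete, here is a minimal sketch of one SGLD step in Python. The quadratic toy loss, step size, and inverse temperature are illustrative assumptions, not values from the paper; the Gaussian noise injected at each step is what links the iteration to a Gaussian channel.

```python
import numpy as np

def sgld_step(w, grad_fn, eta, beta, rng):
    """One SGLD update: a gradient step plus Gaussian noise.

    w       : current parameter vector
    grad_fn : function returning the (mini-batch) gradient at w
    eta     : step size
    beta    : inverse temperature controlling the noise level
    """
    noise = rng.normal(size=w.shape)
    # SGLD injects N(0, 2*eta/beta) noise into each coordinate,
    # so each iteration acts like a Gaussian channel on the iterate.
    return w - eta * grad_fn(w) + np.sqrt(2.0 * eta / beta) * noise

# Toy usage: sample around the minimizer of the loss 0.5 * ||w||^2.
rng = np.random.default_rng(0)
w = rng.normal(size=5)
for _ in range(1000):
    w = sgld_step(w, lambda v: v, eta=0.01, beta=100.0, rng=rng)
print("final iterate:", w)
```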



Adaptive Source-Channel Coding for Semantic Communications

Li, Dongxu, Yuan, Kai, Huang, Jianhao, Huang, Chuan, Qin, Xiaoqi, Cui, Shuguang, Zhang, Ping

arXiv.org Artificial Intelligence

Semantic communications (SemComs) have emerged as a promising paradigm for joint data and task-oriented transmissions, combining the demands of bit-accurate delivery and end-to-end (E2E) distortion minimization. However, current joint source-channel coding (JSCC) in SemComs is not compatible with existing communication systems and cannot adapt to variations in the sources or the channels, while separate source-channel coding (SSCC) is suboptimal in the finite blocklength regime. To address these issues, we propose an adaptive source-channel coding (ASCC) scheme for SemComs over parallel Gaussian channels, where the deep neural network (DNN)-based semantic source coding and conventional digital channel coding are separately deployed and adaptively designed. To enable efficient adaptation between the source and channel coding, we first approximate the E2E data and semantic distortions as functions of the source coding rate and the bit error ratio (BER) via logistic regression, where the BER is in turn modeled as a function of the signal-to-noise ratio (SNR) and the channel coding rate. Then, we formulate the weighted-sum E2E distortion minimization problem for joint source-channel coding rate and power allocation over the parallel channels, which is solved via successive convex approximation. Finally, simulation results demonstrate that the proposed ASCC scheme outperforms typical deep JSCC and SSCC schemes in both the single- and parallel-channel scenarios while maintaining full compatibility with practical digital systems.
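As a hedged illustration of the logistic-regression modeling step described above (not the authors' exact parameterization), the sketch below fits a two-parameter logistic curve mapping SNR in dB to BER for one fixed channel coding rate; the measured (SNR, BER) pairs are made-up values.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic_ber(snr_db, a, b):
    """Logistic model of BER as a function of SNR (dB);
    the two-parameter form is an illustrative assumption."""
    return 1.0 / (1.0 + np.exp(a * (snr_db - b)))

# Hypothetical measured (SNR, BER) pairs at one channel coding rate.
snr = np.array([0.0, 2.0, 4.0, 6.0, 8.0, 10.0])
ber = np.array([0.45, 0.30, 0.12, 0.03, 0.005, 0.0008])

params, _ = curve_fit(logistic_ber, snr, ber, p0=[0.7, 0.0])
print("fitted (a, b):", params)
print("predicted BER at 5 dB:", logistic_ber(5.0, *params))
```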


Mixing Time of the Proximal Sampler in Relative Fisher Information via Strong Data Processing Inequality

Wibisono, Andre

arXiv.org Artificial Intelligence

We study the mixing time guarantee for sampling in relative Fisher information via the Proximal Sampler algorithm, which is an approximate proximal discretization of the Langevin dynamics. We show that when the target probability distribution is strongly log-concave, the relative Fisher information converges exponentially fast along the Proximal Sampler; this matches the exponential convergence rate of the relative Fisher information along the continuous-time Langevin dynamics for a strongly log-concave target. When combined with a standard implementation of the Proximal Sampler via rejection sampling, this exponential convergence rate provides a high-accuracy iteration complexity guarantee for the Proximal Sampler in relative Fisher information when the target distribution is strongly log-concave and log-smooth. Our proof proceeds by establishing a strong data processing inequality for relative Fisher information along the Gaussian channel under strong log-concavity, and a data processing inequality along the reverse Gaussian channel for a special distribution. The forward and reverse Gaussian channels compose to form the Proximal Sampler, and these data processing inequalities imply the exponential convergence rate of the relative Fisher information along the Proximal Sampler.
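To illustrate the forward/reverse Gaussian channel composition described above, here is a minimal sketch of the Proximal Sampler for the special case of a standard Gaussian target, where the backward (denoising) step has a closed form and no rejection sampling is needed; the step size and iteration count are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
eta = 0.5          # step size of the proximal discretization
x = 5.0            # start far from the target mean

# Proximal Sampler for the target pi = N(0, 1). For this target the
# backward step is exactly Gaussian, so this special case sidesteps
# the rejection-sampling implementation discussed in the abstract.
for t in range(50):
    # Forward Gaussian channel: y | x ~ N(x, eta).
    y = x + np.sqrt(eta) * rng.normal()
    # Reverse Gaussian channel: x | y ~ N(y / (1 + eta), eta / (1 + eta)),
    # the posterior of x under the N(0, 1) prior and Gaussian noise.
    x = y / (1.0 + eta) + np.sqrt(eta / (1.0 + eta)) * rng.normal()

print("final sample (approximately N(0, 1)):", x)
```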



Neural Cover Selection for Image Steganography

Chahine, Karl, Kim, Hyeji

arXiv.org Artificial Intelligence

In steganography, selecting an optimal cover image--referred to as cover selection--is pivotal for effective message concealment. Traditional methods have typically employed exhaustive searches to identify images that conform to specific perceptual or complexity metrics. However, the relationship between these metrics and an image's actual message-hiding efficacy is unclear, often yielding less-than-ideal steganographic outcomes. Inspired by recent advancements in generative models, we introduce a novel cover selection framework that optimizes within the latent space of pretrained generative models to identify the most suitable cover images, distinguishing itself from traditional exhaustive search methods. Our method shows significant advantages in message recovery and image quality. We also conduct an information-theoretic analysis of the generated cover images, revealing that message hiding predominantly occurs in low-variance pixels, mirroring the principles of the water-filling algorithm for parallel Gaussian channels.
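For reference, here is a minimal sketch of the water-filling power allocation over parallel Gaussian channels that the analysis above alludes to; the per-channel noise levels and power budget are made-up illustrative values.

```python
import numpy as np

def waterfill(noise, total_power, tol=1e-9):
    """Water-filling over parallel Gaussian channels:
    allocate p_i = max(0, mu - noise_i) so that sum(p_i) = total_power,
    solving for the water level mu by bisection."""
    lo, hi = np.min(noise), np.max(noise) + total_power
    while hi - lo > tol:
        mu = 0.5 * (lo + hi)
        if np.maximum(mu - noise, 0.0).sum() > total_power:
            hi = mu
        else:
            lo = mu
    return np.maximum(0.5 * (lo + hi) - noise, 0.0)

noise = np.array([0.1, 0.5, 1.0, 2.0])   # per-channel noise variances
p = waterfill(noise, total_power=2.0)
print("power allocation:", p)            # low-noise channels get more power
```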


Optimality of Approximate Message Passing Algorithms for Spiked Matrix Models with Rotationally Invariant Noise

Dudeja, Rishabh, Liu, Songbin, Ma, Junjie

arXiv.org Machine Learning

We study the problem of estimating a rank-one signal matrix from an observed matrix generated by corrupting the signal with additive rotationally invariant noise. We develop a new class of approximate message-passing algorithms for this problem and provide a simple and concise characterization of their dynamics in the high-dimensional limit. At each iteration, these algorithms exploit prior knowledge about the noise structure by applying a non-linear matrix denoiser to the eigenvalues of the observed matrix, and prior information regarding the signal structure by applying a non-linear iterate denoiser to the previous iterates generated by the algorithm. We exploit our result on the dynamics of these algorithms to derive the optimal choices for the matrix and iterate denoisers. We show that the resulting algorithm achieves the smallest possible asymptotic estimation error among a broad class of iterative algorithms under a fixed iteration budget.
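As a hedged sketch (not the authors' rotationally invariant construction), the snippet below runs the classical AMP iteration for a spiked Wigner model with Gaussian noise, including the Onsager correction term; the problem size, signal strength, and the tanh iterate denoiser are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(2)
n, lam = 2000, 2.0                      # dimension and signal strength

# Spiked Wigner model: Y = (lam / n) * x x^T + W, with x in {+1, -1}^n.
x = rng.choice([-1.0, 1.0], size=n)
W = rng.normal(size=(n, n)) / np.sqrt(n)
W = (W + W.T) / np.sqrt(2.0)            # symmetric Gaussian noise
Y = (lam / n) * np.outer(x, x) + W

# AMP with a tanh iterate denoiser and Onsager correction.
m = rng.normal(size=n)                  # random initialization
f_prev = np.zeros(n)
for t in range(20):
    f = np.tanh(m)
    b = np.mean(1.0 - f**2)             # Onsager coefficient: mean of f'
    m, f_prev = Y @ f - b * f_prev, f

overlap = abs(np.dot(np.tanh(m), x)) / n
print("overlap with the planted signal:", overlap)
```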


Coding for the Gaussian Channel in the Finite Blocklength Regime Using a CNN-Autoencoder

Hesham, Nourhan, Bouzid, Mohamed, Abdel-Qader, Ahmad, Chaaban, Anas

arXiv.org Artificial Intelligence

The development of delay-sensitive applications that require ultra-high reliability has created an additional challenge for wireless networks. This led to ultra-reliable low-latency communications (URLLC), a use case that 5G and beyond-5G systems must support. However, supporting low-latency communications requires the use of short codes, while attaining vanishing frame error probability (FEP) requires long codes. Thus, developing codes for the finite blocklength regime (FBR) that achieve certain reliability requirements is necessary. This paper investigates the potential of convolutional neural network autoencoders (CNN-AEs) in approaching the theoretical maximum achievable rate over a Gaussian channel for a range of signal-to-noise ratios at a fixed blocklength and target FEP, a different perspective from existing works that explore the use of CNNs from bit-error-rate and symbol-error-rate perspectives. We explain the studied CNN-AE architecture, evaluate it numerically, and compare it to the theoretical maximum achievable rate and to the achievable rates of polar-coded quadrature amplitude modulation (QAM), Reed-Muller-coded QAM, multilevel polar coded modulation, and a TurboAE-MOD scheme from the literature. Numerical results show that the CNN-AE outperforms these benchmark schemes and approaches the theoretical maximum rate, demonstrating the capability of CNN-AEs in learning good codes for delay-constrained applications.
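The core idea of a channel autoencoder is easy to sketch: an encoder maps message bits to a power-normalized codeword, an AWGN layer simulates the Gaussian channel, and a decoder recovers the bits. The PyTorch snippet below is a simplified dense stand-in for the paper's CNN-AE; the layer sizes, blocklength, SNR, and training objective are illustrative assumptions.

```python
import torch
import torch.nn as nn

class AWGNChannel(nn.Module):
    """Additive white Gaussian noise channel with a unit average
    power constraint, the core component of a channel autoencoder."""
    def __init__(self, snr_db):
        super().__init__()
        self.sigma = 10.0 ** (-snr_db / 20.0)  # noise std for unit signal power

    def forward(self, x):
        # Normalize the codeword to satisfy the average power constraint.
        x = x / x.pow(2).mean(dim=-1, keepdim=True).sqrt()
        return x + self.sigma * torch.randn_like(x)

# Minimal autoencoder: k message bits -> n channel uses -> k bit estimates.
k, n = 4, 8
encoder = nn.Sequential(nn.Linear(k, 32), nn.ReLU(), nn.Linear(32, n))
channel = AWGNChannel(snr_db=5.0)
decoder = nn.Sequential(nn.Linear(n, 32), nn.ReLU(), nn.Linear(32, k))

bits = torch.randint(0, 2, (64, k)).float()
logits = decoder(channel(encoder(2 * bits - 1)))   # BPSK-style input mapping
loss = nn.functional.binary_cross_entropy_with_logits(logits, bits)
loss.backward()                                    # gradients for one training step
```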


Federated Learning in MIMO Satellite Broadcast System

Pinard, Raphael, Hassani, Mitra, Lemieux, Wayne

arXiv.org Artificial Intelligence

Federated learning (FL) is a type of distributed machine learning at the wireless edge that preserves the privacy of clients' data from adversaries and even the central server. Existing federated learning approaches either use (i) secure multiparty computation (SMC), which is vulnerable to inference attacks, or (ii) differential privacy, which may decrease the test accuracy given a large number of parties each holding relatively small amounts of data. To address the limitations of these existing methods, in this paper we incorporate federated learning into the inner workings of MIMO systems.
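Since the abstract builds on federated learning, here is a minimal sketch of the FedAvg aggregation step that underlies most FL systems, weighting each client update by its local dataset size; this is a generic illustration with made-up model vectors, not this paper's MIMO-specific scheme.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Federated averaging: the server combines client models,
    weighting each by its local dataset size."""
    sizes = np.asarray(client_sizes, dtype=float)
    coeffs = sizes / sizes.sum()
    return sum(c * w for c, w in zip(coeffs, client_weights))

# Three hypothetical clients with 5-dimensional model vectors.
rng = np.random.default_rng(3)
clients = [rng.normal(size=5) for _ in range(3)]
global_model = fedavg(clients, client_sizes=[100, 50, 250])
print("aggregated model:", global_model)
```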