frequency domain
State Sequences Prediction via Fourier Transform for Representation Learning
While deep reinforcement learning (RL) has proven effective in solving complex control tasks, sample efficiency remains a key challenge because of the large amounts of data required for strong performance. Existing research explores the application of representation learning for data-efficient RL, e.g., learning predictive representations by predicting long-term future states. However, many existing methods do not fully exploit the structural information inherent in sequential state signals, which can potentially improve the quality of long-term decision-making but is difficult to discern in the time domain. To tackle this problem, we propose State Sequences Prediction via Fourier Transform (SPF), a novel method that exploits the frequency domain of state sequences to extract the underlying patterns in time series data for learning expressive representations efficiently. Specifically, we theoretically analyze the existence of structural information in state sequences, which is closely related to policy performance and signal regularity, and then propose to predict the Fourier transform of infinite-step future state sequences to extract such information. One of the appealing features of SPF is that it is simple to implement and does not require storing infinite-step future states as prediction targets. Experiments demonstrate that the proposed method outperforms several state-of-the-art algorithms in terms of both sample efficiency and performance.
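The claim that infinite-step targets need not be stored can be illustrated with a small sketch. The abstract does not give the exact form of the prediction target, so the discounted discrete-time Fourier transform below, the discount `gamma`, and the function names are all assumptions made for illustration. Under that assumption, the transform of the sequence starting at step t equals the current state plus a scaled, phase-shifted transform of the sequence starting at step t+1, so a one-step recursion reproduces the full sum.

```python
import cmath

def discounted_dtft(states, omega, gamma):
    """Direct sum: F_t(omega) = sum_n gamma^n * exp(-1j*omega*n) * s_{t+n}."""
    return sum((gamma ** n) * cmath.exp(-1j * omega * n) * s
               for n, s in enumerate(states))

def recursive_dtft(states, omega, gamma):
    """Same quantity via F_t = s_t + gamma * exp(-1j*omega) * F_{t+1},
    evaluated backwards from the end of the (here finite) sequence."""
    acc = 0j
    for s in reversed(states):
        acc = s + gamma * cmath.exp(-1j * omega) * acc
    return acc

# Toy scalar state sequence (hypothetical values, finite for the demo).
states = [0.5, -1.0, 2.0, 0.25, 1.5]
omega, gamma = 0.7, 0.9

direct = discounted_dtft(states, omega, gamma)
recursive = recursive_dtft(states, omega, gamma)
print(abs(direct - recursive))  # difference is at floating-point level
```

In a learning setting the recursion would presumably play the role of a bootstrapped target, much like a temporal-difference update, but that connection is an interpretation and not something the abstract states.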
Appendix information on the relationship between our training approach and domain adaptation
Here we note that our problem definition of pre-training is fundamentally different from domain adaptation (DA) [S1, S2, S3, S4, S5, S6], in order to prevent any confusion between this work and domain adaptation methods. DA applies a model trained on a pre-training dataset (i.e., source dataset) to a different target dataset [21, 42]. Self-supervised pre-training has four key differences from domain adaptation. Domain adaptation methods usually restrict the pre-training and target datasets to the same feature space (but possibly different distributions), e.g., [S22, S18, S19, S20, S13]. In summary, to support transfer learning across different time series datasets, a pre-training approach needs to capture a generalizable property of time series, one that is shared across different time series datasets regardless of the specific semantic meaning of a time series signal (e.g., ECG, EMG, acceleration, vibration), the conditions of data acquisition (e.g., variation across subjects and devices), sampling frequencies, etc. This work develops a self-supervised contrastive pre-training strategy that fulfills these requirements by injecting an appropriate inductive bias (called Time-Frequency Consistency, TF-C) into the model (Sec. Further, we clarify that the term 'self-supervised' has different meanings in DA and in pre-training [S23, S24, S25, S26]. 'Self-supervised domain adaptation' [S27, S16, S21, S15] or 'unsupervised domain adaptation' [S1, S22, S28, S11, S14] means that there are no labels in the target dataset, but labels are still required in the pre-training dataset. In contrast, 'self-supervised pre-training' [S29, S30, S31] (i.e., the problem studied here, in line with a breadth of existing literature on pre-training) indicates the setting where no labels are available in pre-training. As of the submission of this manuscript, there are no existing contrastive augmentations in the frequency domain of time series.
Two models, CoST [49] and BTSF [50], involve the frequency domain in contrastive learning; however, the proposed TF-C is fundamentally different from them in the following aspects. We take BTSF as an example; the differences also apply to CoST. The problem definitions of the two works are different. Our method is designed to produce generalizable representations that can transfer to a different time series dataset (going from a pre-training to a fine-tuning dataset) for the purpose of transfer learning.
Rethinking and Improving Robustness of Convolutional Neural Networks: a Shapley Value-based Approach in Frequency Domain
The existence of adversarial examples poses concerns for the robustness of convolutional neural networks (CNNs). A popular hypothesis attributes this to the frequency bias phenomenon: CNNs rely more on high-frequency components (HFC) for classification than humans do, which causes the brittleness of CNNs. However, most previous works manually select and roughly divide the image frequency spectrum and conduct only qualitative analysis. In this work, we introduce the Shapley value, a metric from cooperative game theory, into the frequency domain and propose to quantify the positive (negative) impact of every frequency component of data on CNNs. Based on the Shapley value, we quantify the impact in a fine-grained way and show intriguing instance disparity. Statistically, we investigate adversarial training (AT) and the adversarial attack in the frequency domain. The observations motivate us to perform an in-depth analysis and lead to multiple novel hypotheses about i) the cause of adversarial robustness of the AT model; ii) the fairness problem of AT between different classes in the same dataset; iii) the attack bias on different frequency components. Finally, we propose a Shapley-value guided data augmentation technique for improving robustness. Experimental results on image classification benchmarks show its effectiveness. The code for this paper is at https://github.com/Ytchen981/CSA
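To make the Shapley-value idea concrete, the following sketch computes exact Shapley values for three coarse frequency bands treated as players in a cooperative game. Everything specific here is an assumption for illustration: the band names, the `toy_score` function (a stand-in for a classifier's output when only the given bands of an image are kept), and the exhaustive enumeration of orderings, which is only feasible for a handful of players. It is not taken from the paper or its repository.

```python
from itertools import permutations
from math import factorial

def shapley_values(players, value):
    """Exact Shapley values: each player's marginal contribution,
    averaged over every ordering in which players can join."""
    phi = {p: 0.0 for p in players}
    for order in permutations(players):
        coalition = frozenset()
        for p in order:
            with_p = coalition | {p}
            phi[p] += value(with_p) - value(coalition)
            coalition = with_p
    n_orders = factorial(len(players))
    return {p: total / n_orders for p, total in phi.items()}

def toy_score(bands):
    """Hypothetical model output when only `bands` are kept in the input."""
    score = 0.0
    if "low" in bands:
        score += 0.6
    if "mid" in bands:
        score += 0.3
    if "high" in bands:
        score -= 0.1  # this band hurts the prediction in the toy example
    if "low" in bands and "mid" in bands:
        score += 0.2  # interaction shared between the two bands
    return score

bands = ["low", "mid", "high"]
phi = shapley_values(bands, toy_score)
print(phi)
```

A negative value marks a component whose presence lowers the score on average, which matches the positive (negative) impact distinction in the abstract. With many frequency components, exhaustive enumeration is infeasible and a sampling-based estimate over random orderings would be the usual substitute.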
CNNpack: Packing Convolutional Neural Networks in the Frequency Domain
Yunhe Wang, Chang Xu, Shan You, Dacheng Tao, Chao Xu
Deep convolutional neural networks (CNNs) are successfully used in a number of applications. However, their storage and computational requirements have largely prevented their widespread use on mobile devices. Here we present an effective CNN compression approach in the frequency domain, which focuses not only on smaller weights but on all the weights and their underlying connections. By treating convolutional filters as images, we decompose their representations in the frequency domain as common parts (i.e., cluster centers) shared by other similar filters and their individual private parts (i.e., individual residuals). A large number of low-energy frequency coefficients in both parts can be discarded to produce high compression without significantly compromising accuracy. We relax the computational burden of convolution operations in CNNs by linearly combining the convolution responses of discrete cosine transform (DCT) bases. The compression and speed-up ratios of the proposed algorithm are thoroughly analyzed and evaluated on benchmark image datasets to demonstrate its superiority over state-of-the-art methods.
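The coefficient-discarding step can be sketched as follows: a filter is treated as a small image, transformed with a 2D orthonormal DCT, and only the largest-magnitude coefficients are kept. The 4x4 filter values, the `keep` budget, and all function names are assumptions for illustration. The sketch does not cover the clustering into common and private parts or the speed-up scheme described in the abstract.

```python
import math

def dct_matrix(n):
    """Orthonormal DCT-II matrix C (C times its transpose is the identity)."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                     for i in range(n)])
    return rows

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def dct2(block):
    """2D DCT of a square block: C * X * C^T."""
    c = dct_matrix(len(block))
    return matmul(matmul(c, block), transpose(c))

def idct2(coeffs):
    """Inverse 2D DCT of a square block: C^T * Y * C."""
    c = dct_matrix(len(coeffs))
    return matmul(matmul(transpose(c), coeffs), c)

def keep_largest(coeffs, keep):
    """Zero all but the `keep` largest-magnitude coefficients
    (coefficients tied with the threshold are all kept)."""
    flat = sorted((abs(v) for row in coeffs for v in row), reverse=True)
    threshold = flat[keep - 1]
    return [[v if abs(v) >= threshold else 0.0 for v in row]
            for row in coeffs]

# Hypothetical smooth 4x4 filter, compressed to 4 of its 16 coefficients.
filt = [[0.1 * (i + 1) * (j + 1) for j in range(4)] for i in range(4)]
coeffs = dct2(filt)
sparse = keep_largest(coeffs, keep=4)
approx = idct2(sparse)
```

Because the transform is orthonormal, the squared reconstruction error equals the energy of the discarded coefficients, which is why dropping low-energy coefficients costs little accuracy for smooth filters.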
DOPPLER: Differentially Private Optimizers with Low-pass Filter for Privacy Noise Reduction
Privacy is a growing concern in modern deep-learning systems and applications. Differentially private (DP) training prevents the leakage of sensitive information in the collected training data from the trained machine learning models. DP optimizers, including DP stochastic gradient descent (DPSGD) and its variants, privatize the training procedure by gradient clipping and noise injection. However, in practice, DP models trained using DPSGD and its variants often suffer from significant model performance degradation. Such degradation prevents the application of DP optimization in many key tasks, such as foundation model pretraining.
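As a rough illustration of the gradient clipping and noise injection that DPSGD performs, the sketch below implements a single update step on plain Python lists. The function names, the hyperparameters (`max_norm`, `noise_multiplier`), and the toy gradients are assumptions for illustration; privacy accounting is omitted, and the low-pass filter named in the title is not shown because the abstract does not describe it.

```python
import math
import random

def clip(grad, max_norm):
    """Rescale one per-sample gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_sgd_step(params, per_sample_grads, lr, max_norm, noise_multiplier, rng):
    """One DP-SGD update: clip each sample's gradient, sum, add Gaussian
    noise with standard deviation noise_multiplier * max_norm, then average."""
    clipped = [clip(g, max_norm) for g in per_sample_grads]
    summed = [sum(col) for col in zip(*clipped)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    batch = len(per_sample_grads)
    return [p - lr * n / batch for p, n in zip(params, noisy)]

# Toy usage with hypothetical gradients for a two-parameter model.
rng = random.Random(0)
params = [0.0, 0.0]
grads = [[3.0, 4.0], [0.3, 0.4]]
updated = dp_sgd_step(params, grads, lr=1.0, max_norm=1.0,
                      noise_multiplier=1.1, rng=rng)
```

The injected noise is what degrades model quality relative to non-private training: its scale is tied to the clipping bound and not to the size of the true gradient signal.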