dct coefficient
Learning Single-Image Super-Resolution in the JPEG Compressed Domain
Srinivasan, Sruthi, Shakibapour, Elham, Rawther, Rajy, Saeedi, Mehdi
Deep learning models have grown increasingly complex, with input data sizes scaling accordingly. Despite substantial advances in specialized deep learning hardware, data loading continues to be a major bottleneck that limits training and inference speed. To address this challenge, we propose training models directly on encoded JPEG features, reducing the computational overhead associated with full JPEG decoding and significantly improving data loading efficiency. While prior works have focused on recognition tasks, we investigate the effectiveness of this approach for the restoration task of single-image super-resolution (SISR). We present a lightweight super-resolution pipeline that operates on JPEG discrete cosine transform (DCT) coefficients in the frequency domain. Our pipeline achieves a 2.6x speedup in data loading and a 2.5x speedup in training, while preserving visual quality comparable to standard SISR approaches.
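The compressed-domain input such a pipeline consumes can be illustrated with a minimal sketch of JPEG's 8x8 block DCT, written in pure NumPy. This is an illustration of the frequency-domain representation, not the authors' actual data loader:

```python
import numpy as np

def dct_matrix(n=8):
    """Orthonormal DCT-II basis matrix (the transform JPEG applies per 8x8 block)."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    c[0, :] /= np.sqrt(2.0)
    return c

def blocks_to_dct(img):
    """Split a grayscale image into 8x8 blocks and return their DCT coefficients.

    A compressed-domain model consumes these coefficients directly, skipping
    the inverse DCT and color conversion of a full JPEG decode.
    """
    h, w = img.shape
    assert h % 8 == 0 and w % 8 == 0
    c = dct_matrix(8)
    out = np.empty_like(img, dtype=np.float64)
    for i in range(0, h, 8):
        for j in range(0, w, 8):
            out[i:i+8, j:j+8] = c @ img[i:i+8, j:j+8] @ c.T
    return out
```

Because the basis is orthonormal, the transform is lossless on its own; the savings come from stopping the JPEG decode at this stage.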
Sparsity via Hyperpriors: A Theoretical and Algorithmic Study under Empirical Bayes Framework
Li, Zhitao, Dong, Yiqiu, Zeng, Xueying
This paper presents a comprehensive analysis of hyperparameter estimation within the empirical Bayes framework (EBF) for sparse learning. By studying the influence of hyperpriors on the solution of EBF, we establish a theoretical connection between the choice of the hyperprior and the sparsity as well as the local optimality of the resulting solutions. We show that some strictly increasing hyperpriors, such as half-Laplace and half-generalized Gaussian with the power in $(0,1)$, effectively promote sparsity and improve solution stability with respect to measurement noise. Based on this analysis, we adopt a proximal alternating linearized minimization (PALM) algorithm with convergence guarantees for both convex and concave hyperpriors. Extensive numerical tests on two-dimensional image deblurring problems demonstrate that introducing appropriate hyperpriors significantly promotes the sparsity of the solution and enhances restoration accuracy. Furthermore, we illustrate the influence of the noise level and of the ill-posedness of the inverse problem on EBF solutions.
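The sparsity-promoting effect of a half-Laplace hyperprior can be seen on a scalar toy problem: a noisy observation y = x + noise with a zero-mean Gaussian prior on x of unknown variance theta. A grid search over theta stands in for the paper's PALM iterations, which are not reproduced here; the specific noise level and hyperprior weight are illustrative assumptions:

```python
import numpy as np

def eb_shrink(y, sigma2=1.0, lam=1.0, grid=None):
    """Empirical-Bayes shrinkage of a single noisy observation y = x + noise.

    Prior: x ~ N(0, theta); half-Laplace hyperprior p(theta) ~ exp(-lam * theta).
    We pick theta by minimizing the negative log marginal posterior over a grid,
    then return the posterior mean of x. For small |y| the optimum is theta = 0,
    i.e. the estimate is exactly zero -- the hyperprior induces sparsity.
    """
    if grid is None:
        grid = np.linspace(0.0, 50.0, 5001)
    s = grid + sigma2  # marginal variance of y given theta
    obj = 0.5 * np.log(s) + y * y / (2.0 * s) + lam * grid
    theta = grid[np.argmin(obj)]
    return theta / (theta + sigma2) * y  # posterior mean E[x | y, theta]
```

Small observations are mapped exactly to zero, while large ones are only mildly shrunk, mirroring the stability-plus-sparsity behavior the analysis establishes.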
Efficient Neural Networks with Discrete Cosine Transform Activations
Martinez-Gost, Marc, Pepe, Sara, Pérez-Neira, Ana, Lagunas, Miguel Ángel
In this paper, we extend our previous work on the Expressive Neural Network (ENN), a multilayer perceptron with adaptive activation functions parametrized using the Discrete Cosine Transform (DCT). Building upon previous work that demonstrated the strong expressiveness of ENNs with compact architectures, we now emphasize their efficiency, interpretability and pruning capabilities. The DCT-based parameterization provides a structured and decorrelated representation that reveals the functional role of each neuron and allows direct identification of redundant components. Leveraging this property, we propose an efficient pruning strategy that removes unnecessary DCT coefficients with negligible or no loss in performance. Experimental results across classification and implicit neural representation tasks confirm that ENNs achieve state-of-the-art accuracy while maintaining a low number of parameters. Furthermore, up to 40% of the activation coefficients can be safely pruned, thanks to the orthogonality and bounded nature of the DCT basis. Overall, these findings demonstrate that the ENN framework offers a principled integration of signal processing concepts into neural network design, achieving a balanced trade-off between expressiveness, compactness, and interpretability.
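A minimal sketch of the idea: an activation whose shape is a learned combination of cosine atoms, with pruning by coefficient magnitude. The exact parameterization here (a cosine series over a clipped input range) is an illustrative reconstruction, not the ENN's precise formulation:

```python
import numpy as np

def dct_activation(x, coeffs, lo=-1.0, hi=1.0):
    """Adaptive activation as a linear combination of cosine atoms.

    x is clipped to [lo, hi] and mapped to t in [0, 1]; the output is
    sum_k coeffs[k] * cos(pi * k * t). The decorrelated DCT-style basis
    makes each coefficient's functional role directly inspectable.
    """
    t = (np.clip(x, lo, hi) - lo) / (hi - lo)
    k = np.arange(len(coeffs))
    return np.cos(np.pi * np.outer(t, k)) @ coeffs

def prune(coeffs, keep=0.6):
    """Zero out the smallest-magnitude coefficients, keeping the given fraction."""
    c = np.asarray(coeffs, dtype=float).copy()
    n_keep = max(1, int(round(keep * len(c))))
    idx = np.argsort(np.abs(c))[:len(c) - n_keep]
    c[idx] = 0.0
    return c
```

Because the basis functions are orthogonal and bounded, dropping small coefficients perturbs the activation by at most the sum of the dropped magnitudes, which is what makes aggressive pruning safe.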
Video Quality Enhancement Using Deep Learning-Based Prediction Models for Quantized DCT Coefficients in MPEG I-frames
Busson, Antonio J G, Mendes, Paulo R C, Moraes, Daniel de S, da Veiga, Álvaro M, Guedes, Álan L V, Colcher, Sérgio
Recent works have successfully applied certain types of Convolutional Neural Networks (CNNs) to reduce the noticeable distortion caused by lossy JPEG/MPEG compression. Most of them operate in the spatial domain. In this work, we propose an MPEG video decoder that is purely frequency-to-frequency: it reads the quantized DCT coefficients from a low-quality I-frame bitstream and, using a deep learning-based model, predicts the missing coefficients in order to recompose the same frames with enhanced quality. In experiments on a video dataset, our best model improved frames whose quantized DCT coefficients correspond to a Quality Factor (QF) of 10 to enhanced-quality frames with a QF close to 20.

The application of Deep Learning (DL) methods in multimedia systems has opened up a range of cognitive features that go beyond the traditional functionalities of capturing, streaming, and presenting information, providing a whole new set of capabilities that includes the detection and classification of objects. New platforms and development techniques have been tailored, and entirely new frameworks brought together, to enhance the development of such systems [1], trying to fill the gap between this vast (and relatively new) technological knowledge and the practical development of modern systems.
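The degradation such a model learns to invert is JPEG/MPEG coefficient quantization. A sketch using the standard luminance quantization table from ITU-T T.81 (Annex K) and the common IJG libjpeg quality-factor scaling; the paper's encoder settings may differ:

```python
import numpy as np

# Standard JPEG luminance quantization table (ITU-T T.81, Annex K).
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
])

def scaled_table(qf):
    """Scale the base table for a quality factor (IJG libjpeg convention)."""
    s = 5000 / qf if qf < 50 else 200 - 2 * qf
    t = np.floor((Q_LUMA * s + 50) / 100)
    return np.clip(t, 1, 255).astype(int)

def quantize(dct_block, qf):
    """Quantize then dequantize an 8x8 DCT block.

    The rounding here is exactly the information loss that a
    coefficient-prediction model is trained to undo.
    """
    t = scaled_table(qf)
    return np.round(dct_block / t) * t
```

At QF 10 the step sizes are five times coarser than at QF 50, which is why so many high-frequency coefficients quantize to zero and must be predicted rather than decoded.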
RDD: Pareto Analysis of the Rate-Distortion-Distinguishability Trade-off
Enttsel, Andriy, Marchioni, Alex, Zanellini, Andrea, Mangia, Mauro, Setti, Gianluca, Rovatti, Riccardo
Extensive monitoring systems generate data that is usually compressed for network transmission. This compressed data might then be processed in the cloud for tasks such as anomaly detection. However, compression can potentially impair the detector's ability to distinguish between regular and irregular patterns due to information loss. Here we extend the information-theoretic framework introduced in [1] to simultaneously address the trade-off between the three features on which the effectiveness of the system depends: the effectiveness of compression, the amount of distortion it introduces, and the distinguishability between compressed normal signals and compressed anomalous signals. We leverage a Gaussian assumption to draw curves showing how moving on a Pareto surface helps administer such a trade-off better than simply relying on optimal rate-distortion compression and hoping that compressed signals can be distinguished from each other.
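Under the Gaussian assumption, the rate-distortion leg of the trade-off has a closed form, and a divergence between the compressed normal and anomalous distributions can serve as the distinguishability axis. The sketch below uses the scalar Gaussian rate-distortion function and a scalar KL divergence as a stand-in measure; the paper's exact information-theoretic quantities are not reproduced here:

```python
import math

def gaussian_rate(sigma2, d):
    """Rate-distortion function of a memoryless Gaussian source, in bits/sample:
    R(D) = max(0, 0.5 * log2(sigma^2 / D)). Distortion at or above the source
    variance costs zero rate."""
    return max(0.0, 0.5 * math.log2(sigma2 / d))

def gaussian_kl(m0, v0, m1, v1):
    """KL divergence KL(N(m0, v0) || N(m1, v1)) between scalar Gaussians,
    used here as a proxy for how distinguishable compressed normal signals
    are from compressed anomalous ones."""
    return 0.5 * (math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0)
```

Sweeping the allowed distortion D and recomputing both quantities traces one slice of the Pareto surface: lowering the rate shrinks the divergence, which is precisely the tension the framework quantifies.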
Reviews: Faster Neural Networks Straight from JPEG
My main concerns were addressed in the rebuttal (separation between IO/CPU/GPU gains, TFLOP measurements, and a discussion of related works). I was happy to see that their model (Late-Concat-RFA-Thinner) is still faster than ResNet-50 (approx. 680 vs. 475, a ~43% gain; Figure 1). This is a pessimistic estimate, given that the RGB ResNet-50 also needs to perform the inverse DCT to go from DCT coefficients to the RGB domain. However, I was a bit surprised to see such a big disconnect between the timing numbers and the TFLOP measurements (Figure 1b vs. Figure 1c of the rebuttal). While I trust that the authors timed the models fairly, and thus do not doubt the results, I think this would be worth more investigation. As for the related works, the authors discussed them well in the rebuttal, but I find it strange that we had to ask for this.
Compressing Sign Information in DCT-based Image Coding via Deep Sign Retrieval
Suzuki, Kei, Tsutake, Chihiro, Takahashi, Keita, Fujii, Toshiaki
The discrete cosine transform (DCT) [1] is an important technique for image coding and is adopted in various image coding standards [2, 3, 4, 5, 6, 7, 8, 9]. For instance, JPEG [2] first divides an original image into non-overlapping blocks and then applies the DCT to each block, followed by quantization. Entropy coding is finally performed to obtain bit representations of the quantized DCT coefficients. According to source coding theory [10], statistically biased symbols can be efficiently compressed using entropy coding methods such as [11, 12, 13, 14]. However, the sign information of DCT coefficients is equiprobable [15, 16, 17], i.e., the probabilities of positive and negative signs are almost even, and compressing the sign information has thus been considered impossible. Therefore, each sign is represented using 1 bit in typical image coding methods, and the sign information consumes many bits in the resulting bitstream. To reduce the bit amount spent on signs, we address a sign compression problem for DCT coefficients in this paper. In particular, we consider a lossless sign compression problem, where the signs of the DCT coefficients are decoded without loss. We briefly summarize seminal works developed to tackle this challenging problem.
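The equiprobability claim is easy to check empirically: the binary entropy of a sign sequence drawn from a symmetric coefficient distribution sits at essentially 1 bit per sign, leaving a plain entropy coder nothing to exploit. A toy sketch with synthetic zero-mean values standing in for real DCT coefficients:

```python
import math
import random

def sign_entropy(signs):
    """Empirical binary entropy (bits/symbol) of a sequence of +/-1 signs.

    For DCT coefficients the positive/negative probabilities are almost even,
    so this value lies close to 1 bit: entropy coding the raw signs saves
    almost nothing, which motivates sign retrieval approaches instead.
    """
    p = sum(1 for s in signs if s > 0) / len(signs)
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)
```

The magnitudes of DCT coefficients, by contrast, are heavily biased toward zero, which is why they compress well while the signs do not.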