subband
1c364d98a5cdc426fd8c76fbb2c10e34-Supplemental-Conference.pdf
The way to instantiate BACON will be similar to MFN. The following Lemma will showthat Definition 1.2 can be extended toanalyzing functions from differentdomain. Let F = gL g1 γ, with gi being a multivariate polynomial. The inductive hypothesis is: fork 1, if zk[j] is linear sum ofB for all j, then zk+1[l]islinearsumsofB foralll. By definition ofz, we know thatzk+1 = gk(zk), where gk is a multivariate polynomial of finite degreed.
WaveTuner: Comprehensive Wavelet Subband Tuning for Time Series Forecasting
Wang, Yubo, He, Hui, Niu, Chaoxi, Niu, Zhendong
Due to the inherent complexity, temporal patterns in real-world time series often evolve across multiple intertwined scales, including long-term periodicity, short-term fluctuations, and abrupt regime shifts. While existing literature has designed many sophisticated decomposition approaches based on the time or frequency domain to partition trend-seasonality components and high-low frequency components, an alternative line of approaches based on the wavelet domain has been proposed to provide a unified multi-resolution representation with precise time-frequency localization. However, most wavelet-based methods suffer from a persistent bias toward recursively decomposing only low-frequency components, severely underutilizing subtle yet informative high-frequency components that are pivotal for precise time series forecasting. To address this problem, we propose WaveTuner, a W avelet decomposition framework empowered by full-spectrum subband Tuning for time series forecasting. Concretely, WaveTuner comprises two key modules: (i) Adaptive W avelet Refinement module, that transforms time series into time-frequency coefficients, utilizes an adaptive router to dynamically assign subband weights, and generates subband-specific embeddings to support refinement; and (ii) Multi-Branch Specialization module, that employs multiple functional branches, each instantiated as a flexible Kolmogorov-Arnold Network (KAN) with a distinct functional order to model a specific spectral subband. Equipped with these modules, WaveTuner comprehensively tunes global trends and local variations within a unified time-frequency framework. Extensive experiments on eight real-world datasets demonstrate WaveTuner achieves state-of-the-art forecasting performance in time series forecasting.
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
- Europe > Denmark > Capital Region > Copenhagen (0.04)
- Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
DiMSUM: Diffusion Mamba -- A Scalable and Unified Spatial-Frequency Method for Image Generation
Phung, Hao, Dao, Quan, Dao, Trung, Phan, Hoang, Metaxas, Dimitris, Tran, Anh
We introduce a novel state-space architecture for diffusion models, effectively harnessing spatial and frequency information to enhance the inductive bias towards local features in input images for image generation tasks. While state-space networks, including Mamba, a revolutionary advancement in recurrent neural networks, typically scan input sequences from left to right, they face difficulties in designing effective scanning strategies, especially in the processing of image data. Our method demonstrates that integrating wavelet transformation into Mamba enhances the local structure awareness of visual inputs and better captures long-range relations of frequencies by disentangling them into wavelet subbands, representing both low- and high-frequency components. These wavelet-based outputs are then processed and seamlessly fused with the original Mamba outputs through a cross-attention fusion layer, combining both spatial and frequency information to optimize the order awareness of state-space models which is essential for the details and overall quality of image generation. Besides, we introduce a globally-shared transformer to supercharge the performance of Mamba, harnessing its exceptional power to capture global relationships. Through extensive experiments on standard benchmarks, our method demonstrates superior results compared to DiT and DIFFUSSM, achieving faster training convergence and delivering high-quality outputs. The codes and pretrained models are released at https://github.com/VinAIResearch/DiMSUM.git.
- Information Technology > Sensing and Signal Processing > Image Processing (1.00)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
LG-Sleep: Local and Global Temporal Dependencies for Mice Sleep Scoring
Sartipi, Shadi, Andersen, Mie, Hauglund, Natalie, Kjaerby, Celia, Untiet, Verena, Nedergaard, Maiken, Cetin, Mujdat
Efficiently identifying sleep stages is crucial for unraveling the intricacies of sleep in both preclinical and clinical research. The labor-intensive nature of manual sleep scoring, demanding substantial expertise, has prompted a surge of interest in automated alternatives. Sleep studies in mice play a significant role in understanding sleep patterns and disorders and underscore the need for robust scoring methodologies. In response, this study introduces LG-Sleep, a novel subject-independent deep neural network architecture designed for mice sleep scoring through electroencephalogram (EEG) signals. LG-Sleep extracts local and global temporal transitions within EEG signals to categorize sleep data into three stages: wake, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep. The model leverages local and global temporal information by employing time-distributed convolutional neural networks to discern local temporal transitions in EEG data. Subsequently, features derived from the convolutional filters traverse long short-term memory blocks, capturing global transitions over extended periods. Crucially, the model is optimized in an autoencoder-decoder fashion, facilitating generalization across distinct subjects and adapting to limited training samples. Experimental findings demonstrate superior performance of LG-Sleep compared to conventional deep neural networks. Moreover, the model exhibits good performance across different sleep stages even when tasked with scoring based on limited training samples.
- Europe > Denmark > Capital Region > Copenhagen (0.05)
- North America > United States > New York > Monroe County > Rochester (0.04)
Neural Speech and Audio Coding
This paper explores the integration of model-based and data-driven approaches within the realm of neural speech and audio coding systems. It highlights the challenges posed by the subjective evaluation processes of speech and audio codecs and discusses the limitations of purely data-driven approaches, which often require inefficiently large architectures to match the performance of model-based methods. The study presents hybrid systems as a viable solution, offering significant improvements to the performance of conventional codecs through meticulously chosen design enhancements. Specifically, it introduces a neural network-based signal enhancer designed to post-process existing codecs' output, along with the autoencoder-based end-to-end models and LPCNet--hybrid systems that combine linear predictive coding (LPC) with neural networks. Furthermore, the paper delves into predictive models operating within custom feature spaces (TF-Codec) or predefined transform domains (MDCTNet) and examines the use of psychoacoustically calibrated loss functions to train end-to-end neural audio codecs. Through these investigations, the paper demonstrates the potential of hybrid systems to advance the field of speech and audio coding by bridging the gap between traditional model-based approaches and modern data-driven techniques.
Multi-scale Conditional Generative Modeling for Microscopic Image Restoration
Huang, Luzhe, Xiao, Xiongye, Li, Shixuan, Sun, Jiawen, Huang, Yi, Ozcan, Aydogan, Bogdan, Paul
The advance of diffusion-based generative models in recent years has revolutionized state-of-the-art (SOTA) techniques in a wide variety of image analysis and synthesis tasks, whereas their adaptation on image restoration, particularly within computational microscopy remains theoretically and empirically underexplored. In this research, we introduce a multi-scale generative model that enhances conditional image restoration through a novel exploitation of the Brownian Bridge process within wavelet domain. By initiating the Brownian Bridge diffusion process specifically at the lowest-frequency subband and applying generative adversarial networks at subsequent multi-scale high-frequency subbands in the wavelet domain, our method provides significant acceleration during training and sampling while sustaining a high image generation quality and diversity on par with SOTA diffusion models. Experimental results on various computational microscopy and imaging tasks confirm our method's robust performance and its considerable reduction in its sampling steps and time. This pioneering technique offers an efficient image restoration framework that harmonizes efficiency with quality, signifying a major stride in incorporating cutting-edge generative models into computational microscopy workflows.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- Europe > Spain > Balearic Islands > Mallorca > Palma (0.04)
- Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
- Asia > China > Guangdong Province > Shenzhen (0.04)