dynamic factor
Improving Target Sound Extraction via Disentangled Codec Representations with Privileged Knowledge Distillation
Target sound extraction aims to isolate target sound sources from an input mixture using a target clue to identify the sounds of interest. To address the challenge posed by the wide variety of sounds, recent work has introduced privileged knowledge distillation (PKD), which utilizes privileged information (PI) about the target sound, available only during training. While PKD has shown promise, existing approaches often suffer from overfitting of the teacher model for the overly rich PI and ineffective knowledge transfer to the student model. In this paper, we propose Disentangled Codec Knowledge Distillation (DCKD) to mitigate these issues by regulating the amount and the flow of target sound information within the teacher model. We begin by extracting a compressed representation of the target sound using a neural audio codec to regulate the amount of PI. Disentangled representation learning is then applied to remove class information and extract fine-grained temporal information as PI. Subsequently, an n-hot vector as the class information and the class-independent PI are used to condition the early and later layers of the teacher model, respectively, forming a regulated coarse-to-fine target information flow. The resulting representation is transferred to the student model through feature-level knowledge distillation. Experimental results show that DCKD consistently improves existing methods across model architectures under the multi-target selection condition.
Disentanglement Beyond Static vs. Dynamic: A Benchmark and Evaluation Framework for Multi-Factor Sequential Representations
Barami, Tal, Berman, Nimrod, Naiman, Ilan, Hason, Amos H., Ezra, Rotem, Azencot, Omri
Learning disentangled representations in sequential data is a key goal in deep learning, with broad applications in vision, audio, and time series. While real-world data involves multiple interacting semantic factors over time, prior work has mostly focused on simpler two-factor static and dynamic settings, primarily because such settings make data collection easier, thereby overlooking the inherently multi-factor nature of real-world data. We introduce the first standardized benchmark for evaluating multi-factor sequential disentanglement across six diverse datasets spanning video, audio, and time series. Our benchmark includes modular tools for dataset integration, model development, and evaluation metrics tailored to multi-factor analysis. We additionally propose a post-hoc Latent Exploration Stage to automatically align latent dimensions with semantic factors, and introduce a Koopman-inspired model that achieves state-of-the-art results. Moreover, we show that Vision-Language Models can automate dataset annotation and serve as zero-shot disentanglement evaluators, removing the need for manual labels and human intervention. Together, these contributions provide a robust and scalable foundation for advancing multi-factor sequential disentanglement. Our code is available on GitHub, and the datasets and trained models are available on Hugging Face.
DiffSDA: Unsupervised Diffusion Sequential Disentanglement Across Modalities
Zisling, Hedi, Naiman, Ilan, Berman, Nimrod, Suwajanakorn, Supasorn, Azencot, Omri
Unsupervised representation learning, particularly sequential disentanglement, aims to separate static and dynamic factors of variation in data without relying on labels. This remains a challenging problem, as existing approaches based on variational autoencoders and generative adversarial networks often rely on multiple loss terms, complicating the optimization process. Furthermore, sequential disentanglement methods face challenges when applied to real-world data, and there is currently no established evaluation protocol for assessing their performance in such settings. Recently, diffusion models have emerged as state-of-the-art generative models, but no theoretical formalization exists for their application to sequential disentanglement. In this work, we introduce the Diffusion Sequential Disentanglement Autoencoder (DiffSDA), a novel, modal-agnostic framework effective across diverse real-world data modalities, including time series, video, and audio. DiffSDA leverages a new probabilistic modeling, latent diffusion, and efficient samplers, while incorporating a challenging evaluation protocol for rigorous testing. Our experiments on diverse real-world benchmarks demonstrate that DiffSDA outperforms recent state-of-the-art methods in sequential disentanglement.
Wind Power Scenario Generation based on the Generalized Dynamic Factor Model and Generative Adversarial Network
Cho, Young-ho, Zhu, Hao, Lee, Duehee, Baldick, Ross
--For conducting resource adequacy studies, we synthesize multiple long-term wind power scenarios of distributed wind farms simultaneously by using the spatio-temporal features: spatial and temporal correlation, waveforms, marginal and ramp rates distributions of waveform, power spectral densities, and statistical characteristics. Generating the spatial correlation in scenarios requires the design of common factors for neighboring wind farms and antithetical factors for distant wind farms. The generalized dynamic factor model (GDFM) can extract the common factors through cross spectral density analysis, but it cannot closely imitate waveforms. The GAN can synthesize plausible samples representing the temporal correlation by verifying samples through a fake sample discriminator . T o combine the advantages of GDFM and GAN, we use the GAN to provide a filter that extracts dynamic factors with temporal information from the observation data, and we then apply this filter in the GDFM to represent both spatial and frequency correlations of plausible waveforms. Numerical tests on the combination of GDFM and GAN have demonstrated performance improvements over competing alternatives in synthesizing wind power scenarios from Australia, better realizing plausible statistical characteristics of actual wind power compared to alternatives such as the GDFM with a filter synthesized from distributions of actual dynamic filters and the GAN with direct synthesis without dynamic factors. ESOURCE adequacy means to maintain power system reliability by having sufficient capacity such that, even with failures or variability of resources, the probability of not being able to meet all load is sufficiently small [1]. System operators achieve resource adequacy of a power system by ensuring there is enough generation capacity [2]. In the case of intermittent energy resources, the effective load carrying capacity (ELCC) of the intermittent resource is the equivalent capacity of highly reliable generators that would result in the same probability of not being able to meet all load [3]. For example, the ELCC of wind power can be obtained by simulating power systems with long-term wind power scenarios with realistic ramping rates and marginal distributions [4]. Furthermore, the capacity factor and reserve margin contribution of wind power to the power system reliability can also be obtained by simulating a future power system by using realistic long-term wind power scenarios [5].
FG-PE: Factor-graph Approach for Multi-robot Pursuit-Evasion
Esfahani, Messiah Abolfazli, Başar, Ayşe, Saeedi, Sajad
With the increasing use of robots in daily life, there is a growing need to provide robust collaboration protocols for robots to tackle more complicated and dynamic problems effectively. This paper presents a novel, factor graph-based approach to address the pursuit-evasion problem, enabling accurate estimation, planning, and tracking of an evader by multiple pursuers working together. It is assumed that there are multiple pursuers and only one evader in this scenario. The proposed method significantly improves the accuracy of evader estimation and tracking, allowing pursuers to capture the evader in the shortest possible time and distance compared to existing techniques. In addition to these primary objectives, the proposed approach effectively minimizes uncertainty while remaining robust, even when communication issues lead to some messages being dropped or lost. Through a series of comprehensive experiments, this paper demonstrates that the proposed algorithm consistently outperforms traditional pursuit-evasion methods across several key performance metrics, such as the time required to capture the evader and the average distance traveled by the pursuers. Additionally, the proposed method is tested in real-world hardware experiments, further validating its effectiveness and applicability.