Goto

Collaborating Authors

 dac


Appendix: AmortizedMixingCouplingProcessesfor Clustering

Neural Information Processing Systems

M-step mainly consists of the weighted sum of partial derivative. Thus the complexity of regular E-step takes O(Nkd), and O(Nk(L+d)) for OT-based E step where L is the number of iterations of Sinkhorn's algorithm. The complexity of M-step takes O(Nkd).



Dense and Aligned Captions (DAC) Promote Compositional Reasoning in VL Models

Neural Information Processing Systems

Vision and Language (VL) models offer an effective method for aligning representation spaces of images and text allowing for numerous applications such as cross-modal retrieval, visual and multi-hop question answering, captioning, and many more. However, the aligned image-text spaces learned by all the popular VL models are still suffering from the so-called'object bias' - their representations behave as'bags of nouns' mostly ignoring or downsizing the attributes, relations, and states of objects described/appearing in texts/images. Although some great attempts at fixing these `compositional reasoning' issues were proposed in the recent literature, the problem is still far from being solved. In this paper, we uncover two factors limiting the VL models' compositional reasoning performance. These two factors are properties of the paired VL dataset used for finetuning (or pre-training) the VL model: (i) the caption quality, or in other words'image-alignment', of the texts; and (ii) the'density' of the captions in the sense of mentioning all the details appearing on the image. We propose a fine-tuning approach for automatically treating these factors on a standard collection of paired VL data (CC3M). Applied to CLIP, we demonstrate its significant compositional reasoning performance increase of up to $\sim27$\% over the base model, up to $\sim20$\% over the strongest baseline, and by $6.7$\% on average. Our code is provided in the Supplementary and would be released upon acceptance.


DAC: The Double Actor-Critic Architecture for Learning Options

Neural Information Processing Systems

Under this novel formulation, all policy optimization algorithms can be used off the shelf to learn intra-option policies, option termination conditions, and a master policy over options. We apply an actor-critic algorithm on each augmented MDP, yielding the Double Actor-Critic (DAC) architecture. Furthermore, we show that, when state-value functions are used as critics, one critic can be expressed in terms of the other, and hence only one critic is necessary. We conduct an empirical study on challenging robot simulation tasks. In a transfer learning setting, DAC outperforms both its hierarchy-free counterpart and previous gradient-based option learning algorithms.


Divide and Contrast: Source-free Domain Adaptation via Adaptive Contrastive Learning

Neural Information Processing Systems

We investigate a practical domain adaptation task, called source-free domain adaptation (SFUDA), where the source pretrained model is adapted to the target domain without access to the source data. Existing techniques mainly leverage self-supervised pseudo-labeling to achieve class-wise global alignment [1] or rely on local structure extraction that encourages the feature consistency among neighborhoods [2]. While impressive progress has been made, both lines of methods have their own drawbacks - the "global" approach is sensitive to noisy labels while the "local" counterpart suffers from the source bias. In this paper, we present Divide and Contrast (DaC), a new paradigm for SFUDA that strives to connect the good ends of both worlds while bypassing their limitations. Based on the prediction confidence of the source model, DaC divides the target data into source-like and target-specific samples, where either group of samples is treated with tailored goals under an adaptive contrastive learning framework. Specifically, the source-like samples are utilized for learning global class clustering thanks to their relatively clean labels. The more noisy target-specific data are harnessed at the instance level for learning the intrinsic local structures. We further align the source-like domain with the target-specific samples using a memory bank-based Maximum Mean Discrepancy (MMD) loss to reduce the distribution mismatch. Extensive experiments on VisDA, Office-Home, and the more challenging DomainNet have verified the superior performance of DaC over current state-of-the-art approaches.


ADNAC: Audio Denoiser using Neural Audio Codec

arXiv.org Artificial Intelligence

--Audio denoising is critical in signal processing, enhancing intelligibility and fidelity for applications like restoring musical recordings. This paper presents a proof -of-concept for adapting a state -of -the -art neural audio codec, the Descript Audio Codec (DAC), for music denoising. This work overcomes the limitations of traditional architectures like U - Nets by training the model on a large-scale, custom -synthesized dataset built from diverse sources. Training is guided by a multi-objective loss function that combines time-domain, spectral, and signal -level fidelity metrics. Ultimately, this paper aims to present a PoC for high -fidelity, generative audio restoration. Noise reduction is a fundamental part of audio signal processing, substantially improving signal quality and intelligibility across domains like speech processing [1-3], music production and restoration [1], and bioacoustics analysis [2].


Topology-Aware Active Learning on Graphs

arXiv.org Artificial Intelligence

We propose a graph-topological approach to active learning that directly targets the core challenge of exploration versus exploitation under scarce label budgets. To guide exploration, we introduce a coreset construction algorithm based on Balanced Forman Curvature (BFC), which selects representative initial labels that reflect the graph's cluster structure. This method includes a data-driven stopping criterion that signals when the graph has been sufficiently explored. We further use BFC to dynamically trigger the shift from exploration to exploitation within active learning routines, replacing hand-tuned heuristics. To improve exploitation, we introduce a localized graph rewiring strategy that efficiently incorporates multiscale information around labeled nodes, enhancing label propagation while preserving sparsity. Experiments on benchmark classification tasks show that our methods consistently outperform existing graph-based semi-supervised baselines at low label rates.


Fully Decentralized Cooperative Multi-Agent Reinforcement Learning is A Context Modeling Problem

arXiv.org Artificial Intelligence

This paper studies fully decentralized cooperative multi-agent reinforcement learning, where each agent solely observes the states, its local actions, and the shared rewards. The inability to access other agents' actions often leads to non-stationarity during value function updates and relative overgeneralization during value function estimation, hindering effective cooperative policy learning. However, existing works fail to address both issues simultaneously, due to their inability to model the joint policy of other agents in a fully decentralized setting. To overcome this limitation, we propose a novel method named Dynamics-A ware Context (DAC), which formalizes the task, as locally perceived by each agent, as an Contextual Markov Decision Process, and further addresses both non-stationarity and relative overgeneralization through dynamics-aware context modeling. Specifically, DAC attributes the non-stationary local task dynamics of each agent to switches between unobserved contexts, each corresponding to a distinct joint policy. Then, DAC models the step-wise dynamics distribution using latent variables and refers to them as contexts. For each agent, DAC introduces a context-based value function to address the non-stationarity issue during value function update. For value function estimation, an optimistic marginal value is derived to promote the selection of cooperative actions, thereby addressing the relative overgeneralization issue. Experimentally, we evaluate DAC on various cooperative tasks (including matrix game, predator and prey, and SMAC), and its superior performance against multiple baselines validates its effectiveness.



Learning to Quantize and Precode in Massive MIMO Systems for Energy Reduction: a Graph Neural Network Approach

arXiv.org Machine Learning

Massive MIMO systems are moving toward increased numbers of radio frequency chains, higher carrier frequencies and larger bandwidths. As such, digital-to-analog converters (DACs) are becoming a bottleneck in terms of hardware complexity and power consumption. In this work, non-linear precoding for coarsely quantized downlink massive MIMO is studied. Given the NP-hard nature of this problem, a graph neural network (GNN) is proposed that directly outputs the precoded quantized vector based on the channel matrix and the intended transmit symbols. The model is trained in a self-supervised manner, by directly maximizing the achievable rate. To overcome the non-differentiability of the objective function, introduced due to the non-differentiable DAC functions, a straight-through Gumbel-softmax estimation of the gradient is proposed. The proposed method achieves a significant increase in achievable sum rate under coarse quantization. For instance, in the single-user case, the proposed method can achieve the same sum rate as maximum ratio transmission (MRT) by using one-bit DAC's as compared to 3 bits for MRT. This reduces the DAC's power consumption by a factor 4-7 and 3 for baseband and RF DACs respectively. This, however, comes at the cost of increased digital signal processing power consumption. When accounting for this, the reduction in overall power consumption holds for a system bandwidth up to 3.5 MHz for baseband DACs, while the RF DACs can maintain a power reduction of 2.9 for higher bandwidths. Notably, indirect effects, which further reduce the power consumption, such as a reduced fronthaul consumption and reduction in other components, are not considered in this analysis.