Separation results
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation (Supplementary Materials)
Figure 1 shows a diagram of the training scheme for the cross-modal retrieval module. Each multiple-choice item consists of the correct vision+audio fusion embedding along with a pose embedding. The table below reports sound-separation results when one of the modalities is masked (SDR, SIR, and SAR; higher is better).

Type of Masking | SDR | SIR | SAR
Masking applied to the visual modality | 7.82 | 14.39 | 10.65
Masking applied to the pose modality | 12.06 | 18.34 | 14.17
15% random masking for both visual and pose modalities | 12.34 | 18.76 | 14.37

In this paper, sound separation is our primary task; therefore, we do not consider masking for the audio modality.
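As a reading aid, the following minimal numpy sketch shows how the two masking regimes in the table could be implemented: full masking of one modality's token stream, and 15% random token masking for both the visual and pose streams. All shapes, token counts, and the zero mask value are hypothetical, not TriBERT's actual implementation.

```python
import numpy as np

def mask_tokens(tokens, mask_prob, mask_value=0.0, seed=None):
    # Randomly replace a fraction of token embeddings with a mask value,
    # BERT-style; tokens has shape (seq_len, dim).
    rng = np.random.default_rng(seed)
    masked = tokens.copy()
    drop = rng.random(tokens.shape[0]) < mask_prob
    masked[drop] = mask_value
    return masked, drop

# Hypothetical shapes: 32 visual tokens and 17 pose-keypoint tokens, dim 512.
rng = np.random.default_rng(0)
visual = rng.normal(size=(32, 512))
pose = rng.normal(size=(17, 512))

# Row 3 of the table: 15% random masking for both visual and pose streams.
visual_masked, _ = mask_tokens(visual, 0.15, seed=1)
pose_masked, _ = mask_tokens(pose, 0.15, seed=2)

# Rows 1-2: erase an entire modality by masking every token in its stream.
visual_erased, _ = mask_tokens(visual, 1.0)
```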
Dependence-Aware Label Aggregation for LLM-as-a-Judge via Ising Models
Krishnakumar Balasubramanian, Aleksandr Podkopaev, Shiva Prasad Kasiviswanathan
Large-scale AI evaluation increasingly relies on aggregating binary judgments from $K$ annotators, including LLMs used as judges. Most classical methods, e.g., Dawid-Skene or (weighted) majority voting, assume annotators are conditionally independent given the true label $Y\in\{0,1\}$, an assumption often violated by LLM judges due to shared data, architectures, prompts, and failure modes. Ignoring such dependencies can yield miscalibrated posteriors and even confidently incorrect predictions. We study label aggregation through a hierarchy of dependence-aware models based on Ising graphical models and latent factors. For class-dependent Ising models, the Bayes log-odds is generally quadratic in votes; for class-independent couplings, it reduces to a linear weighted vote with correlation-adjusted parameters. We present finite-$K$ examples showing that methods based on conditional independence can flip the Bayes label despite matching per-annotator marginals. We prove separation results demonstrating that these methods remain strictly suboptimal as the number of judges grows, incurring nonvanishing excess risk under latent factors. Finally, we evaluate the proposed method on three real-world datasets, demonstrating improved performance over the classical baselines.
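To make the quadratic-versus-linear distinction concrete, here is a minimal sketch (all field, coupling, and prior values are hypothetical) that computes the Bayes log-odds for class-conditional Ising models by exact enumeration, which is feasible only for small $K$: with class-dependent couplings $J_1 \neq J_0$ the log-odds is quadratic in the vote vector, while with a shared coupling the quadratic terms cancel and only a correlation-adjusted linear weighted vote remains.

```python
import itertools
import numpy as np

def ising_logenergy(s, h, J):
    # Unnormalized log-probability of a vote vector s in {-1, +1}^K
    # under an Ising model with per-judge fields h and couplings J.
    return h @ s + 0.5 * s @ J @ s

def log_partition(h, J):
    # Exact log-partition function by enumerating all 2^K configurations
    # (illustration only; feasible for small K).
    vals = [ising_logenergy(np.array(s), h, J)
            for s in itertools.product([-1, 1], repeat=len(h))]
    return np.logaddexp.reduce(vals)

def bayes_log_odds(s, h1, J1, h0, J0, prior1=0.5):
    # log P(Y=1 | s) - log P(Y=0 | s). With class-dependent couplings
    # (J1 != J0) the s^T J s terms survive, so the rule is quadratic in s;
    # with J1 == J0 they cancel and a linear weighted vote remains.
    ll1 = ising_logenergy(s, h1, J1) - log_partition(h1, J1)
    ll0 = ising_logenergy(s, h0, J0) - log_partition(h0, J0)
    return ll1 - ll0 + np.log(prior1 / (1 - prior1))

# Toy example with K = 3 positively correlated judges (values hypothetical).
K = 3
h1, h0 = np.full(K, 0.8), np.full(K, -0.8)   # class-dependent fields
J = 0.5 * (np.ones((K, K)) - np.eye(K))      # shared coupling matrix
votes = np.array([1, 1, -1])
print(bayes_log_odds(votes, h1, J, h0, J))   # linear-rule regime (J1 == J0)
```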
Separation Results between Fixed-Kernel and Feature-Learning Probability Metrics
Several works in implicit and explicit generative modeling have empirically observed that feature-learning discriminators outperform fixed-kernel discriminators in terms of the sample quality of the models. We provide separation results between probability metrics with fixed-kernel and feature-learning discriminators using the function classes $\mathcal{F}_2$ and $\mathcal{F}_1$, respectively, which were developed to study overparametrized two-layer neural networks. In particular, we construct pairs of distributions over hyperspheres that cannot be discriminated by the fixed-kernel ($\mathcal{F}_2$) integral probability metric (IPM) and Stein discrepancy (SD) in high dimensions, but that can be discriminated by their feature-learning ($\mathcal{F}_1$) counterparts. To further study the separation, we provide links between the $\mathcal{F}_1$ and $\mathcal{F}_2$ IPMs and sliced Wasserstein distances. Our work suggests that fixed-kernel discriminators perform worse than their feature-learning counterparts because their corresponding metrics are weaker.
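As a rough numerical companion to the link with sliced Wasserstein distances (this is not the paper's construction), the sketch below computes a fixed-kernel ($\mathcal{F}_2$-style) squared MMD and a Monte Carlo sliced 1-Wasserstein distance between two sample sets on a hypersphere; the pair of distributions, kernel bandwidth, and number of slices are all hypothetical.

```python
import numpy as np

def rbf_mmd2(X, Y, sigma=1.0):
    # Biased estimate of the squared MMD: the IPM over the unit ball of an
    # RKHS, i.e. a fixed-kernel (F2-style) discriminator.
    def gram(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2 * sigma ** 2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2 * gram(X, Y).mean()

def sliced_w1(X, Y, n_dirs=200, seed=0):
    # Monte Carlo sliced 1-Wasserstein distance: average the 1D W1 distance
    # of projections onto random unit directions.
    rng = np.random.default_rng(seed)
    dirs = rng.normal(size=(n_dirs, X.shape[1]))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # For equal sample sizes, 1D W1 is the mean gap between sorted projections.
    return np.abs(np.sort(X @ dirs.T, axis=0) - np.sort(Y @ dirs.T, axis=0)).mean()

# Hypothetical pair of distributions on the unit sphere in d dimensions:
# an isotropic Gaussian versus a mean-shifted one, both projected to the sphere.
rng = np.random.default_rng(0)
n, d = 200, 64
X = rng.normal(size=(n, d))
Y = rng.normal(size=(n, d)) + 0.5 * np.eye(d)[0]
X /= np.linalg.norm(X, axis=1, keepdims=True)
Y /= np.linalg.norm(Y, axis=1, keepdims=True)
print(rbf_mmd2(X, Y), sliced_w1(X, Y))
```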
A Separation Result Between Data-oblivious and Data-aware Poisoning Attacks
Poisoning attacks have emerged as a significant security threat to machine learning algorithms. It has been demonstrated that adversaries who make small changes to the training set, such as adding specially crafted data points, can hurt the performance of the output model. Most of these attacks require full knowledge of the training data. This leaves open the question of whether the same attack results can be achieved by poisoning attacks that lack full knowledge of the clean training set. In this work, we initiate a theoretical study of this problem. Specifically, for the case of feature selection with LASSO, we show that \emph{full-information} adversaries (which craft poisoning examples based on the rest of the training data) are provably much more devastating than the optimal attacker that is \emph{oblivious} to the training set yet has access to the distribution of the data. Our separation result shows that the data-aware and data-oblivious settings are fundamentally different, and we cannot hope to achieve the same attack or defense results in these scenarios.
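The following toy sketch (not the paper's construction or its bounds) illustrates the two threat models for feature selection with LASSO: a data-aware adversary that crafts a poison point after seeing the training set, versus a data-oblivious one that can only sample from the data distribution. The data, regularization strength, and decoy feature are hypothetical.

```python
import numpy as np
from sklearn.linear_model import Lasso

def selected_features(X, y, alpha=0.1):
    # Indices of features given nonzero coefficients by LASSO.
    return set(np.flatnonzero(Lasso(alpha=alpha).fit(X, y).coef_))

rng = np.random.default_rng(0)
n, d = 60, 10
X = rng.normal(size=(n, d))
y = X[:, 0] + 0.1 * rng.normal(size=n)        # only feature 0 is relevant

print("clean:", selected_features(X, y))

# Data-aware poison: one point crafted *after* seeing (X, y), placed to make
# a decoy feature (index 1) look strongly predictive of the response.
x_aware = np.zeros(d); x_aware[1] = 10.0
print("aware:", selected_features(np.vstack([X, x_aware]), np.append(y, 10.0)))

# Data-oblivious poison: drawn from the data distribution alone, without
# seeing the realized training set.
x_obl = rng.normal(size=d)
print("oblivious:", selected_features(np.vstack([X, x_obl]),
                                      np.append(y, x_obl[0])))
```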
A Unified View of Optimal Kernel Hypothesis Testing
This paper provides a unifying view of optimal kernel hypothesis testing across the MMD two-sample, HSIC independence, and KSD goodness-of-fit frameworks. Minimax-optimal separation rates in the kernel and $L^2$ metrics are presented, together with two adaptive kernel selection methods (kernel pooling and aggregation), under various testing constraints: computational efficiency, differential privacy, and robustness to data corruption. The intuition behind the derivation of the power results is presented in a unified way across the three frameworks, and open problems are highlighted.
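As a concrete illustration of kernel selection in a two-sample test (a crude Bonferroni-corrected stand-in, not the pooling or aggregation procedures analyzed in the paper), the sketch below runs an MMD permutation test over a grid of Gaussian bandwidths; the data, bandwidth grid, and permutation count are hypothetical.

```python
import numpy as np

def mmd2_from_gram(K, idx, n):
    # Squared MMD read off a precomputed Gram matrix, with the first n
    # permuted indices playing the role of sample X.
    a, b = idx[:n], idx[n:]
    return (K[np.ix_(a, a)].mean() + K[np.ix_(b, b)].mean()
            - 2 * K[np.ix_(a, b)].mean())

def aggregated_mmd_test(X, Y, sigmas, n_perm=200, level=0.05, seed=0):
    # Permutation two-sample test aggregated over a bandwidth grid with a
    # Bonferroni correction across kernels.
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y]); n, m = len(X), len(X) + len(Y)
    d2 = ((Z[:, None, :] - Z[None, :, :]) ** 2).sum(-1)
    pvals = []
    for s in sigmas:
        K = np.exp(-d2 / (2 * s ** 2))
        obs = mmd2_from_gram(K, np.arange(m), n)
        null = [mmd2_from_gram(K, rng.permutation(m), n) for _ in range(n_perm)]
        pvals.append((1 + sum(v >= obs for v in null)) / (1 + n_perm))
    return min(pvals) <= level / len(sigmas)   # True = reject H0: P = Q

# Hypothetical data: Gaussian versus variance-inflated Gaussian.
rng = np.random.default_rng(1)
X, Y = rng.normal(size=(100, 5)), 1.5 * rng.normal(size=(100, 5))
print(aggregated_mmd_test(X, Y, sigmas=[0.5, 1.0, 2.0, 4.0]))
```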
Reviews: Supervised learning through the lens of compression
Most of the results established in the paper would, in the special case of binary classification, follow directly from the known upper and lower bounds on sample complexity based on the VC dimension. However, the results were not previously known for multiclass learning or for other general loss functions. The results for the 0-1 loss are not particularly surprising, but it is good to know that, for instance, in multiclass classification with the 0-1 loss, the complexity measure in the agnostic sample complexity is the same as in the realizable case (up to log factors, with no extra factors such as log(|Y|) that are absent from the realizable-case sample complexity). They also prove, in Theorem 3.6, a tighter lower bound than previously known for the sample complexity of uniform convergence in multiclass classification. The techniques used in the proofs are mostly straightforward or have appeared previously in related contexts.