Benchmarking Retrieval-Augmented Multimodal Generation for Document Question Answering
Current document retrieval-augmented generation (DocRAG) methods remain limited by their text-centric approaches, frequently missing
Robust SuperAlignment: Weak-to-Strong Robustness Generalization for Vision-Language Models
Numerous well-established studies have demonstrated the superhuman capabilities of modern Vision-Language Models (VLMs) across a wide range of tasks. However, there is growing doubt about the continued availability of reliable, high-quality labeling (supervision) from human annotators, which may lead to stagnation in model performance. To address this challenge, "superalignment" employs the so-called weak-to-strong generalization paradigm, where supervision from a weak model provides generalizable knowledge for a strong model. While effective in aligning knowledge on clean samples between the strong and weak models, the standard weak-to-strong approach typically fails to capture adversarial robustness, exposing strong VLMs to adversarial attacks. This inability to transfer adversarial robustness arises because adversarial samples are normally missing in the superalignment stage. To this end, we are the first to propose a weak-to-strong (adversarial) robustness generalization method that elicits zero-shot robustness in large-scale models through an unsupervised scheme, mitigating the unreliable information source for alignment from two perspectives: alignment re-weighting and source guidance refinement. We analyze settings under which robustness generalization is possible.
Learning from Disjoint Views: A Contrastive Prototype Matching Network for Fully Incomplete Multi-View Clustering
Multi-view clustering aims to enhance clustering performance by leveraging information from diverse sources. However, its practical application is often hindered by a barrier: the lack of correspondences across views. This paper focuses on the understudied problem of fully incomplete multi-view clustering (FIMC), a scenario where existing methods fail due to their reliance on partial alignment. To address this problem, we introduce the Contrastive Prototype Matching Network (CPMN), a novel framework that establishes a new paradigm for cross-view alignment based on matching high-level categorical structures. Instead of aligning individual instances, CPMN performs a more robust cluster prototype alignment. CPMN first employs a correspondence-free graph contrastive learning approach, leveraging mutual k-nearest neighbors (MNN) to uncover intrinsic data structures and establish initial prototypes from entirely unpaired views. Building on the prototypes, we introduce a cross-view prototype graph matching stage to resolve category misalignment and forge a unified clustering structure. Finally, guided by this alignment, we devise a prototype-aware contrastive learning mechanism to promote semantic consistency, replacing the reliance on the initial MNN-based structural similarity. Extensive experiments on benchmark datasets demonstrate that our method significantly outperforms various baselines and ablation variants, validating its effectiveness.
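The mutual k-nearest-neighbor (MNN) relation mentioned in the abstract can be illustrated with a minimal sketch. This is not the CPMN implementation: the function name `mutual_knn_graph`, the Euclidean metric, and the dense-matrix approach are all assumptions made for illustration.

```python
import numpy as np

def mutual_knn_graph(X, k):
    """Boolean adjacency matrix: entry (i, j) is True when i is among the
    k nearest neighbors of j AND j is among the k nearest neighbors of i.
    Hypothetical helper; Euclidean distance is an assumption."""
    sq_norms = np.sum(X ** 2, axis=1)
    # Pairwise squared Euclidean distances.
    dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbor
    n = X.shape[0]
    nearest = np.argsort(dists, axis=1)[:, :k]  # k nearest per row
    knn = np.zeros((n, n), dtype=bool)
    knn[np.repeat(np.arange(n), k), nearest.ravel()] = True
    # Keep only edges that hold in both directions.
    return knn & knn.T

# Three points on a line: 0 and 1 pick each other, while 2 picks 1
# but is not picked back, so 2 has no mutual neighbor.
X = np.array([[0.0], [1.0], [3.0]])
graph = mutual_knn_graph(X, k=1)
```

Requiring the relation in both directions discards one-sided neighbors, which is why MNN graphs are often used when no cross-view correspondences are available to validate an edge.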
Replicable Distribution Testing
We initiate a systematic investigation of distribution testing in the framework of algorithmic replicability. Specifically, given independent samples from a collection of probability distributions, the goal is to characterize the sample complexity of replicably testing natural properties of the underlying distributions. On the algorithmic front, we develop new replicable algorithms for testing closeness and independence of discrete distributions. On the lower bound front, we develop a new methodology for proving sample complexity lower bounds for replicable testing that may be of broader interest. As an application of our technique, we establish near-optimal sample complexity lower bounds for replicable uniformity testing--answering an open question from prior work--and closeness testing.
Simple and Effective Specialized Representations for Fair Classifiers
Fair classification is a critical challenge that has gained increasing importance due to international regulations and its growing use in high-stakes decision-making settings. Existing methods often rely on adversarial learning or distribution matching across sensitive groups; however, adversarial learning can be unstable, and distribution matching can be computationally intensive. To address these limitations, we propose a novel approach based on the characteristic function distance. Our method ensures that the learned representation contains minimal sensitive information while maintaining high effectiveness for downstream tasks. By utilizing characteristic functions, we achieve a more stable and efficient solution compared to traditional methods. Additionally, we introduce a simple relaxation of the objective function that guarantees fairness in common classification models with no performance degradation. Experimental results on benchmark datasets demonstrate that our approach consistently matches or achieves better fairness and predictive accuracy than existing methods. Moreover, our method maintains robustness and computational efficiency, making it a practical solution for real-world applications.
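As a rough illustration of a characteristic function distance between the representations of two sensitive groups, the sketch below compares empirical characteristic functions at a fixed set of frequencies. It is not the paper's objective: the names `ecf` and `cf_distance`, the choice of frequencies, and the unweighted mean of squared differences are assumptions.

```python
import numpy as np

def ecf(Z, T):
    """Empirical characteristic function of the rows of Z (n, d),
    evaluated at each frequency vector in T (m, d)."""
    return np.exp(1j * (Z @ T.T)).mean(axis=0)

def cf_distance(Za, Zb, T):
    """Mean squared modulus of the difference between the two empirical
    characteristic functions over the frequencies in T (an assumed,
    unweighted form of the distance)."""
    diff = ecf(Za, T) - ecf(Zb, T)
    return float(np.mean(np.abs(diff) ** 2))

rng = np.random.default_rng(0)
T = rng.normal(size=(64, 2))                 # assumed: Gaussian frequencies
group_a = rng.normal(size=(500, 2))          # toy representations, group A
group_b = rng.normal(size=(500, 2)) + 2.0    # toy representations, group B
gap = cf_distance(group_a, group_b, T)
```

A value near zero indicates that the two groups are hard to tell apart at the sampled frequencies; in a training loop a term of this kind would be added as a penalty on the encoder, which is again an assumption about usage rather than a description of the authors' method.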
Last Iterate Convergence in Monotone Mean Field Games
However, existing algorithms either require strict monotonicity or only guarantee the convergence of averaged iterates, as in Fictitious Play in continuous time. We address this gap with the following theoretical results. First, we prove that the last-iterate policy of a proximal-point (PP) update with KL regularization converges to an equilibrium of the MFG under non-strict monotonicity. Second, we show that each PP update is equivalent to finding the equilibrium of a KL-regularized MFG. We then prove that this equilibrium can be found using Mirror Descent (MD) with an exponential last-iterate convergence rate. Building on these insights, we propose the Approximate Proximal-Point (APP) algorithm, which approximately implements the PP update via a small number of MD steps. Numerical experiments on standard benchmarks confirm that the APP algorithm reliably converges to the unregularized mean-field equilibrium without time-averaging.
Gaussian Processes for Shuffled Regression
Shuffled regression is the problem of learning regression functions from shuffled data where the correspondence between the input features and target response is unknown. This paper proposes a probabilistic model for shuffled regression called Gaussian Process Shuffled Regression (GPSR). By introducing Gaussian processes as a prior of regression functions in function space via the kernel function, GPSR can express a wide variety of functions in a nonparametric manner while quantifying the uncertainty of the prediction. By adopting the Bayesian evidence maximization framework and a theoretical analysis of the connection between the marginal likelihood/predictive distribution of GPSR and that of standard Gaussian process regression (GPR), we derive an easy-to-implement inference algorithm for GPSR that iteratively applies GPR and updates the input-output correspondence. To reduce computation costs and obtain closed-form solutions for correspondence updates, we also develop a sparse approximate variant of GPSR using its weight space formulation, which can be seen as Bayesian shuffled linear regression with random Fourier features. Experiments on benchmark datasets confirm the effectiveness of our GPSR proposal.
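The random Fourier features mentioned for the sparse variant can be sketched as follows. This shows only the standard feature map for an RBF kernel, not the GPSR inference algorithm; the function name `rff_features`, the RBF kernel choice, and the lengthscale parameterization are assumptions.

```python
import numpy as np

def rff_features(X, num_features, lengthscale, rng):
    """Random Fourier feature map whose inner products approximate the
    RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2)).
    Hypothetical helper for illustration."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density (Gaussian).
    W = rng.normal(scale=1.0 / lengthscale, size=(num_features, d))
    # Random phases, uniform on [0, 2*pi).
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W.T + b)

rng = np.random.default_rng(0)
X = np.array([[0.0, 0.0], [0.5, -0.5], [2.0, 1.0]])
Phi = rff_features(X, num_features=5000, lengthscale=1.0, rng=rng)

approx_kernel = Phi @ Phi.T
sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
exact_kernel = np.exp(-0.5 * sq_dists)
```

With the features in hand, regression becomes linear in `Phi`, which is what makes a weight-space (Bayesian linear regression) treatment possible; the approximation error shrinks as `num_features` grows.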
Stability and Sharper Risk Bounds with Convergence Rate O(1/n^2)
Prior work (Klochkov & Zhivotovskiy, 2021) establishes, with high probability, excess risk bounds of at most O(log(n)/n) via algorithmic stability for strongly convex learners. We show that under similar common assumptions -- the Polyak-Łojasiewicz condition, smoothness, and Lipschitz continuity of the losses -- excess risk rates of at most O(log^2(n)/n^2) are achievable. To our knowledge, our analysis also provides the tightest high-probability bounds for gradient-based generalization gaps in nonconvex settings.
Current Vision-Language Models (VLMs) struggle with fine-grained spatial reasoning, particularly when multi-step logic and precise spatial alignment are required. In this work, we introduce SpatialReasoner-R1, a vision-language reasoning model designed to address these limitations.
An Adaptive Quantum Circuit of Dempster's Rule of Combination for Uncertain Pattern Classification
In pattern classification, efficient uncertainty reasoning plays a critical role, particularly in real-time applications involving noisy data, ambiguous class boundaries, or overlapping categories. Leveraging the computational power of quantum computing, an Adaptive Quantum Circuit for Dempster's Rule of Combination (AQC-DRC) is proposed to address efficient classification under uncertain environments. The AQC-DRC is developed within the framework of quantum evidence theory (QET), a generalized Bayesian inference method, and facilitates decision-making based on quantum basic probability and plausibility levels. The AQC-DRC provides a deterministic computation of DRC, ensuring that quantum fusion outcomes in uncertain pattern classification are exactly aligned with those of the classical method, while simultaneously achieving exponential reductions in the computational complexity of evidence combination and significantly improving fusion efficiency. It is found that the quantum basic probability amplitude function in QET, as a generalized quantum probability amplitude, can be naturally utilized to express the quantum amplitude encoding. In addition, the quantum basic probability in QET, as a generalized quantum probability, naturally forms a quantum basic probability distribution and can be used to represent quantum measurement outcomes for quantum basic probability level decision-making. Furthermore, the quantum plausibility function in QET can also be naturally used to express the quantum measurement outcomes for quantum plausibility level decision-making. These findings enrich the physical understanding of quantum amplitude encoding and quantum measurement outcomes, offering broad application prospects for representing and processing uncertain knowledge in pattern classification.
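For reference, the classical Dempster's rule of combination that the quantum circuit is said to reproduce can be sketched in a few lines. This is the textbook classical rule only, not the AQC-DRC circuit; the function name `dempster_combine` and the dictionary-of-frozensets representation of mass functions are assumptions.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions (dicts mapping frozenset -> mass)
    with the classical Dempster's rule. Hypothetical helper."""
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            # Mass flows to the intersection of the two focal elements.
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Empty intersection: the product counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {s: w / (1.0 - conflict) for s, w in combined.items()}

A, B = frozenset({"a"}), frozenset({"b"})
AB = A | B
m1 = {A: 0.6, AB: 0.4}
m2 = {B: 0.5, AB: 0.5}
fused = dempster_combine(m1, m2)
# Conflict is 0.6 * 0.5 = 0.3, so the fused masses are
# {a}: 0.3 / 0.7, {b}: 0.2 / 0.7, {a, b}: 0.2 / 0.7.
```

The double loop visits every pair of focal elements, so the classical cost grows with the product of the two mass functions' sizes, which can be exponential in the size of the frame of discernment; this is the cost the abstract claims to reduce.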