Weakly supervised learning


A Unified and Stable Risk Minimization Framework for Weakly Supervised Learning with Theoretical Guarantees

Zhang, Miao, Li, Junpeng, Hua, Changchun, Yang, Yana

arXiv.org Artificial Intelligence

Weakly supervised learning has emerged as a practical alternative to fully supervised learning when complete and accurate labels are costly or infeasible to acquire. However, many existing methods are tailored to specific supervision patterns -- such as positive-unlabeled (PU), unlabeled-unlabeled (UU), complementary-label (CLL), partial-label (PLL), or similarity-unlabeled annotations -- and rely on post-hoc corrections to mitigate instability induced by indirect supervision. We propose a principled, unified framework that bypasses such post-hoc adjustments by directly formulating a stable surrogate risk grounded in the structure of weakly supervised data. The formulation naturally subsumes diverse settings -- including PU, UU, CLL, PLL, multi-class unlabeled, and tuple-based learning -- under a single optimization objective. We further establish a non-asymptotic generalization bound via Rademacher complexity that clarifies how supervision structure, model capacity, and sample size jointly govern performance. Beyond this, we analyze the effect of class-prior misspecification on the bound, deriving explicit terms that quantify its impact, and we study identifiability, giving sufficient conditions -- most notably via supervision stratification across groups -- under which the target risk is recoverable. Extensive experiments show consistent gains across class priors, dataset scales, and class counts -- without heuristic stabilization -- while exhibiting robustness to overfitting.
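As one concrete instance of the PU setting this abstract subsumes, the classical unbiased PU risk estimator rewrites the supervised risk using only positive and unlabeled samples. This is a generic PU sketch, not the paper's stabilized surrogate; the class prior `pi` is assumed known, and the logistic loss is an illustrative choice:

```python
import numpy as np

def pu_risk(scores_p, scores_u, pi, loss):
    """Unbiased PU risk: pi*E_p[l(f,+1)] + E_u[l(f,-1)] - pi*E_p[l(f,-1)].

    scores_p: model scores on positive samples
    scores_u: model scores on unlabeled samples
    pi:       class prior P(y=+1), assumed known
    loss:     elementwise loss(score, label), label in {-1, +1}
    """
    r_p_pos = loss(scores_p, +1).mean()   # positives labeled +1
    r_p_neg = loss(scores_p, -1).mean()   # correction term
    r_u_neg = loss(scores_u, -1).mean()   # unlabeled treated as negative
    return pi * r_p_pos + r_u_neg - pi * r_p_neg

# logistic loss as a concrete choice
logistic = lambda s, y: np.log1p(np.exp(-y * s))

rng = np.random.default_rng(0)
scores_p = rng.normal(1.0, 1.0, 1000)   # positives tend to score high
scores_u = rng.normal(0.0, 1.0, 1000)   # unlabeled mixture
print(pu_risk(scores_p, scores_u, pi=0.4, loss=logistic))
```

Without correction (e.g., clipping the negative term), this estimator is exactly the kind of objective that can go negative and overfit, which is the instability the paper's surrogate is designed to avoid.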


More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

Neural Information Processing Systems

We consider the weakly supervised binary classification problem where the labels are randomly flipped with probability $1-\alpha$. Although there exist numerous algorithms for this problem, it remains theoretically unexplored how the statistical accuracies and computational efficiency of these algorithms depend on the degree of supervision, which is quantified by $\alpha$. In this paper, we characterize the effect of $\alpha$ by establishing the information-theoretic and computational boundaries, namely, the minimax-optimal statistical accuracy that can be achieved by all algorithms, and polynomial-time algorithms under an oracle computational model. For small $\alpha$, our result shows a gap between these two boundaries, which represents the computational price of achieving the information-theoretic boundary due to the lack of supervision. Interestingly, we also show that this gap narrows as $\alpha$ increases. In other words, having more supervision, i.e., more correct labels, not only improves the optimal statistical accuracy as expected, but also enhances the computational efficiency for achieving such accuracy.
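The supervision model described here is easy to simulate: each true label is kept with probability alpha and flipped with probability 1 - alpha. A minimal sketch (the sample size and alpha value are illustrative):

```python
import numpy as np

def flip_labels(y, alpha, rng):
    """Keep each label with probability alpha; flip it with probability 1 - alpha."""
    flip = rng.random(y.shape) > alpha
    return np.where(flip, -y, y)

rng = np.random.default_rng(1)
y = rng.choice([-1, 1], size=10_000)
y_weak = flip_labels(y, alpha=0.8, rng=rng)
print((y_weak == y).mean())  # empirical agreement, close to alpha = 0.8
```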


Ising Models with Hidden Markov Structure: Applications to Probabilistic Inference in Machine Learning

Herrera, F., Rozikov, U. A., Velasco, M. V.

arXiv.org Artificial Intelligence

In this paper, we investigate tree-indexed Markov chains (Gibbs measures) defined by a Hamiltonian that couples two Ising layers: hidden spins \(s(x) \in \{\pm 1\}\) and observed spins \(\sigma(x) \in \{\pm 1\}\) on a Cayley tree. The Hamiltonian incorporates Ising interactions within each layer and site-wise emission couplings between layers, extending hidden Markov models to a bilayer Markov random field. Specifically, we explore translation-invariant Gibbs measures (TIGM) of this Hamiltonian on Cayley trees. Under certain explicit conditions on the model's parameters, we demonstrate that there can be up to three distinct TIGMs. Each of these measures represents an equilibrium state of the spin system. These measures provide a structured approach to inference on hierarchical data in machine learning. They have practical applications in tasks such as denoising, weakly supervised learning, and anomaly detection. The Cayley tree structure is particularly advantageous for exact inference due to its tractability.
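A bilayer Hamiltonian of the kind described, with Ising couplings within each layer and a site-wise emission term between layers, can be written schematically as follows (the coupling constants \(J_s, J_\sigma, h\) are illustrative names, not necessarily the paper's notation):

```latex
H(s,\sigma) \;=\; -J_s \sum_{\langle x,y\rangle} s(x)\,s(y)
\;-\; J_\sigma \sum_{\langle x,y\rangle} \sigma(x)\,\sigma(y)
\;-\; h \sum_{x} s(x)\,\sigma(x)
```

Here \(\langle x,y\rangle\) ranges over nearest-neighbor pairs on the Cayley tree; the first two sums are the intra-layer Ising interactions and the last sum is the site-wise hidden-observed coupling.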


Advancing Arabic Speech Recognition Through Large-Scale Weakly Supervised Learning

Salhab, Mahmoud, Elghitany, Marwan, Sait, Shameed, Ullah, Syed Sibghat, Abusheikh, Mohammad, Abusheikh, Hasan

arXiv.org Artificial Intelligence

Automatic speech recognition (ASR) is crucial for human-machine interaction in diverse applications like conversational agents, industrial robotics, call center automation, and automated subtitling. However, developing high-performance ASR models remains challenging, particularly for low-resource languages like Arabic, due to the scarcity of large, labeled speech datasets, which are costly and labor-intensive to produce. In this work, we employ weakly supervised learning to train an Arabic ASR model using the Conformer architecture. Our model is trained from scratch on 15,000 hours of weakly annotated speech data covering both Modern Standard Arabic (MSA) and Dialectal Arabic (DA), eliminating the need for costly manual transcriptions. Despite the absence of human-verified labels, our approach achieves state-of-the-art (SOTA) results in Arabic ASR, surpassing both open and closed-source models on standard benchmarks. These results demonstrate the effectiveness of weak supervision as a scalable, cost-efficient alternative to traditional supervised approaches, paving the way for improved ASR systems in low-resource settings.


Reviews: More Supervision, Less Computation: Statistical-Computational Tradeoffs in Weakly Supervised Learning

Neural Information Processing Systems

This paper is interesting and presents a new kind of result, introducing computational aspects into standard minimax theory. The phenomenon illustrated is new to me, and it exposes a limitation of computationally tractable algorithms relative to the "theoretical" ones that could be considered in classical minimax theory. However, given the relative novelty of the framework, it would be important for the basic definitions and properties to be presented more clearly. In the following, only one model is investigated.


Reviews: beta-risk: a New Surrogate Risk for Learning from Weakly Labeled Data

Neural Information Processing Systems

UPDATE AFTER REBUTTAL: I still feel the value of the framework isn't fully convincing. My basic issue is that for the weakly supervised scenarios that don't already have principled algorithms, the precise reason the proposed formulation is superior is unclear. For example, in SSL the proposed method has a very similar flavour to self-training. I do like, however, that there is an attempt at a unified approach to a range of such problems, and it's possible that one could do something interesting with this framework in future work. On the technical side, I still don't quite get the optimisation proposed for beta (Line 144).


Weakly Supervised Learning on Large Graphs

Prakash, Aditya

arXiv.org Artificial Intelligence

Graph classification plays a pivotal role in various domains, including pathology, where images can be represented as graphs: nodes might represent individual nuclei, and edges capture the spatial or functional relationships between them. Often, the overall label of the graph, such as a cancer type or disease state, is determined by patterns within smaller, localized regions of the image. This work introduces a weakly-supervised graph classification framework leveraging two subgraph extraction techniques: (1) a sliding-window approach and (2) a BFS-based approach. Subgraphs are processed using a Graph Attention Network (GAT), which employs attention mechanisms to identify the most informative subgraphs for classification. Weak supervision is achieved by propagating graph-level labels to subgraphs, eliminating the need for detailed subgraph annotations.
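The BFS-based extraction and label propagation described above can be sketched in a few lines. The adjacency-list input, the `max_nodes` cap, and the root selection are illustrative choices, not the paper's exact procedure:

```python
from collections import deque

def bfs_subgraph(adj, root, max_nodes):
    """Extract a local subgraph by breadth-first search from `root`,
    capped at `max_nodes` nodes."""
    seen = {root}
    queue = deque([root])
    while queue and len(seen) < max_nodes:
        u = queue.popleft()
        for v in adj.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
                if len(seen) >= max_nodes:
                    break
    return seen

def weakly_labeled_subgraphs(adj, graph_label, roots, max_nodes=4):
    """Propagate the graph-level label to every extracted subgraph
    (the weak-supervision step: no subgraph-level annotation needed)."""
    return [(bfs_subgraph(adj, r, max_nodes), graph_label) for r in roots]

# toy graph: 5 nodes in a chain with one branch
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1, 4], 4: [3]}
samples = weakly_labeled_subgraphs(adj, graph_label=1, roots=[0, 3])
print(samples)  # each subgraph inherits the graph-level label 1
```

Each (subgraph, label) pair would then be fed to the GAT, whose attention scores indicate which subgraphs actually drive the graph-level prediction.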


Analysis of Hybrid Compositions in Animation Film with Weakly Supervised Learning

Portos, Mónica Apellaniz, Labadie-Tamayo, Roberto, Stemmler, Claudius, Feyersinger, Erwin, Babic, Andreas, Bruckner, Franziska, Öhner, Vrääth, Zeppelzauer, Matthias

arXiv.org Artificial Intelligence

We present an approach for the analysis of hybrid visual compositions in animation in the domain of ephemeral film. We combine ideas from semi-supervised and weakly supervised learning to train a model that can segment hybrid compositions without requiring pre-labeled segmentation masks. We evaluate our approach on a set of ephemeral films from 13 film archives. Results demonstrate that the proposed learning strategy yields performance close to a fully supervised baseline. On a qualitative level, the analysis provides interesting insights into hybrid compositions in animation film.


RACH-Space: Reconstructing Adaptive Convex Hull Space with Applications in Weak Supervision

Na, Woojoo, Tasissa, Abiy

arXiv.org Artificial Intelligence

We introduce RACH-Space, an algorithm for labelling unlabelled data in weakly supervised learning, given incomplete, noisy information about the labels. RACH-Space offers simplicity in implementation without requiring hard assumptions on data or the sources of weak supervision, and is well suited for practical applications where fully labelled data is not available. Our method is built upon a geometrical interpretation of the space spanned by the set of weak signals. We also analyze the theoretical properties underlying the relationship between the convex hulls in this space and the accuracy of our output labels, bridging geometry with machine learning. Empirical results demonstrate that RACH-Space works well in practice and compares favorably to the best existing label models for weakly supervised learning.


Losses over Labels: Weakly Supervised Learning via Direct Loss Construction

Sam, Dylan, Kolter, J. Zico

arXiv.org Artificial Intelligence

Owing to the prohibitive costs of generating large amounts of labeled data, programmatic weak supervision is a growing paradigm within machine learning. In this setting, users design heuristics that provide noisy labels for subsets of the data. These weak labels are combined (typically via a graphical model) to form pseudolabels, which are then used to train a downstream model. In this work, we question a foundational premise of the typical weakly supervised learning pipeline: given that the heuristic provides all ``label" information, why do we need to generate pseudolabels at all? Instead, we propose to directly transform the heuristics themselves into corresponding loss functions that penalize differences between our model and the heuristic. By constructing losses directly from the heuristics, we can incorporate more information than is used in the standard weakly supervised pipeline, such as how the heuristics make their decisions, which explicitly informs feature selection during training. We call our method Losses over Labels (LoL) as it creates losses directly from heuristics without going through the intermediate step of a label. We show that LoL improves upon existing weak supervision methods on several benchmark text and image classification tasks and further demonstrate that incorporating gradient information leads to better performance on almost every task.
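The core idea of replacing pseudolabels with heuristic-derived losses can be illustrated with a toy numpy version: each heuristic contributes a penalty wherever it fires and disagrees with the model. The abstention convention (vote 0) and the squared penalty are illustrative simplifications, not the authors' exact construction:

```python
import numpy as np

def heuristic_loss(model_probs, heuristic_votes):
    """Penalize model/heuristic disagreement directly, skipping the
    pseudolabel-aggregation step of the standard weak-supervision pipeline.

    model_probs:     (n,) model probability of class 1
    heuristic_votes: (m, n) votes in {+1, 0, -1}; 0 means the heuristic abstains
    """
    total, count = 0.0, 0
    for votes in heuristic_votes:
        fired = votes != 0                  # only covered examples contribute
        target = (votes[fired] + 1) / 2     # map {-1, +1} -> {0, 1}
        total += np.sum((model_probs[fired] - target) ** 2)
        count += int(fired.sum())
    return total / max(count, 1)

probs = np.array([0.9, 0.2, 0.6])
votes = np.array([[1, -1, 0],    # heuristic 1 abstains on example 3
                  [1,  0, 1]])   # heuristic 2 abstains on example 2
print(heuristic_loss(probs, votes))  # ≈ 0.055
```

In the full LoL method the loss can additionally use *how* a heuristic decides (e.g., its gradient with respect to input features), which is the extra information a pseudolabel discards.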