triplet
Absolute Zero: Reinforced Self-play Reasoning with Zero Data
Reinforcement learning with verifiable rewards (RLVR) has shown promise in enhancing the reasoning capabilities of large language models by learning directly from rule-based outcome rewards. Recent RLVR works that operate under the zero setting avoid supervision in labeling the reasoning process, but still depend on manually curated collections of questions and answers for training. The scarcity of high-quality, human-produced examples raises concerns about the long-term scalability of relying on human supervision, a challenge already evident in the domain of language model pretraining. Furthermore, in a hypothetical future where AI surpasses human intelligence, tasks provided by humans may offer limited learning potential for a superintelligent system. To address these concerns, we propose a new RLVR paradigm called Absolute Zero, in which a single model learns to propose tasks that maximize its own learning progress and improves reasoning by solving them, without relying on any external human or distillation data. Under this paradigm, we introduce the Absolute Zero Reasoner (AZR), a system that self-evolves its training curriculum and reasoning ability. AZR uses a code executor to both validate self-proposed code reasoning tasks and verify answers, serving as a unified source of verifiable feedback to guide open-ended yet grounded learning. Despite being trained entirely without external data, AZR achieves overall SOTA performance on coding and mathematical reasoning tasks, outperforming existing zero-setting models that rely on tens of thousands of in-domain human-curated examples. Furthermore, we demonstrate that AZR can be effectively applied across different model scales and is compatible with various model classes.
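The dual role of the code executor described above (validating a proposed task, then verifying a solver's answer) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: the convention that a task is a program defining `f(x)` plus an input, the determinism check, the binary reward, and the helper names `run_program`, `validate_task`, and `verify_answer`. It is not AZR's actual implementation, and it runs code without any sandboxing.

```python
def run_program(source, arg):
    """Execute `source`, which must define f(x), and return f(arg).

    Assumption: tasks are small Python programs. No sandboxing is done,
    so this is for illustration only.
    """
    namespace = {}
    exec(source, namespace)
    return namespace["f"](arg)


def validate_task(source, arg):
    """Return (True, output) if the proposed task runs and is deterministic.

    The determinism check (run twice, compare) is an assumed validity
    criterion, not one stated in the abstract.
    """
    try:
        first = run_program(source, arg)
        second = run_program(source, arg)
    except Exception:
        return (False, None)
    if first != second:
        return (False, None)
    return (True, first)


def verify_answer(gold_output, predicted_output):
    """Binary outcome reward (assumed): 1.0 for an exact match, else 0.0."""
    return 1.0 if predicted_output == gold_output else 0.0


# A proposed task: a program and an input; the executor supplies the gold output.
proposed_source = "def f(x):\n    return sorted(x)[::-1]"
ok, gold = validate_task(proposed_source, [3, 1, 2])
```

In this sketch the same executor call produces both the validity signal for the proposer and the ground truth against which the solver's prediction is scored, which is the sense in which it acts as a single source of feedback.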
Causal Discovery over Clusters of Variables in Markovian Systems
Causal discovery methods are powerful tools for uncovering the structure of relationships among variables, yet they face significant challenges in scalability and interpretability, especially in high-dimensional settings. In many domains, researchers are not only interested in causal links between individual variables, but also in relationships among sets or clusters of variables. Learning causal structure at the cluster level can both reveal higher-order relationships of interest and improve scalability. In this work, we introduce an approach for causal discovery over clusters in Markov causal systems. We propose a new graphical model that encodes knowledge of relationships between user-defined clusters while fully representing independencies and dependencies over clusters, faithful to a given distribution. We then define and characterize a graphical equivalence class of these models that share cluster-level independence information. Lastly, we present a sound and complete algorithm for causal discovery to represent learnable causal relationships between clusters of variables.
Role Bias in Diffusion Models: Diagnosing and Mitigating through Intermediate Decomposition
In this work, we introduce RoleBench, a benchmark focused on evaluating compositional generalization in action-based relations (e.g., "mouse chasing cat"). We show that state-of-the-art T2I models and compositional generation methods consistently default to frequent reversed relations (i.e., "cat chasing mouse"), a phenomenon we call role collapse. Related works attribute this to the model's architectural limitation or underrepresentation in the data. Our key insight reveals that while models fail on rare compositions when their inversions are common, they can successfully generate similar intermediate compositions (e.g., "mouse chasing boy"), suggesting that this limitation is also due to the presence of frequent counterparts rather than just the absence of rare compositions. Motivated by this, we hypothesize that directional decomposition can gradually mitigate role collapse. We test this via ReBind, a lightweight framework that teaches role bindings using carefully selected active/passive intermediate compositions. Experiments suggest that intermediate compositions through simple fine-tuning can significantly reduce role collapse, with human raters preferring ReBind over state-of-the-art methods more than 78% of the time. Our findings highlight the role of distributional asymmetries in compositional failures and offer a simple, effective path for improving generalization.
Markov Persuasion Processes: Learning to Persuade From Scratch
In Bayesian persuasion, an informed sender strategically discloses information to a receiver so as to persuade them to undertake desirable actions. Recently, Markov persuasion processes (MPPs) have been introduced to capture sequential scenarios where a sender faces a stream of myopic receivers in a Markovian environment. The MPPs studied so far in the literature suffer from issues that prevent them from being fully operational in practice, e.g., they assume that the sender knows receivers' rewards. We fix such issues by addressing MPPs where the sender has no knowledge about the environment.
KARMA: Leveraging Multi-Agent LLMs for Automated Knowledge Graph Enrichment
Maintaining comprehensive and up-to-date knowledge graphs (KGs) is critical for modern AI systems, but manual curation struggles to scale with the rapid growth of scientific literature. This paper presents KARMA, a novel framework employing multi-agent large language models (LLMs) to automate KG enrichment through structured analysis of unstructured text. Our approach employs nine collaborative agents, spanning entity discovery, relation extraction, schema alignment, and conflict resolution, that iteratively parse documents, verify extracted knowledge, and integrate it into existing graph structures while adhering to domain-specific schemas. Experiments on 1,200 PubMed articles from three different domains demonstrate the effectiveness of KARMA in knowledge graph enrichment, with the identification of up to 38,230 new entities while achieving 83.1% LLM-verified correctness and reducing conflict edges by 18.6% through multi-layer assessments.
The Complexity of Finding Local Optima in Contrastive Learning
In contrastive learning, the goal is to find representations (e.g., embeddings in R^d or a tree metric) where anchors are placed closer to positive than to negative examples. While finding global optima of contrastive objectives is NP-hard, the complexity of finding local optima--representations that do not improve by local search algorithms such as gradient-based methods--remains open. Our work settles the complexity of finding local optima in various contrastive learning problems by proving PLS-hardness in discrete settings (e.g., maximize satisfied triplets) and CLS-hardness in continuous settings (e.g., minimize Triplet Loss), where PLS (Polynomial Local Search) and CLS (Continuous Local Search) are well-studied complexity classes capturing local search dynamics in discrete and continuous optimization, respectively. Our results imply that no polynomial time algorithm (local search or otherwise) can find a local optimum for various contrastive learning problems, unless PLS ⊆ P (or CLS ⊆ P for continuous problems). Even in the unlikely scenario that PLS ⊆ P (or CLS ⊆ P), our reductions imply that there exist instances where local search algorithms need exponential time to reach a local optimum, even for d = 1 (embeddings on a line).
MoEMeta: Mixture-of-Experts Meta Learning for Few-Shot Relational Learning
Few-shot knowledge graph relational learning seeks to perform reasoning over relations given only a limited number of training examples. While existing approaches largely adopt a meta-learning framework for enabling fast adaptation to new relations, they suffer from two key pitfalls. First, they learn relation metaknowledge in isolation, failing to capture common relational patterns shared across tasks. Second, they struggle to effectively incorporate local, task-specific contexts crucial for rapid adaptation. To address these limitations, we propose MoEMeta, a novel meta-learning framework that disentangles globally shared knowledge from task-specific contexts to enable both effective model generalization and rapid adaptation. MoEMeta introduces two key innovations: (i) a mixture-of-experts (MoE) model that learns globally shared relational prototypes to enhance generalization, and (ii) a task-tailored adaptation mechanism that captures local contexts for fast task-specific adaptation.
MiCo: Multi-image Contrast for Reinforcement Visual Reasoning
This work explores enabling Chain-of-Thought (CoT) reasoning to link visual cues across multiple images. A straightforward solution is to adapt rule-based reinforcement learning for Vision-Language Models (VLMs). However, such methods typically rely on manually curated question-answer pairs, which can be particularly challenging when dealing with fine-grained visual details and complex logic across images. Inspired by self-supervised visual representation learning, we observe that images contain inherent constraints that can serve as supervision. Based on this insight, we construct image triplets comprising two augmented views of the same image and a third, similar but distinct image. During training, the model is prompted to generate a reasoning process to compare these images (i.e., determine same or different).
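The triplet construction described above can be sketched as follows. This is a toy illustration under stated assumptions: images are plain nested lists, the augmentation (random horizontal flip plus a one-pixel crop) is hypothetical, the third image is left unaugmented, and the exact-match `reward` is an assumed rule-based signal. None of these details are taken from the paper.

```python
import random


def augment(image, rng):
    """Hypothetical augmentation: random horizontal flip, then crop one
    row and one column from a randomly chosen side."""
    flip = rng.random() < 0.5
    rows = [row[::-1] if flip else row[:] for row in image]
    top = rng.randint(0, 1)
    left = rng.randint(0, 1)
    cropped = rows[top:len(rows) - 1 + top]
    return [row[left:len(row) - 1 + left] for row in cropped]


def build_triplet(image, distinct_image, rng):
    """Return labelled comparison pairs from two augmented views of
    `image` and a third, similar but distinct image."""
    view_a = augment(image, rng)
    view_b = augment(image, rng)
    return [
        (view_a, view_b, "same"),
        (view_a, distinct_image, "different"),
        (view_b, distinct_image, "different"),
    ]


def reward(predicted, label):
    """Assumed rule-based reward: 1.0 if the same/different verdict matches."""
    return 1.0 if predicted == label else 0.0


rng = random.Random(0)
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
similar_but_distinct = [[1, 2, 3], [4, 5, 6], [7, 8, 0]]
pairs = build_triplet(image, similar_but_distinct, rng)
```

Because the labels follow from how the triplet was built, no manually written question-answer pair is needed to score the model's verdict.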
CSGO: Content-Style Composition in Text-to-Image Generation
The advancement of image style transfer has been fundamentally constrained by the absence of large-scale, high-quality datasets with explicit content-style-stylized supervision. Existing methods predominantly adopt training-free paradigms (e.g., image inversion), which limit controllability and generalization due to the lack of structured triplet data. To bridge this gap, we design a scalable and automated pipeline that constructs and purifies high-fidelity content-style-stylized image triplets. Leveraging this pipeline, we introduce IMAGStyle--the first large-scale dataset of its kind, containing 210K diverse and precisely aligned triplets for style transfer research. Empowered by IMAGStyle, we propose CSGO, a unified, end-to-end trainable framework that decouples content and style representations via independent feature injection. CSGO jointly supports image-driven style transfer, text-driven stylized generation, and text-editing-driven stylized synthesis within a single architecture. Extensive experiments show that CSGO achieves state-of-the-art controllability and fidelity, demonstrating the critical role of structured synthetic data in unlocking robust and generalizable style transfer.