Causal Identification under Markov equivalence: Calculus, Algorithm, and Completeness

Neural Information Processing Systems

One common task in many data science applications is to answer questions about the effect of new interventions, such as: 'what would happen to Y if we make X equal to x while observing covariates Z = z?'. Formally, this is known as conditional effect identification, where the goal is to determine whether a post-interventional distribution is computable from the combination of an observational distribution and assumptions about the underlying domain represented by a causal diagram. Many methods have been developed for solving this problem, including the celebrated do-calculus [Pearl, 1995]. In practice, these results are not always applicable, since they require a fully specified causal diagram as input, which is usually not available. In this paper, we assume as the input of the task a less informative structure known as a partial ancestral graph (PAG), which represents a Markov equivalence class of causal diagrams and is learnable from observational data.


Supplementary Material: Iterative Causal Discovery in the Possible Presence of Latent Confounders and Selection Bias

Neural Information Processing Systems

In this section we provide a detailed proof of the correctness and completeness of the ICD algorithm. For easier reference, we describe ICD in Algorithm 1 and state the ICD-Sep conditions. A set $\mathbf{Z}$ is a subset of ICD-Sep$(A,B)$ given $r \in \{0,\ldots,|\mathbf{O}|-2\}$ if and only if: 1. $|\mathbf{Z}| = r$; 2. $\forall Z \in \mathbf{Z}$, there exists a PDS-path $\Pi_B(A,Z)$ such that (a) $|\Pi_B(A,Z)| \le r$ and (b) every node on $\Pi_B(A,Z)$ is in $\mathbf{Z}$; and 3. $\forall Z \in \mathbf{Z}$, node $Z$ is a possible ancestor of $A$ or $B$ (not a necessary condition). Denote by $A,B$ a pair of nodes from $\mathbf{O}$ that are connected in $G$ and disconnected in $D$, and such that $A$ is not an ancestor of $B$ in $D$. If $A \perp\!\!\!\perp B \mid [\mathbf{Z}_0] \cup \mathbf{S}$, where $\mathbf{Z}_0 \subset \mathbf{O}$ is a minimal separating set having size $n+1$, then there exists a subset $\mathbf{Z} \subset \mathbf{O}$ having the same size $n+1$ such that $A \perp\!\!\!\perp B \mid \mathbf{Z} \cup \mathbf{S}$, and for every node $Z \in \mathbf{Z}$ there exists a PDS-path $\Pi_B(A,Z)$ in $G$ such that every node $V$ on the PDS-path is also in $\mathbf{Z}$. Proof. It was previously shown that a minimal separating set for $A$ and $B$, where $A$ is not an ancestor of $B$, is a subset of D-Sep$(A,B)$ (Spirtes et al., 2000, page 134 and Theorem 6.2; Spirtes et al., 1999).



144a3f71a03ab7c4f46f9656608efdb2-Paper.pdf

Neural Information Processing Systems

Understanding the underlying mechanisms is crucial for tasks such as explaining a phenomenon, predicting, and decision making. Pearl (2009) provided a machinery for automating the process of answering interventional and (retrospective) counterfactual queries even when only observed data is available, and for determining if a query cannot be answered given the available data type (identifiability). This requires knowledge about the true underlying causal structure; however, in many real-world situations, this structure is unknown.


PAG: Multi-Turn Reinforced LLM Self-Correction with Policy as Generative Verifier

arXiv.org Artificial Intelligence

Large Language Models (LLMs) have demonstrated impressive capabilities in complex reasoning tasks, yet they still struggle to reliably verify the correctness of their own outputs. Existing solutions to this verification challenge often depend on separate verifier models or require multi-stage self-correction training pipelines, which limit scalability. In this paper, we propose Policy as Generative Verifier (PAG), a simple and effective framework that empowers LLMs to self-correct by alternating between policy and verifier roles within a unified multi-turn reinforcement learning (RL) paradigm. Distinct from prior approaches that always generate a second attempt regardless of model confidence, PAG introduces a selective revision mechanism: the model revises its answer only when its own generative verification step detects an error. This verify-then-revise workflow not only alleviates model collapse but also jointly enhances both reasoning and verification abilities. Extensive experiments across diverse reasoning benchmarks highlight PAG's dual advancements: as a policy, it enhances direct generation and self-correction accuracy; as a verifier, its self-verification outperforms self-consistency.
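The selective revision mechanism described above can be illustrated as a control-flow sketch. This is not the paper's training procedure (which is multi-turn RL); it only shows the inference-time verify-then-revise loop, and the names `verify_then_revise`, `generate`, `verify`, and `max_turns` are hypothetical stand-ins for the model acting in its policy and verifier roles.

```python
from typing import Callable, List, Tuple


def verify_then_revise(
    question: str,
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str, str], bool],
    max_turns: int = 2,
) -> Tuple[str, int]:
    """Return the final answer and the number of generation turns used.

    `generate` plays the policy role and `verify` the verifier role;
    both are hypothetical callables, in practice backed by the same model.
    """
    history: List[str] = []  # earlier attempts that the verifier rejected
    answer = generate(question, history)
    turns = 1
    while turns < max_turns:
        # Selective revision: stop as soon as the verifier accepts the answer.
        if verify(question, answer):
            break
        history.append(answer)
        answer = generate(question, history)
        turns += 1
    return answer, turns


# Toy stand-ins (assumptions for illustration only).
def toy_generate(question: str, history: List[str]) -> str:
    # The first attempt is wrong; the revision is right.
    return "4" if history else "5"


def toy_verify(question: str, answer: str) -> bool:
    return answer == "4"


final_answer, turns_used = verify_then_revise("2 + 2 = ?", toy_generate, toy_verify)
```

In this toy run the first attempt is rejected and a single revision is made; when the verifier accepts the first attempt, no second attempt is generated at all, which is the contrast with approaches that always produce one.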


A Method for Enhancing the Safety of Large Model Generation Based on Multi-dimensional Attack and Defense

arXiv.org Artificial Intelligence

Currently, large models are prone to generating harmful content when faced with complex attack instructions, significantly reducing their defensive capabilities. To address this issue, this paper proposes a method based on constructing data aligned with multi-dimensional attack defense to enhance the generative security of large models. The core of our method lies in improving the effectiveness of safe alignment learning for large models by innovatively increasing the diversity of attack instruction dimensions and the accuracy of generating safe responses. To validate the effectiveness of our method, beyond existing security evaluation benchmarks, we additionally designed new security evaluation benchmarks and conducted comparative experiments using Llama3.2 as the baseline model. The final experimental results demonstrate that our method can significantly improve the generative security of large models under complex instructional attacks, while also maintaining and enhancing the models' general capabilities.


MMDS: A Multimodal Medical Diagnosis System Integrating Image Analysis and Knowledge-based Departmental Consultation

arXiv.org Artificial Intelligence

We present MMDS, a system capable of recognizing medical images and patient facial details, and providing professional medical diagnoses. The system consists of two core components. The first component is the analysis of medical images and videos. We trained a specialized multimodal medical model capable of interpreting medical images and accurately analyzing patients' facial emotions and facial paralysis conditions. The model achieved an accuracy of 72.59% on the FER2013 facial emotion recognition dataset, with a 91.1% accuracy in recognizing the "happy" emotion. In facial paralysis recognition, the model reached an accuracy of 92%, which is 30% higher than that of GPT-4o. Based on this model, we developed a parser for analyzing facial movement videos of patients with facial paralysis, achieving precise grading of the paralysis severity. In tests on 30 videos of facial paralysis patients, the system demonstrated a grading accuracy of 83.3%. The second component is the generation of professional medical responses. We employed a large language model, integrated with a medical knowledge base, to generate professional diagnoses based on the analysis of medical images or videos. The core innovation lies in our development of a department-specific knowledge base routing management mechanism, in which the large language model categorizes data by medical departments and, during the retrieval process, determines the appropriate knowledge base to query. This significantly improves retrieval accuracy in the RAG (retrieval-augmented generation) process.
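The department-specific routing idea can be sketched as a two-step lookup: first pick a department, then search only that department's knowledge base. Everything below is a hypothetical illustration: the keyword classifier stands in for the large language model, the word-overlap ranking stands in for a real retriever, and the department names and placeholder documents are invented for the example rather than taken from MMDS.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical per-department knowledge bases with placeholder documents.
KNOWLEDGE_BASES: Dict[str, List[str]] = {
    "neurology": [
        "Placeholder note about grading facial paralysis severity.",
        "Placeholder note about nerve conduction tests.",
    ],
    "dermatology": [
        "Placeholder note about skin rash assessment.",
    ],
}

# Keyword sets standing in for the LLM's department classification.
DEPARTMENT_KEYWORDS: Dict[str, Set[str]] = {
    "neurology": {"facial", "paralysis", "nerve"},
    "dermatology": {"rash", "skin", "itching"},
}


def tokenize(text: str) -> Set[str]:
    """Lowercase the text and split it into a set of words."""
    return set(text.lower().replace(".", " ").replace(",", " ").split())


def classify_department(query: str) -> str:
    """Pick the department whose keywords overlap the query the most."""
    words = tokenize(query)
    return max(DEPARTMENT_KEYWORDS, key=lambda d: len(words & DEPARTMENT_KEYWORDS[d]))


def retrieve(query: str, top_k: int = 1) -> Tuple[str, List[str]]:
    """Route the query to one knowledge base, then rank only its documents."""
    department = classify_department(query)
    words = tokenize(query)
    ranked = sorted(
        KNOWLEDGE_BASES[department],
        key=lambda doc: len(words & tokenize(doc)),
        reverse=True,
    )
    return department, ranked[:top_k]


department, docs = retrieve("Patient shows facial paralysis on the left side")
```

Restricting the search to one department's documents is what narrows the candidate set before ranking; the abstract attributes its retrieval-accuracy gain to this routing step.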


A Post-Training Enhanced Optimization Approach for Small Language Models

arXiv.org Artificial Intelligence

This paper studies continuous post-training optimization methods for small language models and proposes a method for constructing continuous post-training alignment data for them. The core of this method is data guidance from large models, which is used to optimize the diversity and accuracy of the alignment data. To verify the effectiveness of the method, we used the Qwen2-0.5B-Instruct model as the baseline small language model. Using the alignment dataset constructed by our proposed method, we trained and compared several groups of experiments: an SFT (Supervised Fine-Tuning) post-training experiment, a KTO (Kahneman-Tversky Optimization) post-training experiment, an SFT-KTO two-stage post-training experiment, and a model weight fusion experiment. Finally, we evaluated and analyzed the performance of the post-trained models, and confirmed that the proposed continuous post-training optimization method can significantly improve the performance of small language models.


Self-Rectifying Diffusion Sampling with Perturbed-Attention Guidance

arXiv.org Artificial Intelligence

Recent studies have demonstrated that diffusion models are capable of generating high-quality samples, but their quality heavily depends on sampling guidance techniques, such as classifier guidance (CG) and classifier-free guidance (CFG). These techniques are often not applicable in unconditional generation or in various downstream tasks such as image restoration. In this paper, we propose a novel sampling guidance, called Perturbed-Attention Guidance (PAG), which improves diffusion sample quality across both unconditional and conditional settings, without requiring additional training or the integration of external modules. PAG is designed to progressively enhance the structure of samples throughout the denoising process. It generates intermediate samples with degraded structure by substituting selected self-attention maps in the diffusion U-Net with an identity matrix, exploiting the self-attention mechanism's ability to capture structural information, and guides the denoising process away from these degraded samples. In both ADM and Stable Diffusion, PAG surprisingly improves sample quality in conditional and even unconditional scenarios. Moreover, PAG significantly improves the baseline performance in various downstream tasks where existing guidance methods such as CG or CFG cannot be fully utilized, including ControlNet with empty prompts and image restoration such as inpainting and deblurring.
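The two ingredients named in the abstract, an identity self-attention map and guidance away from the degraded prediction, can be sketched in plain Python. The perturbation follows the abstract directly: with an identity attention map each token attends only to itself, so the layer output is just the value matrix. The guidance step is an assumption: the abstract does not give a formula, so `guided_prediction` uses a linear extrapolation by analogy with classifier-free guidance, and `scale` is a hypothetical parameter.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product of two list-of-lists matrices."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def self_attention(q: Matrix, k: Matrix, v: Matrix, perturb: bool = False) -> Matrix:
    """Scaled dot-product self-attention; `perturb=True` uses an identity map."""
    n = len(q)
    if perturb:
        # Perturbed branch: the attention map is replaced by the identity matrix.
        attn = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    else:
        d = len(q[0])
        scores = [
            [sum(q[i][c] * k[j][c] for c in range(d)) / math.sqrt(d) for j in range(n)]
            for i in range(n)
        ]
        attn = [softmax(r) for r in scores]
    return matmul(attn, v)


def guided_prediction(eps: List[float], eps_perturbed: List[float], scale: float) -> List[float]:
    """Assumed guidance rule: extrapolate away from the perturbed prediction."""
    return [e + scale * (e - p) for e, p in zip(eps, eps_perturbed)]


q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[2.0, 3.0], [4.0, 5.0]]
perturbed_out = self_attention(q, k, v, perturb=True)
guided = guided_prediction([1.0, 2.0], [0.5, 2.5], scale=2.0)
```

With `scale=0` the assumed rule returns the unperturbed prediction unchanged, so the guidance strength can be varied continuously from no guidance upward.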


Enhancing Reinforcement Learning Agents with Local Guides

arXiv.org Artificial Intelligence

This paper addresses the problem of integrating local guide policies into a Reinforcement Learning agent. For this, we show how to adapt existing algorithms to this setting before introducing a novel algorithm based on a noisy policy-switching procedure. This approach builds on a proper Approximate Policy Evaluation (APE) scheme to provide a perturbation that carefully leads the local guides towards better actions. We evaluated our method on a set of classical Reinforcement Learning problems, including safety-critical systems where the agent cannot enter some areas at the risk of triggering catastrophic consequences. In all the proposed environments, our agent proved to be efficient at leveraging those policies to improve the performance of any APE-based Reinforcement Learning algorithm, especially in its first learning stages.