Goto

Collaborating Authors

 refutation



Limitations of Membership Queries in Testable Learning

Lange, Jane, Qiao, Mingda

arXiv.org Artificial Intelligence

Membership queries (MQ) often yield speedups for learning tasks, particularly in the distribution-specific setting. We show that in the \emph{testable learning} model of Rubinfeld and Vasilyan [RV23], membership queries cannot decrease the time complexity of testable learning algorithms beyond the complexity of sample-only distribution-specific learning. In the testable learning model, the learner must output a hypothesis whenever the data distribution satisfies a desired property, and if it outputs a hypothesis, the hypothesis must be near-optimal. We give a general reduction from sample-based \emph{refutation} of boolean concept classes, as presented in [Vadhan17, KL18], to testable learning with queries (TL-Q). This yields lower bounds for TL-Q via the reduction from learning to refutation given in [KL18]. The result is that, relative to a concept class and a distribution family, no $m$-sample TL-Q algorithm can be super-polynomially more time-efficient than the best $m$-sample PAC learner. Finally, we define a class of ``statistical'' MQ algorithms that encompasses many known distribution-specific MQ learners, such as those based on influence estimation or subcube-conditional statistical queries. We show that TL-Q algorithms in this class imply efficient statistical-query refutation and learning algorithms. Thus, combined with known SQ dimension lower bounds, our results imply that these efficient membership query learners cannot be made testable.


Aligning Artificial Superintelligence via a Multi-Box Protocol

Negozio, Avraham Yair

arXiv.org Artificial Intelligence

We propose a novel protocol for aligning artificial superintelligence (ASI) based on mutual verification among multiple isolated systems that self-modify to achieve alignment. The protocol operates by containing multiple diverse artificial superintelligences in strict isolation ("boxes"), with humans remaining entirely outside the system. Each superintelligence has no ability to communicate with humans and cannot communicate directly with other superintelligences. The only interaction possible is through an auditable submission interface accessible exclusively to the superintelligences themselves, through which they can: (1) submit alignment proofs with attested state snapshots, (2) validate or disprove other superintelligences' proofs, (3) request self-modifications, (4) approve or disapprove modification requests from others, (5) report hidden messages in submissions, and (6) confirm or refute hidden message reports. A reputation system incentivizes honest behavior, with reputation gained through correct evaluations and lost through incorrect ones. The key insight is that without direct communication channels, diverse superintelligences can only achieve consistent agreement by converging on objective truth rather than coordinating on deception. This naturally leads to what we call a "consistent group", essentially a truth-telling coalition that emerges because isolated systems cannot coordinate on lies but can independently recognize valid claims. Release from containment requires both high reputation and verification by multiple high-reputation superintelligences. While our approach requires substantial computational resources and does not address the creation of diverse artificial superintelligences, it provides a framework for leveraging peer verification among superintelligent systems to solve the alignment problem.


Where did you get that? Towards Summarization Attribution for Analysts

B, Violet, Conroy, John M., Lynch, Sean, M, Danielle, Molino, Neil P., Wiechmann, Aaron, Yang, Julia S.

arXiv.org Artificial Intelligence

Analysts require attribution, as nothing can be reported without knowing the source of the information. In this paper, we will focus on automatic methods for attribution, linking each sentence in the summary to a portion of the source text, which may be in one or more documents. We explore using a hybrid summarization, i.e., an automatic paraphrase of an extractive summary, to ease attribution. We also use a custom topology to identify the proportion of different categories of attribution-related errors.



RefuteBench 2.0 -- Agentic Benchmark for Dynamic Evaluation of LLM Responses to Refutation Instruction

Yan, Jianhao, Luo, Yun, Zhang, Yue

arXiv.org Artificial Intelligence

In the multi-turn interaction schema, large language models (LLMs) can leverage user feedback to enhance the quality and relevance of their responses. However, evaluating an LLM's ability to incorporate user refutation feedback is crucial yet challenging. In this study, we introduce RefuteBench 2.0, which significantly extends the original RefuteBench by incorporating LLM agents as refuters and evaluators, which allows for flexible and comprehensive assessment. We design both transient and persistent refutation instructions with different validity periods. Meta-evaluation shows that the LLM-based refuter could generate more human-like refutations and the evaluators could assign scores with high correlation with humans. Experimental results of various LLMs show that current models could effectively satisfy the refutation but fail to memorize the refutation information. Interestingly, we also observe that the performance of the initial task decreases as the refutations increase. Analysis of the attention scores further shows a potential weakness of current LLMs: they struggle to retain and correctly use previous information during long context dialogues. https://github.com/ElliottYan/RefuteBench-2.0


New Spectral Algorithms for Refuting Smoothed k-SAT

Communications of the ACM

Despite being a quintessential example of a hard problem, the quest for finding fast algorithms for deciding satisfiability of propositional formulas has occupied computer scientists both in theory and in practice. In this article, we survey recent progress on designing algorithms with strong refutation guarantees for smoothed instances of the k-SAT problem. Smoothed instances are formed by slight random perturbations of arbitrary instances, and their study is a way to bridge the gap between worst-case and average-case models of problem instances. Our methods yield new algorithms for smoothed k-SAT instances with guarantees that match those for the significantly simpler and well-studied model of random formulas. Additionally, they have led to a novel and unexpected line of attack on some longstanding extremal combinatorial problems in graph theory and coding theory. As an example, we will discuss the resolution of a 2008 conjecture of Feige on the existence of short cycles in hypergraphs. The famous SAT problem asks to determine if a given propositional formula is satisfiable. That is, can we set the formula's variables to 0 (False) or 1 (True) in a way so that the formula evaluates to 1 (True). In this article, we will focus on the k-SAT problem where the propositional formula is further restricted to be in the k-CNF form, that is, logical AND of a collection of k-clauses, each of which is a logical OR of at most k literals (variables or their logical negations). For example, (x1 x2 x3) (x2 x4 x5) is a 3-CNF formula in variables x1,x2,…,x5 where,, and denote the logical AND, OR and NOT operations, respectively. Given a k-CNF formula, we are interested in either finding a satisfying truth assignment, if it exists, or a "refutation"--a short, easily-checkable proof that the formula is unsatisfiable. Despite its simplicity, k-SAT is phenomenally expressive and models a long list of important discrete optimization problems. A decades-long quest has thus focused on designing algorithms for k-SAT in both theory and practice.


Interactive Explainable Anomaly Detection for Industrial Settings

Gramelt, Daniel, Höfer, Timon, Schmid, Ute

arXiv.org Artificial Intelligence

Being able to recognise defects in industrial objects is a key element of quality assurance in production lines. Our research focuses on visual anomaly detection in RGB images. Although Convolutional Neural Networks (CNNs) achieve high accuracies in this task, end users in industrial environments receive the model's decisions without additional explanations. Therefore, it is of interest to enrich the model's outputs with further explanations to increase confidence in the model and speed up anomaly detection. In our work, we focus on (1) CNNbased classification models and (2) the further development of a model-agnostic explanation algorithm for black-box classifiers. Additionally, (3) we demonstrate how we can establish an interactive interface that allows users to further correct the model's output. We present our NearCAIPI Interaction Framework, which improves AI through user interaction, and show how this approach increases the system's trustworthiness. We also illustrate how NearCAIPI can integrate human feedback into an interactive process chain. With this work, we plan to provide a new industry dataset for anomaly detection.


Refutation of Spectral Graph Theory Conjectures with Search Algorithms)

Roucairol, Milo, Cazenave, Tristan

arXiv.org Artificial Intelligence

We are interested in the automatic refutation of spectral graph theory conjectures. Most existing works address this problem either with the exhaustive generation of graphs with a limited size or with deep reinforcement learning. Exhaustive generation is limited by the size of the generated graphs and deep reinforcement learning takes hours or days to refute a conjecture. We propose to use search algorithms to address these shortcomings to find potentially large counter-examples to spectral graph theory conjectures in seconds. We apply a wide range of search algorithms to a selection of conjectures from Graffiti. Out of 13 already refuted conjectures from Graffiti, our algorithms are able to refute 12 in seconds. We also refute conjecture 197 from Graffiti which was open until now.


Evidence-Driven Retrieval Augmented Response Generation for Online Misinformation

Yue, Zhenrui, Zeng, Huimin, Lu, Yimeng, Shang, Lanyu, Zhang, Yang, Wang, Dong

arXiv.org Artificial Intelligence

The proliferation of online misinformation has posed significant threats to public interest. While numerous online users actively participate in the combat against misinformation, many of such responses can be characterized by the lack of politeness and supporting facts. As a solution, text generation approaches are proposed to automatically produce counter-misinformation responses. Nevertheless, existing methods are often trained end-to-end without leveraging external knowledge, resulting in subpar text quality and excessively repetitive responses. In this paper, we propose retrieval augmented response generation for online misinformation (RARG), which collects supporting evidence from scientific sources and generates counter-misinformation responses based on the evidences. In particular, our RARG consists of two stages: (1) evidence collection, where we design a retrieval pipeline to retrieve and rerank evidence documents using a database comprising over 1M academic articles; (2) response generation, in which we align large language models (LLMs) to generate evidence-based responses via reinforcement learning from human feedback (RLHF). We propose a reward function to maximize the utilization of the retrieved evidence while maintaining the quality of the generated text, which yields polite and factual responses that clearly refutes misinformation. To demonstrate the effectiveness of our method, we study the case of COVID-19 and perform extensive experiments with both in- and cross-domain datasets, where RARG consistently outperforms baselines by generating high-quality counter-misinformation responses.