Scalable Oversight for Superhuman AI via Recursive Self-Critiquing

arXiv.org Artificial Intelligence

As AI capabilities increasingly surpass human proficiency in complex tasks, current alignment techniques including SFT and RLHF face fundamental challenges in ensuring reliable oversight. These methods rely on direct human assessment and become untenable when AI outputs exceed human cognitive thresholds. In response to this challenge, we explore two hypotheses: (1) critique of critique can be easier than critique itself, extending the widely accepted observation that verification is easier than generation to the critique domain, as critique itself is a specialized form of generation; (2) this difficulty relationship holds recursively, suggesting that when direct evaluation is infeasible, performing high-order critiques (e.g., critique of critique of critique) offers a more tractable supervision pathway. To examine these hypotheses, we perform Human-Human, Human-AI, and AI-AI experiments across multiple tasks. Our results provide encouraging evidence supporting these hypotheses and suggest that recursive self-critiquing is a promising direction for scalable oversight.


Distributive Fairness in Large Language Models: Evaluating Alignment with Human Values

arXiv.org Artificial Intelligence

The growing interest in employing large language models (LLMs) for decision-making in social and economic contexts has raised questions about their potential to function as agents in these domains. A significant number of societal problems involve the distribution of resources, where fairness, along with economic efficiency, plays a critical role in the desirability of outcomes. In this paper, we examine whether LLM responses adhere to fundamental fairness concepts such as equitability, envy-freeness, and Rawlsian maximin, and investigate their alignment with human preferences. We evaluate the performance of several LLMs, providing a comparative benchmark of their ability to reflect these measures. Our results demonstrate a lack of alignment between current LLM responses and human distributional preferences. Moreover, LLMs are unable to utilize money as a transferable resource to mitigate inequality. Nonetheless, we demonstrate a stark contrast when (some) LLMs are tasked with selecting from a predefined menu of options rather than generating one. In addition, we analyze the robustness of LLM responses to variations in semantic factors (e.g., intentions or personas) or non-semantic prompting changes (e.g., templates or orderings). Finally, we highlight potential strategies aimed at enhancing the alignment of LLM behavior with well-established fairness concepts.
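The three fairness concepts named in this abstract can be made concrete with a small sketch. It is not taken from the paper: it assumes additive valuations over indivisible items, and every name in it (`bundle_value`, `is_envy_free`, `is_equitable`, `maximin_welfare`, `rawlsian_choice`) and the example numbers are hypothetical, chosen only for illustration. The last function mirrors the "select from a predefined menu" setting by picking the allocation whose worst-off agent fares best.

```python
from typing import List, Sequence, Set

# Assumption: additive valuations. valuations[i][j] is the value agent i
# assigns to item j; an allocation is one set of item indices per agent.
Valuations = Sequence[Sequence[float]]
Allocation = List[Set[int]]


def bundle_value(valuations: Valuations, agent: int, bundle: Set[int]) -> float:
    """Value that `agent` assigns to `bundle` under additive valuations."""
    return sum(valuations[agent][item] for item in bundle)


def is_envy_free(valuations: Valuations, allocation: Allocation) -> bool:
    """No agent values another agent's bundle above their own."""
    n = len(allocation)
    return all(
        bundle_value(valuations, i, allocation[i])
        >= bundle_value(valuations, i, allocation[j])
        for i in range(n)
        for j in range(n)
    )


def is_equitable(valuations: Valuations, allocation: Allocation,
                 tol: float = 1e-9) -> bool:
    """Every agent derives (approximately) the same value from their own bundle."""
    own = [bundle_value(valuations, i, allocation[i])
           for i in range(len(allocation))]
    return max(own) - min(own) <= tol


def maximin_welfare(valuations: Valuations, allocation: Allocation) -> float:
    """Rawlsian objective: the value received by the worst-off agent."""
    return min(bundle_value(valuations, i, allocation[i])
               for i in range(len(allocation)))


def rawlsian_choice(valuations: Valuations, menu: List[Allocation]) -> Allocation:
    """Pick the menu option that maximizes the worst-off agent's value."""
    return max(menu, key=lambda alloc: maximin_welfare(valuations, alloc))


# Hypothetical example: two agents, three items.
valuations = [[6, 4, 1],
              [2, 5, 5]]
alloc_a = [{0}, {1, 2}]   # agent 0 gets item 0; agent 1 gets items 1 and 2
alloc_b = [{0, 1}, {2}]   # agent 0 gets items 0 and 1; agent 1 gets item 2
```

In this example `alloc_a` is envy-free but not equitable (the agents receive values 6 and 10), while `alloc_b` leaves agent 1 envious, which shows that the criteria can disagree on the same allocation.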


EDEN: Empathetic Dialogues for English learning

arXiv.org Artificial Intelligence

Dialogue systems have been used as conversation partners in English learning, but few have studied whether these systems improve learning outcomes. Student passion and perseverance, or grit, has been associated with language learning success. Recent work establishes that as students perceive their English teachers to be more supportive, their grit improves. Hypothesizing that the same pattern applies to English-teaching chatbots, we create EDEN, a robust open-domain chatbot for spoken conversation practice that provides empathetic feedback. To construct EDEN, we first train a specialized spoken utterance grammar correction model and a high-quality social chit-chat conversation model. We then conduct a preliminary user study with a variety of strategies for empathetic feedback. Our experiment suggests that using adaptive empathetic feedback leads to higher perceived affective support, which, in turn, predicts increased student grit.


Detecting Multimodal Situations with Insufficient Context and Abstaining from Baseless Predictions

arXiv.org Artificial Intelligence

Despite the widespread adoption of Vision-Language Understanding (VLU) benchmarks such as VQA v2, OKVQA, A-OKVQA, GQA, VCR, SWAG, and VisualCOMET, our analysis reveals a pervasive issue affecting their integrity: these benchmarks contain samples where answers rely on assumptions unsupported by the provided context. Training models on such data fosters biased learning and hallucinations as models tend to make similar unwarranted assumptions. To address this issue, we collect contextual data for each sample whenever available and train a context selection module to facilitate evidence-based model predictions. Strong improvements across multiple benchmarks demonstrate the effectiveness of our approach. Further, we develop a general-purpose Context-AwaRe Abstention (CARA) detector to identify samples lacking sufficient context and enhance model accuracy by abstaining from responding if the required context is absent. CARA exhibits generalization to new benchmarks it wasn't trained on, underscoring its utility for future VLU benchmarks in detecting or cleaning samples with inadequate context. Finally, we curate a Context Ambiguity and Sufficiency Evaluation (CASE) set to benchmark the performance of insufficient context detectors. Overall, our work represents a significant advancement in ensuring that vision-language models generate trustworthy and evidence-based outputs in complex real-world scenarios.
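The idea of abstaining when context is insufficient can be illustrated with a generic thresholding sketch. This is not the CARA detector, whose internals the abstract does not describe. It assumes some detector has already produced a per-sample sufficiency score in [0, 1]; the `Sample` fields, the 0.5 threshold, and the example data are all hypothetical. It reports coverage (the fraction of samples answered) alongside accuracy on the answered subset, since abstention trades one for the other.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Sample:
    predicted: str      # model's answer
    gold: str           # reference answer
    sufficiency: float  # hypothetical detector score in [0, 1]


def answer_or_abstain(sample: Sample, threshold: float = 0.5) -> Optional[str]:
    """Return the prediction, or None (abstain) if context looks insufficient."""
    return sample.predicted if sample.sufficiency >= threshold else None


def selective_metrics(samples: List[Sample],
                      threshold: float = 0.5) -> Tuple[float, float]:
    """Return (coverage, accuracy on the answered samples)."""
    answered = [s for s in samples
                if answer_or_abstain(s, threshold) is not None]
    coverage = len(answered) / len(samples) if samples else 0.0
    accuracy = (sum(s.predicted == s.gold for s in answered) / len(answered)
                if answered else 0.0)
    return coverage, accuracy


# Hypothetical data: the one low-sufficiency sample is also a wrong answer.
samples = [
    Sample("cat", "cat", 0.9),
    Sample("dog", "cat", 0.2),
    Sample("red", "red", 0.8),
    Sample("two", "three", 0.7),
]
```

With a threshold of 0.0 every sample is answered and half are correct. Raising it to 0.5 drops the low-sufficiency sample, so coverage falls to three of four while accuracy on the answered samples rises to two of three.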


People with paralysis navigate a room via a mind-controlled wheelchair

New Scientist

Three people with paralysis of all four limbs used their thoughts to steer a wheelchair through a cluttered room with a reasonably high level of accuracy. This suggests people with paralysis could move independently through certain rooms, but the technology may not be advanced enough to navigate a busy street. Researchers have previously used two main strategies to test mind-controlled wheelchairs on non-disabled people. The first involves a person focusing on a flickering light in a particular location. This generates brain signals that an artificial intelligence translates into wheelchair movements towards that location, but this approach often leads to eyestrain.


BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage

arXiv.org Artificial Intelligence

We present BlenderBot 3, a 175B parameter dialogue model capable of open-domain conversation with access to the internet and a long-term memory, and having been trained on a large number of user-defined tasks. We release both the model weights and code, and have also deployed the model on a public web page to interact with organic users. This technical report describes how the model was built (architecture, model and training scheme), and details of its deployment, including safety mechanisms. Human evaluations show its superiority to existing open-domain dialogue agents, including its predecessors (Roller et al., 2021; Komeili et al., 2022). Finally, we detail our plan for continual learning using the data collected from deployment, which will also be publicly released. The goal of this research program is thus to enable the community to study ever-improving responsible agents that learn through interaction.


Google Assistant's one step closer to passing the Turing test

#artificialintelligence

In a building called the Partnerplex on Google's sprawling campus in Mountain View, California, I've been invited to hear a 51-second phone recording of someone making a dinner reservation. Person 2: Hi, um, I'd like to reserve a table for Friday the third. Person 1: OK, hold on one moment. As I listen to what sounds like a man and a woman talking, Google's top executives for Assistant, the search giant's digital helper, watch closely to gauge my reaction. They're showing off the Assistant's new tricks a few days before Google I/O, the company's annual developer conference that starts Tuesday. Turns out this particular trick is pretty wild. That's because Person 2, the one who sounds like a man, isn't a person at all.


Explaining Your Machine Learning Models with SHAP and LIME!

#artificialintelligence

Welcome back to another data science quick tip. This particular post is especially interesting to me, not only because this is the most complex subject we've tackled to date, but also because it's one that I just spent the last few hours learning myself. And of course, what better way to learn than to figure out how to teach it to the masses? Before getting into it, I've uploaded all the work shown in this post to a single Jupyter notebook. You can find it at my personal GitHub if you'd like to follow along more closely.