
An AI-Powered Framework for Analyzing Collective Idea Evolution in Deliberative Assemblies

Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad

arXiv.org Artificial Intelligence

In an era of increasing societal fragmentation, political polarization, and erosion of public trust in institutions, representative deliberative assemblies are emerging as a promising democratic forum for developing effective policy outcomes on complex global issues. Despite theoretical attention, there remains limited empirical work that systematically traces how specific ideas evolve, are prioritized, or are discarded during deliberation to form policy recommendations. Addressing these gaps, this work poses two central questions: (1) How might we trace the evolution and distillation of ideas into concrete recommendations within deliberative assemblies? (2) How does the deliberative process shape delegate perspectives and influence voting dynamics over the course of the assembly? To address these questions, we develop LLM-based methodologies for empirically analyzing transcripts from a tech-enhanced in-person deliberative assembly. The framework identifies and visualizes the space of expressed suggestions. We also empirically reconstruct each delegate's evolving perspective throughout the assembly. Our methods contribute novel empirical insights into deliberative processes and demonstrate how LLMs can surface high-resolution dynamics otherwise invisible in traditional assembly outputs.


REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems

Geng, Longling, Chang, Edward Y.

arXiv.org Artificial Intelligence

This benchmark suite provides a comprehensive evaluation framework for assessing both individual LLMs and multi-agent systems in real-world planning scenarios. The suite encompasses eleven designed problems that progress from basic to highly complex, incorporating key aspects such as multi-agent coordination, inter-agent dependencies, and dynamic environmental disruptions. Each problem can be scaled along three dimensions: the number of parallel planning threads, the complexity of inter-dependencies, and the frequency of unexpected disruptions requiring real-time adaptation. The benchmark includes detailed specifications, evaluation metrics, and baseline implementations using contemporary frameworks like LangGraph, enabling rigorous testing of both single-agent and multi-agent planning capabilities. Through standardized evaluation criteria and scalable complexity, this benchmark aims to drive progress in developing more robust and adaptable AI planning systems for real-world applications.
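The three scaling dimensions described above can be pictured as a small configuration object. This is a hypothetical sketch, not the benchmark's actual API; the field names and the `harder()` scaling rule are assumptions for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScenarioScale:
    """Hypothetical knob set for scaling a planning problem along the
    three dimensions named in the abstract (illustrative names only)."""
    parallel_threads: int       # number of parallel planning threads
    dependency_density: float   # fraction of thread pairs with inter-dependencies, in [0, 1]
    disruption_rate: float      # expected unexpected disruptions per planning step

    def harder(self) -> "ScenarioScale":
        """Return a strictly more demanding variant along all three axes."""
        return ScenarioScale(
            parallel_threads=self.parallel_threads * 2,
            dependency_density=min(1.0, self.dependency_density + 0.1),
            disruption_rate=self.disruption_rate * 2,
        )


base = ScenarioScale(parallel_threads=2, dependency_density=0.2, disruption_rate=0.05)
scaled = base.harder()
```

A benchmark harness could sweep `harder()` repeatedly to produce a difficulty ladder from basic to highly complex instances.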


Chain of Code: Reasoning with a Language Model-Augmented Code Emulator

Li, Chengshu, Liang, Jacky, Zeng, Andy, Chen, Xinyun, Hausman, Karol, Sadigh, Dorsa, Levine, Sergey, Fei-Fei, Li, Xia, Fei, Ichter, Brian

arXiv.org Artificial Intelligence

Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for semantic ones (and in particular, those that are a mix of both). For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable). However, LMs may still produce a valid solution if they not only write code, but also selectively "emulate" the interpreter by generating the expected output of "detect_sarcasm(string)" and other lines of code that cannot be executed. In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode, so that the interpreter can explicitly catch undefined behaviors and hand them off to an LM to simulate (as an "LMulator"). Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. CoC scales well with large and small models alike, and broadens the scope of reasoning questions that LMs can correctly answer by "thinking in code". Project webpage: https://chain-of-code.github.io.
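The interpreter/LM hand-off described above can be sketched in a few lines: execute each program line with a real interpreter where possible, and fall back to an LM stub when a line raises (e.g., the undefined semantic function `detect_sarcasm`). This is a minimal toy, not the paper's implementation; `lm_emulate` is a hard-coded stand-in for an actual model query.

```python
def lm_emulate(expression, state):
    """Stand-in for querying an LM for the value of an inexecutable line.
    Here the sarcasm example is hard-coded; a real system would prompt a model."""
    if expression.startswith("detect_sarcasm"):
        return 1  # pretend the LM judged the essay to contain one sarcastic span
    raise ValueError(f"no emulation rule for: {expression}")


def chain_of_code(lines):
    """Run each 'target = expression' line; on failure, hand off to the LM."""
    state = {}
    for line in lines:
        target, _, expression = (part.strip() for part in line.partition("="))
        try:
            state[target] = eval(expression, {}, dict(state))  # real interpreter
        except Exception:
            state[target] = lm_emulate(expression, state)      # "LMulator" hand-off
    return state


program = [
    "essay_len = 120",
    "sarcasm_count = detect_sarcasm(essay)",  # undefined -> caught, LM emulates
    "total = essay_len + sarcasm_count",      # executable again with LM's value
]
result = chain_of_code(program)
print(result["total"])  # 121
```

The point of the interleaving is that precise computation (`essay_len + sarcasm_count`) stays with the interpreter while only the semantic sub-task is delegated to the model.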


Relationship between Natural Language Processing and AI

AI Magazine

Modeling various aspects of language--syntax, semantics, pragmatics, and discourse, among others--by the use of constrained formal-computational systems, just adequate for such modeling, has proved to be an effective research strategy, leading to deep understanding of these aspects, with implications for both machine processing and human processing. This approach enables one to distinguish between the universal and stipulative constraints.


Toward Better Models Of The Design Process

AI Magazine

What are the powerful new ideas in knowledge-based design? What important research issues require further investigation? Perhaps the key research problem in AI-based design for the 1980s is to develop better models of the design process. A comprehensive model of design should address the following aspects of the design process: the state of the design; the goal structure of the design process; design decisions; rationales for design decisions; control of the design process; and the role of learning in design. This article presents some of the most important ideas emerging from current AI research on design, especially ideas for better models of design. It is organized into sections dealing with each of the aspects of design listed above. What is design? Why should we study it?


The Formative Years

AI Magazine

Department of Computer Science, Carnegie-Mellon University, Pittsburgh, Pennsylvania 15221. R1 is a rule-based program that configures VAX-11 computer systems. Given a customer's purchase order, it determines what, if any, substitutions and additions have to be made to the order to make it consistent and complete, and produces a number of diagrams showing the spatial and logical relationships among the 90 or so components that typically constitute a system. The program has been used on a regular basis by Digital Equipment Corporation's manufacturing organization since January of 1980. R1 has sufficient knowledge of the configuration domain and of the peculiarities of the various configuration constraints that at each step in the configuration process, it simply recognizes what to do; thus it requires little search in order to configure a computer system. The approach R1 takes to the configuration task and the way its knowledge is represented have been described elsewhere [McDermott 80a, McDermott 80b].


The 1998 Simon Newcomb Award

AI Magazine

His proofs are ingenious, cleverly argued, quite convincing to many of his contemporaries, and utterly wrong. The Simon Newcomb Award is given annually for the silliest published argument attacking AI. Our subject may be unique in the virulence and frequency with which it is attacked, both in the popular media and among the cultured intelligentsia. Recent articles have argued that the very idea of AI reflects a cancer in the heart of our culture and have proven (yet again) that it is impossible. While many of these attacks are cited widely, most of them are ridiculous to anyone with an appropriate technical education.


Benjamin J. Kuipers and Tad S. Levitt

AI Magazine

In a large-scale space, structure is at a significantly larger scale than the observations available at an instant. To learn the structure of a large-scale space from observations, the observer must build a cognitive map of the environment by integrating observations over an extended period of time, inferring spatial structure from perceptions and the effects of actions. The cognitive map representation of large-scale space must account for mapping, or learning structure from observations, and navigation, or creating and executing a plan to travel from one place to another. Approaches to date tend to be fragile, either because they don't build maps or because they assume nonlocal observations, such as those available in preexisting maps or global coordinate systems. Thus, to learn the large-scale structure of the space, the traveler must necessarily build a cognitive map of the environment by integrating observations over extended periods of time, inferring spatial structure from perceptions and the effects of actions. Large-scale space and the corresponding cognitive map representation cannot be defined independent of the sensory perceptions or motor actions used to observe and move about in the environment. For example, a workbench observed by a laser-bearing robot is not a large-scale space, but the moon is a large-scale space relative to a land-roving robot. A microchip is not large scale relative to an optical inspection system, but a grasshopper ganglion is a large-scale space when observed by an electron microscope. Inverse trigonometric operations and scalar multiplication require ratio data, in which a numeric value is calibrated with respect to a true zero. Trigonometric operations can require only interval data on angles, where differences are well defined but absolute angles are not required.
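The interval-vs-ratio distinction at the end of the abstract can be made concrete with compass bearings, which are interval data: their zero is an arbitrary reference, so only differences between bearings are meaningful. This small example (not from the article) shows that turn angles are invariant to where the zero is placed, while scalar multiplication of a raw bearing is not.

```python
def turn_angle(bearing_from, bearing_to):
    """Turn between two headings, in degrees; well-defined for interval data,
    because it depends only on the difference of the two bearings."""
    return (bearing_to - bearing_from) % 360.0


# Shifting the arbitrary zero reference changes raw bearings but not the turn.
offset = 73.0
assert turn_angle(10.0, 40.0) == turn_angle(10.0 + offset, 40.0 + offset)

# By contrast, scalar multiplication of a raw bearing depends on the reference:
# doubling 10 deg and doubling (10 + 73) deg give unrelated headings, which is
# why such operations require ratio data calibrated against a true zero.
doubled_a = (2 * 10.0) % 360.0
doubled_b = (2 * (10.0 + offset)) % 360.0
assert doubled_a != doubled_b
```

The same invariance argument explains why a robot can plan turns from purely relative heading observations without ever fixing a global north.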


Personalized Electronic Program Guides for Digital TV

AI Magazine

Although today's world offers us unprecedented access to greater and greater amounts of electronic information, we are faced with significant problems when it comes to finding the right information at the right time--the essence of the information-overload problem. One of the proposed solutions to this problem is to develop technologies for automatically learning about the implicit and explicit preferences of individual users to customize and personalize the search for relevant information. For example, modern search engines provide only a first cut through the information space, leaving the user with a significant search task to locate individual information items. This information overload is beginning to cause problems on the internet and is seen as a serious barrier to its future success. This problem takes on even more significance when one considers the new generation of mobile phones, which offer users an alternative internet access route through the wireless application protocol (WAP).


Ontology Translation for Interoperability Among Semantic Web Services

AI Magazine

Research on semantic web services promises greater interoperability among software agents and web services by enabling content-based automated service discovery and interaction. Although this is to be based on the use of shared ontologies published on the semantic web, services produced and described by different developers may well use different, perhaps partly overlapping, sets of ontologies. Interoperability will depend on ontology mappings and architectures supporting the associated translation processes. The question we ask is, does the traditional approach of introducing mediator agents to translate messages between requestors and services work in such an open environment? This article reviews some of the processing assumptions that were made in the development of the semantic web service modeling ontology OWL-S and argues that, as a practical matter, the translation function cannot always be isolated in mediators.