description
An AI-Powered Framework for Analyzing Collective Idea Evolution in Deliberative Assemblies
Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad
In an era of increasing societal fragmentation, political polarization, and erosion of public trust in institutions, representative deliberative assemblies are emerging as a promising democratic forum for developing effective policy outcomes on complex global issues. Despite theoretical attention, there remains limited empirical work that systematically traces how specific ideas evolve, are prioritized, or are discarded during deliberation to form policy recommendations. Addressing these gaps, this work poses two central questions: (1) How might we trace the evolution and distillation of ideas into concrete recommendations within deliberative assemblies? (2) How does the deliberative process shape delegate perspectives and influence voting dynamics over the course of the assembly? To address these questions, we develop LLM-based methodologies for empirically analyzing transcripts from a tech-enhanced in-person deliberative assembly. The framework identifies and visualizes the space of expressed suggestions. We also empirically reconstruct each delegate's evolving perspective throughout the assembly. Our methods contribute novel empirical insights into deliberative processes and demonstrate how LLMs can surface high-resolution dynamics otherwise invisible in traditional assembly outputs.
- Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
- South America > Argentina (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
- Education > Educational Setting (0.67)
- Government (0.66)
- Energy > Renewable (0.46)
- Water & Waste Management > Solid Waste Management (0.46)
V2X-LLM: Enhancing V2X Integration and Understanding in Connected Vehicle Corridors
Wu, Keshu, Li, Pei, Zhou, Yang, Gan, Rui, You, Junwei, Cheng, Yang, Zhu, Jingwen, Parker, Steven T., Ran, Bin, Noyce, David A., Tu, Zhengzhong
The advancement of Connected and Automated Vehicles (CAVs) and Vehicle-to-Everything (V2X) offers significant potential for enhancing transportation safety, mobility, and sustainability. However, the integration and analysis of the diverse and voluminous V2X data, including Basic Safety Messages (BSMs) and Signal Phase and Timing (SPaT) data, present substantial challenges, especially on Connected Vehicle Corridors. These challenges include managing large data volumes, ensuring real-time data integration, and understanding complex traffic scenarios. Although these projects have developed an advanced CAV data pipeline that enables real-time communication between vehicles, infrastructure, and other road users for managing connected vehicle and roadside unit (RSU) data, significant hurdles in data comprehension and real-time scenario analysis and reasoning persist. To address these issues, we introduce the V2X-LLM framework, a novel enhancement to the existing CV data pipeline. V2X-LLM leverages Large Language Models (LLMs) to improve the understanding and real-time analysis of V2X data. The framework includes four key tasks: Scenario Explanation, offering detailed narratives of traffic conditions; V2X Data Description, detailing vehicle and infrastructure statuses; State Prediction, forecasting future traffic states; and Navigation Advisory, providing optimized routing instructions. By integrating LLM-driven reasoning with V2X data within the data pipeline, the V2X-LLM framework offers real-time feedback and decision support for traffic management. This integration enhances the accuracy of traffic analysis, safety, and traffic optimization. Demonstrations in a real-world urban corridor highlight the framework's potential to advance intelligent transportation systems.
- Asia > China (0.46)
- North America > United States > Wisconsin > Dane County > Madison (0.15)
- North America > United States > Texas > Brazos County > College Station (0.14)
- (2 more...)
- Transportation > Ground > Road (1.00)
- Consumer Products & Services > Travel (0.68)
- Transportation > Infrastructure & Services (0.68)
Contextualizing biological perturbation experiments through language
Wu, Menghua, Littman, Russell, Levine, Jacob, Qiu, Lin, Biancalani, Tommaso, Richmond, David, Huetter, Jan-Christian
High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the relevant biology, and their objectives are misaligned with downstream biological analyses. In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments. Unlike current benchmarks that primarily interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: prediction of differential expression and change of direction for unseen perturbations, and gene set enrichment. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations, as well as standard LLM reasoning strategies, and we find that current methods perform poorly on PerturbQA. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR, a simple, domain-informed LLM framework that matches or exceeds the current state-of-the-art. Our code and data are publicly available at https://github.com/genentech/PerturbQA.
- Asia (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Mateo County (0.14)
- (3 more...)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.68)
GenTool: Enhancing Tool Generalization in Language Models through Zero-to-One and Weak-to-Strong Simulation
He, Jie, Neville, Jennifer, Wan, Mengting, Yang, Longqi, Liu, Hui, Xu, Xiaofeng, Song, Xia, Pan, Jeff Z., Zhou, Pei
Large Language Models (LLMs) can enhance their capabilities as AI assistants by integrating external tools, allowing them to access a wider range of information. While recent LLMs are typically fine-tuned with tool usage examples during supervised fine-tuning (SFT), questions remain about their ability to develop robust tool-usage skills and can effectively generalize to unseen queries and tools. In this work, we present GenTool, a novel training framework that prepares LLMs for diverse generalization challenges in tool utilization. Our approach addresses two fundamental dimensions critical for real-world applications: Zero-to-One Generalization, enabling the model to address queries initially lacking a suitable tool by adopting and utilizing one when it becomes available, and Weak-to-Strong Generalization, allowing models to leverage enhanced versions of existing tools to solve queries. To achieve this, we develop synthetic training data simulating these two dimensions of tool usage and introduce a two-stage fine-tuning approach: optimizing tool ranking, then refining tool selection. Through extensive experiments across four generalization scenarios, we demonstrate that our method significantly enhances the tool-usage capabilities of LLMs ranging from 1B to 8B parameters, achieving performance that surpasses GPT-4o. Furthermore, our analysis also provides valuable insights into the challenges LLMs encounter in tool generalization.
- North America > United States (0.14)
- Asia > Thailand (0.14)
- North America > Mexico > Mexico City (0.14)
- North America > Canada (0.14)
- Banking & Finance (0.68)
- Transportation > Passenger (0.67)
- Consumer Products & Services > Travel (0.46)
- Transportation > Ground (0.46)
REALM-Bench: A Real-World Planning Benchmark for LLMs and Multi-Agent Systems
Geng, Longling, Chang, Edward Y.
This benchmark suite provides a comprehensive evaluation framework for assessing both individual LLMs and multi-agent systems in real-world planning scenarios. The suite encompasses eleven designed problems that progress from basic to highly complex, incorporating key aspects such as multi-agent coordination, inter-agent dependencies, and dynamic environmental disruptions. Each problem can be scaled along three dimensions: the number of parallel planning threads, the complexity of inter-dependencies, and the frequency of unexpected disruptions requiring real-time adaptation. The benchmark includes detailed specifications, evaluation metrics, and baseline implementations using contemporary frameworks like LangGraph, enabling rigorous testing of both single-agent and multi-agent planning capabilities. Through standardized evaluation criteria and scalable complexity, this benchmark aims to drive progress in developing more robust and adaptable AI planning systems for real-world applications.
- Pacific Ocean > North Pacific Ocean > San Francisco Bay > Golden Gate (0.05)
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Asia > Middle East > Republic of Türkiye (0.04)
- (3 more...)
- Workflow (0.47)
- Research Report (0.41)
- Transportation (1.00)
- Information Technology (1.00)
- Consumer Products & Services > Travel (1.00)
- (3 more...)
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Yanambakkam, Hemanth Teja, Chinthala, Rahul
Image captioning is a challenging task at the intersection of computer vision and natural language processing, requiring models to generate meaningful textual descriptions of images. Traditional approaches rely on recurrent neural networks (RNNs), but recent advancements in attention mechanisms have demonstrated significant improvements. This study benchmarks the performance of attention-based image captioning models against RNN-based approaches using the MS-COCO dataset. We evaluate the effectiveness of Bahdanau attention in enhancing the alignment between image features and generated captions. The models are assessed using natural language processing metrics such as BLEU, METEOR, GLEU, and WER. Our results show that attention-based models outperform RNNs in generating more accurate and semantically rich captions, with better alignment to human evaluation. This work provides insights into the impact of attention mechanisms in image captioning and highlights areas for future improvements.
An LLM-Based Approach for Insight Generation in Data Analysis
Pérez, Alberto Sánchez, Boukhary, Alaa, Papotti, Paolo, Lozano, Luis Castejón, Elwood, Adam
Generating insightful and actionable information from databases is critical in data analysis. This paper introduces a novel approach using Large Language Models (LLMs) to automatically generate textual insights. Given a multi-table database as input, our method leverages LLMs to produce concise, text-based insights that reflect interesting patterns in the tables. Our framework includes a Hypothesis Generator to formulate domain-relevant questions, a Query Agent to answer such questions by generating SQL queries against a database, and a Summarization module to verbalize the insights. The insights are evaluated for both correctness and subjective insightfulness using a hybrid model of human judgment and automated metrics. Experimental results on public and enterprise databases demonstrate that our approach generates more insightful insights than other approaches while maintaining correctness.
- Europe (0.28)
- North America > United States > California (0.14)
- Asia (0.14)
- Overview (1.00)
- Research Report > Promising Solution (0.34)
- Leisure & Entertainment > Sports > Soccer (1.00)
- Education (1.00)
Completeness of Datasets Documentation on ML/AI repositories: an Empirical Investigation
Rondina, Marco, Vetrò, Antonio, De Martin, Juan Carlos
ML/AI is the field of computer science and computer engineering that arguably received the most attention and funding over the last decade. Data is the key element of ML/AI, so it is becoming increasingly important to ensure that users are fully aware of the quality of the datasets that they use, and of the process generating them, so that possible negative impacts on downstream effects can be tracked, analysed, and, where possible, mitigated. One of the tools that can be useful in this perspective is dataset documentation. The aim of this work is to investigate the state of dataset documentation practices, measuring the completeness of the documentation of several popular datasets in ML/AI repositories. We created a dataset documentation schema -- the Documentation Test Sheet (DTS) -- that identifies the information that should always be attached to a dataset (to ensure proper dataset choice and informed use), according to relevant studies in the literature. We verified 100 popular datasets from four different repositories with the DTS to investigate which information was present. Overall, we observed a lack of relevant documentation, especially about the context of data collection and data processing, highlighting a paucity of transparency.
- North America > United States (0.15)
- Europe > Italy (0.14)
- Research Report > New Finding (0.93)
- Research Report > Experimental Study (0.68)
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Li, Chengshu, Liang, Jacky, Zeng, Andy, Chen, Xinyun, Hausman, Karol, Sadigh, Dorsa, Levine, Sergey, Fei-Fei, Li, Xia, Fei, Ichter, Brian
Code provides a general syntactic structure to build complex programs and perform precise computations when paired with a code interpreter - we hypothesize that language models (LMs) can leverage code-writing to improve Chain of Thought reasoning not only for logic and arithmetic tasks, but also for semantic ones (and in particular, those that are a mix of both). For example, consider prompting an LM to write code that counts the number of times it detects sarcasm in an essay: the LM may struggle to write an implementation for "detect_sarcasm(string)" that can be executed by the interpreter (handling the edge cases would be insurmountable). However, LMs may still produce a valid solution if they not only write code, but also selectively "emulate" the interpreter by generating the expected output of "detect_sarcasm(string)" and other lines of code that cannot be executed. In this work, we propose Chain of Code (CoC), a simple yet surprisingly effective extension that improves LM code-driven reasoning. The key idea is to encourage LMs to format semantic sub-tasks in a program as flexible pseudocode that the interpreter can explicitly catch undefined behaviors and hand off to simulate with an LM (as an "LMulator"). Experiments demonstrate that Chain of Code outperforms Chain of Thought and other baselines across a variety of benchmarks; on BIG-Bench Hard, Chain of Code achieves 84%, a gain of 12% over Chain of Thought. CoC scales well with large and small models alike, and broadens the scope of reasoning questions that LMs can correctly answer by "thinking in code". Project webpage: https://chain-of-code.github.io.
- North America > The Bahamas (0.14)
- North America > Barbados (0.14)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- (65 more...)
Data Engineer, Computer Vision at Canopy - London, England, United Kingdom
We're all in - are you? Canopy is a brand new company with a unique mission, solving for one of the biggest and growing challenges vehicle owners face – the threat of theft. A start-up with a compelling proposition, patented cutting edge AI technology, and a unique layer of expert monitoring from security specialists, we're backed by Ford and ADT. Together, we're determined to help vehicle owners stay one step ahead of potential threats by warning them before they happen. Our next step is to take our service to market and write the next big security technology success story.