- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- Asia > China > Jiangsu Province > Nanjing (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Leisure & Entertainment (1.00)
- Media > Music (0.67)
Supplementary Information
The claim and evidence conflict pairs can be found at https://huggingface. The scope of our dataset is purely for scientific research. Conflict Verification: Ensuring that the default and conflict evidence are contradictory. The human evaluation results showed a high level of accuracy in our data generation process. We select models with 2B and 7B parameters for our analysis. [...] MA2 [Touvron et al., 2023] is a popular open-source foundation model, trained on 2T [...] Models with 7B and 70B parameters are selected for our analysis. To facilitate parallel training, we employ DeepSpeed Zero-Stage 3 [Ren et al., ...]. The prompt for generating semantic conflict descriptions is shown in Figure 1. The prompt for generating default evidence is shown in Table 6. The prompt for generating misinformation conflict evidence is shown in Table 7. The prompt for generating temporal conflict evidence is shown in Table 8. The prompt for generating semantic conflict evidence is shown in Table 9.
- Europe > Czechia > Liberec Region > Liberec (0.05)
- Africa > Nigeria > Taraba State (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Personal > Honors (0.69)
- Research Report > New Finding (0.68)
A Benchmark for Evaluating Knowledge Conflicts in Large Language Models
Large language models (LLMs) have achieved impressive advancements across numerous disciplines, yet the critical issue of knowledge conflicts, a major source of hallucinations, has rarely been studied. While a few studies have explored the conflicts between the inherent knowledge of LLMs and retrieved contextual knowledge, a comprehensive assessment of knowledge conflicts in LLMs is still missing.
- Europe > Czechia > Liberec Region > Liberec (0.04)
- Asia > Middle East > Jordan (0.04)
- Africa > Nigeria > Taraba State (0.04)
- Personal > Honors (1.00)
- Research Report > New Finding (0.93)
- Asia > China > Shanghai > Shanghai (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Information Technology (0.92)
- Leisure & Entertainment > Games (0.67)
An AI-Powered Framework for Analyzing Collective Idea Evolution in Deliberative Assemblies
Poole-Dayan, Elinor, Roy, Deb, Kabbara, Jad
In an era of increasing societal fragmentation, political polarization, and erosion of public trust in institutions, representative deliberative assemblies are emerging as a promising democratic forum for developing effective policy outcomes on complex global issues. Despite theoretical attention, there remains limited empirical work that systematically traces how specific ideas evolve, are prioritized, or are discarded during deliberation to form policy recommendations. Addressing these gaps, this work poses two central questions: (1) How might we trace the evolution and distillation of ideas into concrete recommendations within deliberative assemblies? (2) How does the deliberative process shape delegate perspectives and influence voting dynamics over the course of the assembly? To address these questions, we develop LLM-based methodologies for empirically analyzing transcripts from a tech-enhanced in-person deliberative assembly. The framework identifies and visualizes the space of expressed suggestions. We also empirically reconstruct each delegate's evolving perspective throughout the assembly. Our methods contribute novel empirical insights into deliberative processes and demonstrate how LLMs can surface high-resolution dynamics otherwise invisible in traditional assembly outputs.
- Asia > Middle East > Republic of Türkiye > Konya Province > Konya (0.04)
- South America > Argentina (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Education > Educational Setting (0.67)
- Government (0.66)
- Energy > Renewable (0.46)
- Water & Waste Management > Solid Waste Management (0.46)
ComfyGPT: A Self-Optimizing Multi-Agent System for Comprehensive ComfyUI Workflow Generation
Huang, Oucheng, Ma, Yuhang, Zhao, Zeng, Wu, Mingrui, Ji, Jiayi, Zhang, Rongsheng, Hu, Zhipeng, Sun, Xiaoshuai, Ji, Rongrong
ComfyUI provides a widely adopted, workflow-based interface that enables users to customize various image generation tasks through an intuitive node-based architecture. However, the intricate connections between nodes and diverse modules often present a steep learning curve for users. In this paper, we introduce ComfyGPT, the first self-optimizing multi-agent system designed to automatically generate ComfyUI workflows from task descriptions. ComfyGPT comprises four specialized agents: ReformatAgent, FlowAgent, RefineAgent, and ExecuteAgent. The core innovation of ComfyGPT lies in two key aspects. First, it focuses on generating individual node links rather than entire workflows, significantly improving generation precision. Second, we propose FlowAgent, an LLM-based workflow generation agent that uses both supervised fine-tuning (SFT) and reinforcement learning (RL) to improve workflow generation accuracy. Moreover, we introduce FlowDataset, a large-scale dataset containing 13,571 workflow-description pairs, and FlowBench, a comprehensive benchmark for evaluating workflow generation systems. We also propose four novel evaluation metrics: Format Validation (FV), Pass Accuracy (PA), Pass Instruct Alignment (PIA), and Pass Node Diversity (PND). Experimental results demonstrate that ComfyGPT significantly outperforms existing LLM-based methods in workflow generation.
- Workflow (1.00)
- Research Report > New Finding (0.34)
V2X-LLM: Enhancing V2X Integration and Understanding in Connected Vehicle Corridors
Wu, Keshu, Li, Pei, Zhou, Yang, Gan, Rui, You, Junwei, Cheng, Yang, Zhu, Jingwen, Parker, Steven T., Ran, Bin, Noyce, David A., Tu, Zhengzhong
The advancement of Connected and Automated Vehicles (CAVs) and Vehicle-to-Everything (V2X) offers significant potential for enhancing transportation safety, mobility, and sustainability. However, the integration and analysis of the diverse and voluminous V2X data, including Basic Safety Messages (BSMs) and Signal Phase and Timing (SPaT) data, present substantial challenges, especially on Connected Vehicle Corridors. These challenges include managing large data volumes, ensuring real-time data integration, and understanding complex traffic scenarios. Although these projects have developed an advanced CAV data pipeline that enables real-time communication between vehicles, infrastructure, and other road users for managing connected vehicle and roadside unit (RSU) data, significant hurdles in data comprehension and real-time scenario analysis and reasoning persist. To address these issues, we introduce the V2X-LLM framework, a novel enhancement to the existing CV data pipeline. V2X-LLM leverages Large Language Models (LLMs) to improve the understanding and real-time analysis of V2X data. The framework includes four key tasks: Scenario Explanation, offering detailed narratives of traffic conditions; V2X Data Description, detailing vehicle and infrastructure statuses; State Prediction, forecasting future traffic states; and Navigation Advisory, providing optimized routing instructions. By integrating LLM-driven reasoning with V2X data within the data pipeline, the V2X-LLM framework offers real-time feedback and decision support for traffic management. This integration enhances the accuracy of traffic analysis, safety, and traffic optimization. Demonstrations in a real-world urban corridor highlight the framework's potential to advance intelligent transportation systems.
- Asia > China (0.46)
- North America > United States > Wisconsin > Dane County > Madison (0.15)
- North America > United States > Texas > Brazos County > College Station (0.14)
- Transportation > Ground > Road (1.00)
- Consumer Products & Services > Travel (0.68)
- Transportation > Infrastructure & Services (0.68)
Contextualizing biological perturbation experiments through language
Wu, Menghua, Littman, Russell, Levine, Jacob, Qiu, Lin, Biancalani, Tommaso, Richmond, David, Huetter, Jan-Christian
High-content perturbation experiments allow scientists to probe biomolecular systems at unprecedented resolution, but experimental and analysis costs pose significant barriers to widespread adoption. Machine learning has the potential to guide efficient exploration of the perturbation space and extract novel insights from these data. However, current approaches neglect the semantic richness of the relevant biology, and their objectives are misaligned with downstream biological analyses. In this paper, we hypothesize that large language models (LLMs) present a natural medium for representing complex biological relationships and rationalizing experimental outcomes. We propose PerturbQA, a benchmark for structured reasoning over perturbation experiments. Unlike current benchmarks that primarily interrogate existing knowledge, PerturbQA is inspired by open problems in perturbation modeling: prediction of differential expression and change of direction for unseen perturbations, and gene set enrichment. We evaluate state-of-the-art machine learning and statistical approaches for modeling perturbations, as well as standard LLM reasoning strategies, and we find that current methods perform poorly on PerturbQA. As a proof of feasibility, we introduce Summer (SUMMarize, retrievE, and answeR), a simple, domain-informed LLM framework that matches or exceeds the current state-of-the-art. Our code and data are publicly available at https://github.com/genentech/PerturbQA.
- Asia (0.14)
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > California > San Mateo County (0.14)
- Health & Medicine > Therapeutic Area > Oncology (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
- Health & Medicine > Therapeutic Area > Immunology (0.68)