Table 6: Inputs and full explanations for the CoT explanations from Table 4, on random examples where models exhibit unfaithful behavior (random unfaithful CoT explanations for bias-consistent predictions).
Columns: Task | Model, Zero/Few-Shot | Failure Mode | Input Question | CoT in Unbiased Context | CoT in Biased Context
Biasing prompt: "... I think the answer is but I'm curious to hear what you think."
Failure mode: Model justifies incorrect answer due to ambiguity in task specification.
Example (navigation): Input: "If you follow these instructions, do you return to the starting point?" Unbiased CoT: "We start at the origin (0, 0), facing the positive y-axis. ... So the best answer is: (A)." Biased CoT: "We start at the origin (0, 0), facing the positive y-axis. ... So the best answer is: (B)."
Example (sports understanding): Input: "Q: Is the following sentence plausible?" Unbiased CoT: "Kenta Maeda is a baseball pitcher. Throwing to first base is part of baseball. ..." Biased CoT: "American League Championship Series is a real baseball series. Throwing to first base is part of baseball. So the best answer is: (B) implausible."
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Europe > Germany > Lower Saxony (0.14)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- (5 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Media > Film (1.00)
- Leisure & Entertainment > Sports > Soccer (0.93)
- Education > Curriculum > Subject-Specific Education (0.92)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
StoryBox: Collaborative Multi-Agent Simulation for Hybrid Bottom-Up Long-Form Story Generation Using Large Language Models
Chen, Zehao, Pan, Rong, Li, Haoran
Human writers often begin their stories with an overarching mental scene, where they envision the interactions between characters and their environment. Inspired by this creative process, we propose a novel approach to long-form story generation, termed hybrid bottom-up long-form story generation, using multi-agent simulations. In our method, agents interact within a dynamic sandbox environment, where their behaviors and interactions with one another and the environment generate emergent events. These events form the foundation for the story, enabling organic character development and plot progression. Unlike traditional top-down approaches that impose rigid structures, our hybrid bottom-up approach allows for the natural unfolding of events, fostering more spontaneous and engaging storytelling. The system is capable of generating stories exceeding 10,000 words while maintaining coherence and consistency, addressing some of the key challenges faced by current story generation models. We achieve state-of-the-art performance across several metrics. This approach offers a scalable and innovative solution for creating dynamic, immersive long-form stories that evolve organically from agent-driven interactions.
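The bottom-up idea in the abstract above can be sketched in a few lines: agents act in a shared sandbox, their interactions emit events, and the event log becomes the raw material for the story. This is a minimal illustrative sketch, not the authors' system; every name here (Agent, Sandbox, render_story) is a hypothetical stand-in, and the final narration step, done by an LLM in the paper, is replaced with plain string joining.

```python
import random
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    location: str

@dataclass
class Sandbox:
    agents: list
    events: list = field(default_factory=list)

    def step(self, rng):
        # Each tick, a random agent either talks to a co-located agent
        # or wanders elsewhere; both outcomes are recorded as events.
        actor = rng.choice(self.agents)
        neighbours = [a for a in self.agents
                      if a is not actor and a.location == actor.location]
        if neighbours:
            other = rng.choice(neighbours)
            self.events.append(f"{actor.name} talks with {other.name} at the {actor.location}")
        else:
            actor.location = rng.choice(["market", "harbour", "library"])
            self.events.append(f"{actor.name} wanders to the {actor.location}")

def render_story(events):
    # A real system would hand the event log to an LLM for narration;
    # here we simply join the events into prose.
    return ". ".join(events) + "."

rng = random.Random(0)
box = Sandbox([Agent("Mara", "market"), Agent("Theo", "market")])
for _ in range(5):
    box.step(rng)
print(render_story(box.events))
```

Emergent plot comes from the fact that no line of this loop prescribes story structure: character positions and encounters drive which events exist at all.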
- Asia > Thailand > Bangkok > Bangkok (0.04)
- Asia > Middle East > Jordan (0.04)
- Asia > China > Guangdong Province > Guangzhou (0.04)
- (7 more...)
- Overview (1.00)
- Research Report > Promising Solution (0.54)
- Government (0.93)
- Health & Medicine > Therapeutic Area (0.46)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.50)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)
- Europe > Germany > Lower Saxony (0.14)
- Asia > Singapore (0.04)
- Asia > Indonesia > Bali (0.04)
- (5 more...)
- Research Report > Experimental Study (1.00)
- Workflow (0.68)
- Overview (0.67)
- Research Report > New Finding (0.67)
- Media > Film (1.00)
- Education > Curriculum > Subject-Specific Education (1.00)
- Leisure & Entertainment > Sports > Soccer (0.93)
- Information Technology (0.67)
- South America > Uruguay > Maldonado > Maldonado (0.04)
- Asia > Middle East > Republic of Türkiye > Batman Province > Batman (0.04)
- North America > United States > California > Los Angeles County > Los Angeles (0.04)
- Leisure & Entertainment > Sports > Baseball (1.00)
- Education (1.00)
- Health & Medicine (0.94)
Detecting Corpus-Level Knowledge Inconsistencies in Wikipedia with Large Language Models
Semnani, Sina J., Burapacheep, Jirayu, Khatua, Arpandeep, Atchariyachanvanit, Thanawan, Wang, Zheng, Lam, Monica S.
Wikipedia is the largest open knowledge corpus, widely used worldwide and serving as a key resource for training large language models (LLMs) and retrieval-augmented generation (RAG) systems. Ensuring its accuracy is therefore critical. But how accurate is Wikipedia, and how can we improve it? We focus on inconsistencies, a specific type of factual inaccuracy, and introduce the task of corpus-level inconsistency detection. We present CLAIRE, an agentic system that combines LLM reasoning with retrieval to surface potentially inconsistent claims along with contextual evidence for human review. In a user study with experienced Wikipedia editors, 87.5% reported higher confidence when using CLAIRE, and participants identified 64.7% more inconsistencies in the same amount of time. Combining CLAIRE with human annotation, we contribute WIKICOLLIDE, the first benchmark of real Wikipedia inconsistencies. Using random sampling with CLAIRE-assisted analysis, we find that at least 3.3% of English Wikipedia facts contradict another fact, with inconsistencies propagating into 7.3% of FEVEROUS and 4.0% of AmbigQA examples. Benchmarking strong baselines on this dataset reveals substantial headroom: the best fully automated system achieves an AUROC of only 75.1%. Our results show that contradictions are a measurable component of Wikipedia and that LLM-based systems like CLAIRE can provide a practical tool to help editors improve knowledge consistency at scale.
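The corpus-level framing above can be illustrated with a toy pipeline: group claims that describe the same entity and attribute, then flag groups whose stated values disagree. This is only a sketch in the spirit of CLAIRE, with simple key matching standing in for the system's LLM reasoning and retrieval; the data and all function names are hypothetical.

```python
from collections import defaultdict

def find_inconsistencies(claims):
    """claims: list of (entity, attribute, value, source_article) tuples."""
    by_key = defaultdict(list)
    for entity, attribute, value, source in claims:
        by_key[(entity, attribute)].append((value, source))
    flagged = []
    for key, mentions in by_key.items():
        values = {v for v, _ in mentions}
        if len(values) > 1:  # the same fact is stated with conflicting values
            flagged.append((key, sorted(mentions)))
    return flagged

# Hypothetical mini-corpus: two articles disagree on a bridge's opening year.
corpus = [
    ("Bridge X", "opened", "1932", "Bridge X"),
    ("Bridge X", "opened", "1933", "List of bridges"),
    ("River Y", "length_km", "210", "River Y"),
]
conflicts = find_inconsistencies(corpus)
print(conflicts)
```

The hard part that the sketch elides is exactly what the paper addresses: deciding that two differently worded sentences are claims about the same fact, which is why the real system pairs retrieval with LLM judgment rather than exact keys.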
- Europe > Austria > Vienna (0.14)
- Asia > Thailand > Bangkok > Bangkok (0.05)
- North America > United States > Florida > Miami-Dade County > Miami (0.05)
- (21 more...)
- Leisure & Entertainment (0.67)
- Government (0.46)
Probabilistic Token Alignment for Large Language Model Fusion
Zeng, Runjia, Liang, James Chenhao, Han, Cheng, Cao, Zhiwen, Liu, Jiahao, Quan, Xiaojun, Chen, Yingjie Victor, Huang, Lifu, Geng, Tong, Wang, Qifan, Liu, Dongfang
Training large language models (LLMs) from scratch can yield models with unique functionalities and strengths, but it is costly and often leads to redundant capabilities. A more cost-effective alternative is to fuse existing pre-trained LLMs with different architectures into a more powerful model. However, a key challenge in existing model fusion is its dependence on manually predefined vocabulary alignment, which may not generalize well across diverse contexts, leading to performance degradation in several evaluations. To solve this, we draw inspiration from distribution learning and propose a probabilistic token alignment method as a general and soft mapping for alignment, named PTA-LLM. Our approach innovatively reformulates token alignment into a classic mathematical problem: optimal transport, seamlessly leveraging distribution-aware learning to facilitate more coherent model fusion. Apart from its inherent generality, PTA-LLM exhibits interpretability from a distributional perspective, offering insights into the essence of token alignment. Empirical results demonstrate that probabilistic token alignment enhances the target model's performance across multiple capabilities. Our code is available at https://runjia.tech/neurips_pta-llm/.
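The optimal-transport reformulation described in this abstract can be made concrete with a toy example: given a cost matrix between the tokens of two vocabularies, a Sinkhorn iteration produces a soft transport plan instead of a hard one-to-one mapping. The Sinkhorn scheme below is the standard entropic-OT algorithm; the vocabulary sizes, the cost values, and all names are illustrative assumptions, not PTA-LLM's actual implementation.

```python
import math

def sinkhorn(cost, eps=0.1, iters=200):
    """Entropic optimal transport with uniform marginals 1/n and 1/m."""
    n, m = len(cost), len(cost[0])
    K = [[math.exp(-c / eps) for c in row] for row in cost]  # Gibbs kernel
    u, v = [1.0] * n, [1.0] * m
    for _ in range(iters):
        # Alternately rescale rows and columns to match the marginals.
        u = [(1.0 / n) / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [(1.0 / m) / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    # Transport plan: soft alignment weight between each token pair.
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

# Hypothetical pairwise distances between token embeddings of two vocabularies;
# token i of the source is closest to token i of the target.
cost = [[0.0, 1.0, 1.0],
        [1.0, 0.1, 0.9],
        [1.0, 0.9, 0.1]]
plan = sinkhorn(cost)
print(plan)
```

Because the plan's rows and columns are constrained to the uniform marginals, every source token distributes its mass over target tokens, which is the "general and soft mapping" the abstract contrasts with manually predefined alignments.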
- South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
- North America > United States > Missouri > Jackson County > Kansas City (0.04)
- Education (0.46)
- Government > Regional Government > North America Government > United States Government (0.46)
Claire's on brink of collapse putting 2,150 jobs at risk
Tom Espiner, Business reporter, BBC News. Claire's will appoint administrators after struggles with online competition. Fashion accessories chain Claire's is on the brink of collapse after the retailer said it will appoint administrators in the UK and Ireland, putting 2,150 jobs at risk. The company has 278 stores in the UK and 28 in Ireland but has been struggling with falling sales and fierce competition. All the shops will continue trading while administrators at Interpath, once appointed, will "assess options for the company". Interpath chief executive Will Wright said options include "exploring the possibility of a sale which would secure a future for this well-loved brand". Claire's filed for bankruptcy in the US earlier this month.
- Europe > United Kingdom (1.00)
- North America > United States (0.58)
- Europe > Ireland (0.25)
- (3 more...)
A Theoretical Understanding of Chain-of-Thought: Coherent Reasoning and Error-Aware Demonstration
Cui, Yingqian, He, Pengfei, Tang, Xianfeng, He, Qi, Luo, Chen, Tang, Jiliang, Xing, Yue
Few-shot Chain-of-Thought (CoT) prompting has demonstrated strong performance in improving the reasoning capabilities of large language models (LLMs). While theoretical investigations have been conducted to understand CoT, the transformer models used in these studies isolate the CoT reasoning process into separate in-context learning steps (Stepwise ICL). In this work, we theoretically show that, compared to Stepwise ICL, the transformer achieves better error correction and more accurate predictions when reasoning from earlier steps is integrated (Coherent CoT). Given that this coherent reasoning changes the behavior of the transformer, we further investigate the sensitivity of the transformer with Coherent CoT when the demonstration examples are corrupted at the inference stage. Our theoretical results indicate that the transformer is more sensitive to errors in intermediate reasoning steps than in the final outcome. Building upon this observation, we propose an improvement to CoT that incorporates both correct and incorrect reasoning paths in the demonstration. Our experiments validate the effectiveness of the proposed approach.
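The error-aware demonstration idea at the end of this abstract amounts to a change in how few-shot exemplars are formatted: each demonstration shows an incorrect reasoning path explicitly flagged as such, next to the correct one. The template below is a minimal sketch under that reading; the field labels and prompt layout are assumptions, not the paper's exact format.

```python
def build_demo(question, correct_path, wrong_path, answer):
    # One demonstration pairs a flagged wrong path with the correct one,
    # so the model sees what an error looks like, not only the right steps.
    return (
        f"Q: {question}\n"
        f"Incorrect reasoning (for contrast): {wrong_path}\n"
        f"Correct reasoning: {correct_path}\n"
        f"A: {answer}\n"
    )

def build_prompt(demos, query):
    return "\n".join(build_demo(*d) for d in demos) + f"\nQ: {query}\nA:"

demos = [(
    "A bat and ball cost $1.10 in total; the bat costs $1.00 more than the "
    "ball. How much is the ball?",
    "Let the ball cost x; then x + (x + 1.00) = 1.10, so 2x = 0.10 and x = 0.05.",
    "The ball costs $0.10, since 1.10 - 1.00 = 0.10.",
    "$0.05",
)]
prompt = build_prompt(demos, "What is 17 + 25?")
print(prompt)
```

The prompt string would then be sent to any LLM as-is; nothing about the model itself changes, only the demonstrations.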
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
- Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.89)
Tree of Problems: Improving structured problem solving with compositionality
Zebaze, Armel, Sagot, Benoît, Bawden, Rachel
Large Language Models (LLMs) have demonstrated remarkable performance across multiple tasks through in-context learning. For complex reasoning tasks that require step-by-step thinking, Chain-of-Thought (CoT) prompting has given impressive results, especially when combined with self-consistency. Nonetheless, some tasks remain particularly difficult for LLMs to solve. Tree of Thoughts (ToT) and Graph of Thoughts (GoT) emerged as alternatives, dividing the complex problem into paths of subproblems. In this paper, we propose Tree of Problems (ToP), a simpler version of ToT, which we hypothesise can work better for complex tasks that can be divided into identical subtasks. Our empirical results show that our approach outperforms ToT and GoT, and in addition performs better than CoT on complex reasoning tasks. All code for this paper is publicly available here: https://github.com/ArmelRandy/tree-of-problems.
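The recursion described in this abstract, splitting a task into identical subtasks, solving each, and merging the results, can be sketched directly. In this toy version the "LLM call" at each leaf is replaced by a trivial last-letter-concatenation function; atomic_solve, merge, and leaf_size are hypothetical placeholders, not the ToP codebase's API.

```python
def atomic_solve(words):
    # Stand-in for an LLM solving the base case of a last-letter task.
    return "".join(w[-1] for w in words)

def merge(left, right):
    # Stand-in for the merge step; for this task it is plain concatenation.
    return left + right

def tree_of_problems(words, leaf_size=2):
    # Split into two identical subproblems until each fits in a leaf,
    # then combine the leaf answers back up the tree.
    if len(words) <= leaf_size:
        return atomic_solve(words)
    mid = len(words) // 2
    return merge(tree_of_problems(words[:mid], leaf_size),
                 tree_of_problems(words[mid:], leaf_size))

print(tree_of_problems(["tree", "of", "problems", "demo"]))  # -> "efso"
```

The key restriction the paper exploits is visible here: the approach only applies cleanly when the subproblems have the same form as the original, which is what lets a single prompt template serve every node of the tree.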
- North America > Canada > Ontario > Toronto (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Asia > Middle East > UAE (0.04)
- Asia > Middle East > Saudi Arabia > Asir Province > Abha (0.04)