Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers
Saad-Falcon, Jon, Buchanan, E. Kelly, Chen, Mayee F., Huang, Tzu-Heng, McLaughlin, Brendan, Bhathal, Tanvir, Zhu, Shang, Athiwaratkun, Ben, Sala, Frederic, Linderman, Scott, Mirhoseini, Azalia, Ré, Christopher
Verifiers can improve language model capabilities by scoring and ranking responses from generated candidates. Currently, high-quality verifiers are either unscalable (e.g., humans) or limited in utility (e.g., tools like Lean). While LM judges and reward models have become broadly useful as general-purpose verifiers, a significant performance gap remains between them and oracle verifiers (verifiers with perfect accuracy). To help close this gap, we introduce Weaver, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers. We find that weighted ensembles of verifiers, which typically require learning from labeled data, significantly outperform unweighted combinations due to differences in verifier accuracies. To reduce dependency on labeled data, Weaver leverages weak supervision to estimate each verifier's accuracy and combines outputs into a unified score that better reflects true response quality. However, directly applying weak supervision algorithms poses challenges, including inconsistent verifier output formats and handling low-quality verifiers. Weaver addresses these using dataset statistics to normalize outputs and filter out specific verifiers. We study Weaver's effectiveness in test-time repeated sampling, where a model generates multiple candidate responses and selects one. Our evaluations show that Weaver significantly improves over Pass@1 (the accuracy of selecting the first candidate) across reasoning and math tasks, achieving o3-mini-level accuracy with Llama 3.3 70B Instruct as the generator and an ensemble of judge and reward models of 70B parameters or smaller as verifiers (87.7% average). This gain mirrors the jump between GPT-4o and o3-mini (69.0% vs. 86.7%), which required extensive finetuning and post-training. To reduce the computational costs of verifier ensembles, we train a 400M cross-encoder using Weaver's combined output scores.
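The weighted combination described above can be sketched as a log-odds vote over binary verifier outputs. In this minimal sketch the per-verifier accuracies are simply given as inputs (Weaver estimates them from unlabeled data via weak supervision, which is omitted here), and all names are illustrative:

```python
import math

def select_response(verifier_votes, accuracies):
    """Pick the candidate with the highest accuracy-weighted verifier score.

    verifier_votes: one inner list per candidate; each entry is a 0/1
    vote from a different verifier. accuracies: estimated accuracy of
    each verifier. Votes are weighted by the log-odds of the verifier
    being correct, the standard weighting for independent noisy voters,
    so reliable verifiers count for more than near-random ones.
    """
    def score(votes):
        s = 0.0
        for v, a in zip(votes, accuracies):
            a = min(max(a, 1e-6), 1 - 1e-6)  # keep log-odds finite
            w = math.log(a / (1 - a))
            s += w if v == 1 else -w
        return s
    return max(range(len(verifier_votes)), key=lambda i: score(verifier_votes[i]))

# Three candidates scored by three verifiers of differing reliability:
# the strong 0.9-accuracy verifier and the middling 0.7 one agree on
# candidate 1, which wins despite the weak verifier's dissent.
votes = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
best = select_response(votes, accuracies=[0.9, 0.7, 0.55])
```

An unweighted majority vote would score candidates 0 and 2 identically to candidate 1 minus the tie-breaking; the accuracy weighting is what lets one strong verifier outvote two weak ones.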
- Asia > Middle East > Jordan (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting
Cheong, Christopher, Davis, Gary, Choi, Seongjin
Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system dictated by traffic flow dynamics and social behavioral patterns. The importance of transportation networks and intelligent transportation systems (ITS) for modern mobility and commerce necessitates forecasting models that are not only accurate but also interpretable, efficient, and robust under structural or temporal perturbations. Recent approaches, particularly Transformer-based architectures, have improved predictive performance but often at the cost of high computational overhead and diminished architectural interpretability. In this work, we introduce Weaver, a novel attention-based model that applies Kronecker product approximations (KPA) to decompose the PN × PN spatiotemporal attention of O(P^2 N^2) complexity into local P × P temporal and N × N spatial attention maps. This Kronecker attention map enables our Parallel-Kronecker Matrix-Vector product (P2-KMV) for efficient spatiotemporal message passing with O(P^2 N + N^2 P) complexity. To capture real-world traffic dynamics, we address the importance of negative edges in modeling traffic behavior by introducing Valence Attention using the continuous Tanimoto coefficient (CTC), which provides properties conducive to precise latent graph generation and training stability. To fully utilize the model's learning capacity, we introduce the Traffic Phase Dictionary for self-conditioning. Evaluations on PEMS-BAY and METR-LA show that Weaver achieves competitive performance across model categories while training more efficiently.
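The complexity reduction behind the Kronecker factorization can be seen in a few lines: a Kronecker-structured matrix-vector product never needs the full PN × PN matrix, only two small matmuls on a reshaped signal. This is a generic sketch of that identity, not the paper's P2-KMV implementation:

```python
import numpy as np

def kron_mv(T, S, x):
    """Compute (T kron S) @ x without materializing the PN x PN matrix.

    T: (P, P) temporal map, S: (N, N) spatial map, x: length-PN signal
    ordered time-major, i.e. x[p*N + n] is node n at time p. Reshaping
    x to (P, N) turns the Kronecker product into T @ X @ S.T, costing
    O(P^2 N + N^2 P) instead of O(P^2 N^2).
    """
    P, N = T.shape[0], S.shape[0]
    X = x.reshape(P, N)
    return (T @ X @ S.T).reshape(-1)

# Sanity check against the explicit Kronecker product on a tiny example.
rng = np.random.default_rng(0)
T, S = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
x = rng.normal(size=12)
assert np.allclose(kron_mv(T, S, x), np.kron(T, S) @ x)
```

For PEMS-BAY-scale inputs (hundreds of sensors, dozens of time steps) the dense product is quadratic in both axes at once, which is exactly what this factorization avoids.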
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > District of Columbia > Washington (0.14)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- (13 more...)
- Overview (0.92)
- Research Report > New Finding (0.45)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Information Technology (0.92)
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Zhang, Guibin, Fu, Muxin, Yan, Shuicheng
Agent memory shapes how Large Language Model (LLM)-powered agents, akin to the human brain, progressively refine themselves through environment interactions. Existing paradigms remain constrained: parametric memory forcibly adjusts model parameters, and retrieval-based memory externalizes experience into structured databases, yet neither captures the fluid interweaving of reasoning and memory that underlies human cognition. To address this gap, we propose MemGen, a dynamic generative memory framework that equips agents with a human-esque cognitive faculty. It consists of a "memory trigger," which monitors the agent's reasoning state to decide explicit memory invocation, and a "memory weaver," which takes the agent's current state as stimulus to construct a latent token sequence as machine-native memory to enrich its reasoning. In this way, MemGen enables agents to recall and augment latent memory throughout reasoning, producing a tightly interwoven cycle of memory and cognition. Extensive experiments across eight benchmarks show that MemGen surpasses leading external memory systems such as ExpeL and AWM by up to 38.22%, exceeds GRPO by up to 13.44%, and exhibits strong cross-domain generalization ability. More importantly, we find that without explicit supervision, MemGen spontaneously evolves distinct human-like memory faculties, including planning memory, procedural memory, and working memory, suggesting an emergent trajectory toward more naturalistic forms of machine cognition.
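The trigger-then-weave control flow can be illustrated with stand-in components. In the real system both modules are learned networks operating on hidden states; here the trigger is a fixed uncertainty threshold and the "latent tokens" are placeholder strings, purely to show the interleaving of memory invocation and reasoning:

```python
from dataclasses import dataclass, field

@dataclass
class MemGenSketch:
    """Toy trigger + weaver loop; all components are hypothetical stand-ins."""
    threshold: float = 0.5
    recalled: list = field(default_factory=list)

    def trigger(self, uncertainty: float) -> bool:
        # Invoke memory only when the agent's reasoning state looks unsure.
        return uncertainty > self.threshold

    def weave(self, state: str) -> list:
        # Map the current state to a short latent token sequence (placeholder).
        return [f"<mem:{tok}>" for tok in state.split()[:3]]

    def step(self, state: str, uncertainty: float) -> str:
        if self.trigger(uncertainty):
            latent = self.weave(state)
            self.recalled.append(latent)
            return state + " " + " ".join(latent)  # enriched reasoning state
        return state  # confident steps proceed without memory

agent = MemGenSketch()
out = agent.step("solve the integral", uncertainty=0.8)
```

The point is the cycle: memory is injected mid-reasoning only when triggered, rather than prepended once at the start as in retrieval-based schemes.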
- North America > United States (0.14)
- Africa > Tanzania (0.04)
- Asia > China > Beijing > Beijing (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Weaver: Interweaving SQL and LLM for Table Reasoning
Khoja, Rohit, Gupta, Devanshu, Fu, Yanjie, Roth, Dan, Gupta, Vivek
Querying tables with unstructured data is challenging due to the presence of text (or images), either embedded in the table or in external paragraphs, which traditional SQL struggles to process, especially for tasks requiring semantic reasoning. While Large Language Models (LLMs) excel at understanding context, they face limitations with long input sequences. Existing approaches that combine SQL and LLMs typically rely on rigid, predefined workflows, limiting their adaptability to complex queries. To address these issues, we introduce Weaver, a modular pipeline that dynamically integrates SQL and LLMs for table-based question answering (TableQA). Weaver generates a flexible, step-by-step plan that combines SQL for structured data retrieval with LLMs for semantic processing. By decomposing complex queries into manageable subtasks, Weaver improves accuracy and generalization. Our experiments show that Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates. The code, along with other associated scripts, is available at https://coral-lab-asu.github.io/weaver.
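The step-by-step plan idea reduces to a small executor: each step is either a SQL query for structured retrieval or an LLM call for semantic processing of the current rows. This sketch assumes the plan is already generated (Weaver produces it with an LLM) and uses a stand-in callable in place of a real model:

```python
import sqlite3

def run_plan(conn, plan, llm):
    """Execute an interleaved SQL/LLM plan against a database connection.

    plan: list of ("sql", query) or ("llm", instruction) steps.
    llm: stand-in callable (instruction, rows) -> rows; a real pipeline
    would prompt a model here.
    """
    rows = []
    for kind, payload in plan:
        if kind == "sql":
            rows = conn.execute(payload).fetchall()  # structured retrieval
        else:
            rows = llm(payload, rows)  # semantic processing of current rows
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, abstract TEXT)")
conn.execute("INSERT INTO papers VALUES ('Weaver', 'table reasoning with SQL and LLMs')")
conn.execute("INSERT INTO papers VALUES ('Other', 'a survey of looms')")
plan = [
    ("sql", "SELECT title, abstract FROM papers"),
    ("llm", "keep rows whose abstract mentions SQL"),
]
# Fake "LLM" that filters rows literally, to keep the sketch runnable.
result = run_plan(conn, plan, llm=lambda instr, rows: [r for r in rows if "SQL" in r[1]])
```

Decomposing the query this way keeps the SQL engine doing what it is good at (retrieval over structured columns) and hands only the small residual row set to the model, which is also how the pipeline reduces API calls.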
- Asia > Russia (0.14)
- North America > United States > New York (0.04)
- Europe > Italy (0.04)
- (19 more...)
- Workflow (1.00)
- Research Report > New Finding (0.67)
Spacer: Towards Engineered Scientific Inspiration
Lee, Minhyeong, Hwang, Suyoung, Moon, Seunghyun, Nah, Geonho, Koh, Donghyun, Cho, Youngjun, Park, Johyun, Yoo, Hojin, Park, Jiho, Choi, Haneul, Moon, Sungbin, Hwang, Taehoon, Kim, Seungwon, Kim, Jaeyeong, Kim, Seongjun, Jung, Juneau
Recent advances in LLMs have made automated scientific research the next frontier on the path to artificial superintelligence. However, these systems are bound either to tasks of narrow scope or to the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve this via 'deliberate decontextualization,' an approach that disassembles information into atomic units - keywords - and draws creativity from unexplored connections between them. Spacer consists of (i) Nuri, an inspiration engine that builds keyword sets, and (ii) the Manifesting Pipeline, which refines these sets into elaborate scientific statements. Nuri extracts novel, high-potential keyword sets from a keyword graph built with 180,000 academic publications in biological fields. The Manifesting Pipeline finds links between keywords, analyzes their logical structure, validates their plausibility, and ultimately drafts original scientific concepts. According to our experiments, Nuri's evaluation metric accurately classifies high-impact publications with an AUROC score of 0.737. Our Manifesting Pipeline also successfully reconstructs core concepts from the latest top-journal articles solely from their keyword sets. An LLM-based scoring system estimates that this reconstruction was sound in over 85% of cases. Finally, our embedding-space analysis shows that outputs from Spacer are significantly more similar to leading publications than those from SOTA LLMs.
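The core of 'deliberate decontextualization' is searching a co-occurrence graph for keyword pairs that have never appeared together. A minimal sketch of that search, assuming a set of observed co-occurrences (Nuri's actual scoring of which unexplored pairs are high-potential is omitted, and the example keywords are invented):

```python
from itertools import combinations

def unexplored_pairs(cooccurrence, keywords):
    """Return keyword pairs never seen together in the corpus.

    cooccurrence: set of frozensets of keywords observed together.
    Pairs absent from it are the 'unexplored connections' the abstract
    describes as the source of creativity.
    """
    return [
        (a, b) for a, b in combinations(sorted(keywords), 2)
        if frozenset((a, b)) not in cooccurrence
    ]

seen = {frozenset(("CRISPR", "gene drive")), frozenset(("gene drive", "malaria"))}
pairs = unexplored_pairs(seen, {"CRISPR", "gene drive", "malaria"})
# Only ("CRISPR", "malaria") has never co-occurred: a candidate inspiration.
```

At the scale of 180,000 publications the candidate set is huge, which is why a learned metric to rank pairs is the substantive contribution rather than the enumeration itself.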
- Europe (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- Asia > China (0.04)
- Research Report > Promising Solution (0.93)
- Research Report > Experimental Study (0.67)
How to Survive the A.I. Revolution
In the early hours of April 12, 1812, a crowd of men approached Rawfolds Mill, a four-story stone building on the banks of the River Spen, in West Yorkshire. This was Brontë country--a landscape of bleak moors, steep valleys, and small towns nestled in the hollows. The men, who'd assembled on the moors hours earlier, were armed with muskets, sticks, hatchets, and heavy blacksmith's hammers. When they reached the mill, those at the front broke windows to gain entry, and some fired shots into the darkened factory. But the mill's owner, William Cartwright, had been preparing for trouble.
- Law (0.71)
- Government (0.48)
- Banking & Finance (0.48)
South of Midnight review – beautiful surfaces can't hide thin gameplay
Soaring development costs; protracted production cycles; cautious C-suites looking to deliver reliable returns for shareholders: for many reasons, there is a dearth of original programming in big-budget video games. Already this year we have seen the arrival of the seventh mainline Civilization game, the 14th entry in the Assassin's Creed franchise, and, most brain-melting of all, the 27th Monster Hunter title. But look: here's a magical-realist tale set in a moody, hurricane-ravaged imagining of the American deep south, whose title, crucially, bears no numerical suffix. South of Midnight makes a brilliantly atmospheric first impression. Winds bludgeon flimsy abodes; rain lashes down on tin roofs; the world is rendered with the macabre and crooked details of a Tim Burton film.
Weaver: Foundation Models for Creative Writing
Wang, Tiannan, Chen, Jiamin, Jia, Qingrui, Wang, Shuai, Fang, Ruoyu, Wang, Huilin, Gao, Zhaowei, Xie, Chunzhao, Xu, Chuou, Dai, Jihong, Liu, Yibin, Wu, Jialong, Ding, Shengwei, Li, Long, Huang, Zhiwei, Deng, Xinle, Yu, Teng, Ma, Gangan, Xiao, Han, Chen, Zixin, Xiang, Danjun, Wang, Yunxia, Zhu, Yuanyuan, Xiao, Yi, Wang, Jing, Wang, Yiru, Ding, Siran, Huang, Jiayang, Xu, Jiayi, Tayier, Yilihamu, Hu, Zhenyu, Gao, Yuan, Zheng, Chengfeng, Ye, Yueshu, Li, Yihang, Wan, Lei, Jiang, Xinyue, Wang, Yujie, Cheng, Siyu, Song, Zhule, Tang, Xiangru, Xu, Xiaohua, Zhang, Ningyu, Chen, Huajun, Jiang, Yuchen Eleanor, Zhou, Wangchunshu
This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preferences of professional writers using a suite of novel methods for instruction data synthesis and LLM alignment, making it able to produce more human-like texts and follow more diverse instructions for content creation. The Weaver family consists of Weaver Mini (1.8B), Weaver Base (6B), Weaver Pro (14B), and Weaver Ultra (34B) models, suitable for different applications; queries can be dynamically dispatched among them by a routing agent according to complexity, balancing response quality and computation cost. Evaluation on a carefully curated benchmark for assessing the writing capabilities of LLMs shows that Weaver models of all sizes outperform generalist LLMs several times larger than them. Notably, our most capable Weaver Ultra model surpasses GPT-4, a state-of-the-art generalist LLM, on various writing scenarios, demonstrating the advantage of training specialized LLMs for writing purposes. Moreover, Weaver natively supports retrieval-augmented generation (RAG) and function calling (tool usage). We present various use cases of these abilities for improving AI-assisted writing systems, including integration of external knowledge bases, tools, or APIs, and providing personalized writing assistance. Furthermore, we discuss and summarize guidelines and best practices for pre-training and fine-tuning domain-specific LLMs.
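The routing-agent idea can be sketched as picking the cheapest model tier judged sufficient for a query. Only the model names and sizes come from the abstract; the complexity heuristic below (word-count buckets) is a placeholder for whatever learned router the paper actually uses:

```python
# (name, parameters in billions), cheapest first — sizes from the abstract.
MODELS = [
    ("Weaver Mini", 1.8),
    ("Weaver Base", 6),
    ("Weaver Pro", 14),
    ("Weaver Ultra", 34),
]

def route(query: str) -> str:
    """Dispatch a query to the smallest model judged adequate.

    The length-based complexity estimate is purely illustrative; the
    real routing agent would score complexity with a learned model.
    """
    n = len(query.split())
    if n <= 10:
        tier = 0
    elif n <= 30:
        tier = 1
    elif n <= 80:
        tier = 2
    else:
        tier = 3
    return MODELS[tier][0]

assert route("fix this sentence") == "Weaver Mini"
```

Routing to the cheapest adequate tier is what lets a family of four sizes behave like one service with a tunable quality/cost trade-off.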
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
Weaving Pathways for Justice with GPT: LLM-driven automated drafting of interactive legal applications
Steenhuis, Quinten, Colarusso, David, Willey, Bryce
Can generative AI help us speed up the authoring of tools to help self-represented litigants? In this paper, we describe 3 approaches to automating the completion of court forms: a generative AI approach that uses GPT-3 to iteratively prompt the user to answer questions, a constrained template-driven approach that uses GPT-4-turbo to generate a draft of questions that are subject to human review, and a hybrid method. We use the open source Docassemble platform in all 3 experiments, together with a tool created at Suffolk University Law School called the Assembly Line Weaver. We conclude that the hybrid model of constrained automated drafting with human review is best suited to the task of authoring guided interviews.
- North America > United States > Massachusetts (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Law (1.00)
- Education > Educational Setting > Higher Education (0.55)
- Education > Curriculum > Subject-Specific Education (0.55)
BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines
Kühnel, Lisa, Schulz, Alexander, Hammer, Barbara, Fluck, Juliane
Recent developments in transfer learning have boosted the advancements in natural language processing tasks. The performance is, however, dependent on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that one training corpus is not enough to learn generic models that are able to efficiently predict on new data. Therefore, in order to be used in real-world applications, state-of-the-art models need the ability of lifelong learning to improve performance as soon as new data are available - without the need to re-train the whole model from scratch. We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once, while being computationally more efficient. Because there is no need for data sharing, the presented method is also easily applicable to federated learning settings and can, for example, be beneficial for mining electronic health records from different clinics.
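Weight averaging as a post-processing step is simple enough to show directly: after fine-tuning on new data, each parameter is interpolated with its value from the previous model. This is a minimal sketch of the general technique; the paper's exact weighting scheme may differ, and the parameter names are invented:

```python
def weave(old_weights, new_weights, alpha=0.5):
    """Interpolate the old model's parameters into the newly trained one.

    old_weights / new_weights: dicts mapping parameter name to value
    (scalars here for clarity; real models hold tensors, and the same
    elementwise formula applies). alpha is the weight on the old model:
    higher alpha retains more old knowledge, reducing catastrophic
    forgetting at the cost of adapting less to the new corpus.
    """
    return {
        name: alpha * old_weights[name] + (1 - alpha) * new_weights[name]
        for name in old_weights
    }

old = {"emb.w": 1.0, "out.b": -2.0}
new = {"emb.w": 3.0, "out.b": 0.0}
merged = weave(old, new, alpha=0.5)
# merged == {"emb.w": 2.0, "out.b": -1.0}
```

Because only the two weight sets are needed, the merge requires no access to the old training data, which is what makes the method compatible with federated settings where records cannot leave a clinic.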
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- (6 more...)
- Instructional Material (1.00)
- Research Report > Promising Solution (0.48)
- Education > Educational Setting > Continuing Education (0.71)
- Health & Medicine > Health Care Technology > Medical Record (0.68)