Weaver: Shrinking the Generation-Verification Gap with Weak Verifiers
Saad-Falcon, Jon, Buchanan, E. Kelly, Chen, Mayee F., Huang, Tzu-Heng, McLaughlin, Brendan, Bhathal, Tanvir, Zhu, Shang, Athiwaratkun, Ben, Sala, Frederic, Linderman, Scott, Mirhoseini, Azalia, Ré, Christopher
Verifiers can improve language model capabilities by scoring and ranking responses from generated candidates. Currently, high-quality verifiers are either unscalable (e.g., humans) or limited in utility (e.g., tools like Lean). While LM judges and reward models have become broadly useful as general-purpose verifiers, a significant performance gap remains between them and oracle verifiers (verifiers with perfect accuracy). To help close this gap, we introduce Weaver, a framework for designing a strong verifier by combining multiple weak, imperfect verifiers. We find that weighted ensembles of verifiers, which typically require learning from labeled data, significantly outperform unweighted combinations due to differences in verifier accuracies. To reduce dependency on labeled data, Weaver leverages weak supervision to estimate each verifier's accuracy and combines outputs into a unified score that better reflects true response quality. However, directly applying weak supervision algorithms poses challenges, including inconsistent verifier output formats and handling low-quality verifiers. Weaver addresses these using dataset statistics to normalize outputs and filter out specific verifiers. We study Weaver's effectiveness in test-time repeated sampling, where a model generates multiple candidate responses and selects one. Our evaluations show that Weaver significantly improves over Pass@1 (the accuracy of selecting the first candidate) across reasoning and math tasks, achieving o3-mini-level accuracy with Llama 3.3 70B Instruct as the generator and an ensemble of judge and reward models of 70B parameters or smaller as verifiers (87.7% average). This gain mirrors the jump between GPT-4o and o3-mini (69.0% vs. 86.7%), which required extensive finetuning and post-training. To reduce the computational costs of verifier ensembles, we train a 400M cross-encoder using Weaver's combined output scores.
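The weighted combination described above can be sketched as a log-odds vote over binary verifier outputs. In this minimal sketch the per-verifier accuracies are simply given as inputs (Weaver estimates them from unlabeled data via weak supervision, which is omitted here), and all names are illustrative:

```python
import math

def select_response(verifier_votes, accuracies):
    """Pick the candidate with the highest accuracy-weighted verifier score.

    verifier_votes: one inner list per candidate; each entry is a 0/1
    vote from a different verifier. accuracies: estimated accuracy of
    each verifier. Votes are weighted by the log-odds of the verifier
    being correct, the standard weighting for independent noisy voters,
    so reliable verifiers count for more than near-random ones.
    """
    def score(votes):
        s = 0.0
        for v, a in zip(votes, accuracies):
            a = min(max(a, 1e-6), 1 - 1e-6)  # keep log-odds finite
            w = math.log(a / (1 - a))
            s += w if v == 1 else -w
        return s
    return max(range(len(verifier_votes)), key=lambda i: score(verifier_votes[i]))

# Three candidates scored by three verifiers of differing reliability:
# the strong 0.9-accuracy verifier and the middling 0.7 one agree on
# candidate 1, which wins despite the weak verifier's dissent.
votes = [[1, 0, 0], [1, 1, 0], [0, 1, 1]]
best = select_response(votes, accuracies=[0.9, 0.7, 0.55])
```

An unweighted majority vote would score candidates 0 and 2 identically to candidate 1 minus the tie-breaking; the accuracy weighting is what lets one strong verifier outvote two weak ones.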
- Asia > Middle East > Jordan (0.04)
- North America > United States > Wisconsin > Dane County > Madison (0.04)
- North America > United States > Virginia (0.04)
- (3 more...)
Weaver: Kronecker Product Approximations of Spatiotemporal Attention for Traffic Network Forecasting
Cheong, Christopher, Davis, Gary, Choi, Seongjin
Spatiotemporal forecasting on transportation networks is a complex task that requires understanding how traffic nodes interact within a dynamic, evolving system dictated by traffic flow dynamics and social behavioral patterns. The importance of transportation networks and intelligent transportation systems (ITS) for modern mobility and commerce necessitates forecasting models that are not only accurate but also interpretable, efficient, and robust under structural or temporal perturbations. Recent approaches, particularly Transformer-based architectures, have improved predictive performance but often at the cost of high computational overhead and diminished architectural interpretability. In this work, we introduce Weaver, a novel attention-based model that applies Kronecker product approximations (KPA) to decompose the PN × PN spatiotemporal attention of O(P^2 N^2) complexity into local P × P temporal and N × N spatial attention maps. This Kronecker attention map enables our Parallel-Kronecker Matrix-Vector product (P2-KMV) for efficient spatiotemporal message passing with O(P^2 N + N^2 P) complexity. To capture real-world traffic dynamics, we address the importance of negative edges in modeling traffic behavior by introducing Valence Attention using the continuous Tanimoto coefficient (CTC), which provides properties conducive to precise latent graph generation and training stability. To fully utilize the model's learning capacity, we introduce the Traffic Phase Dictionary for self-conditioning. Evaluations on PEMS-BAY and METR-LA show that Weaver achieves competitive performance across model categories while training more efficiently.
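The complexity reduction behind the Kronecker factorization can be seen in a few lines: a Kronecker-structured matrix-vector product never needs the full PN × PN matrix, only two small matmuls on a reshaped signal. This is a generic sketch of that identity, not the paper's P2-KMV implementation:

```python
import numpy as np

def kron_mv(T, S, x):
    """Compute (T kron S) @ x without materializing the PN x PN matrix.

    T: (P, P) temporal map, S: (N, N) spatial map, x: length-PN signal
    ordered time-major, i.e. x[p*N + n] is node n at time p. Reshaping
    x to (P, N) turns the Kronecker product into T @ X @ S.T, costing
    O(P^2 N + N^2 P) instead of O(P^2 N^2).
    """
    P, N = T.shape[0], S.shape[0]
    X = x.reshape(P, N)
    return (T @ X @ S.T).reshape(-1)

# Sanity check against the explicit Kronecker product on a tiny example.
rng = np.random.default_rng(0)
T, S = rng.normal(size=(3, 3)), rng.normal(size=(4, 4))
x = rng.normal(size=12)
assert np.allclose(kron_mv(T, S, x), np.kron(T, S) @ x)
```

For PEMS-BAY-scale inputs (hundreds of sensors, dozens of time steps) the dense product is quadratic in both axes at once, which is exactly what this factorization avoids.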
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.28)
- North America > United States > District of Columbia > Washington (0.14)
- Africa > Senegal > Kolda Region > Kolda (0.04)
- (13 more...)
- Overview (0.92)
- Research Report > New Finding (0.45)
- Transportation > Infrastructure & Services (1.00)
- Transportation > Ground > Road (1.00)
- Information Technology (0.92)
MemGen: Weaving Generative Latent Memory for Self-Evolving Agents
Zhang, Guibin, Fu, Muxin, Yan, Shuicheng
Agent memory shapes how Large Language Model (LLM)-powered agents, akin to the human brain, progressively refine themselves through environment interactions. Existing paradigms remain constrained: parametric memory forcibly adjusts model parameters, and retrieval-based memory externalizes experience into structured databases, yet neither captures the fluid interweaving of reasoning and memory that underlies human cognition. To address this gap, we propose MemGen, a dynamic generative memory framework that equips agents with a human-esque cognitive faculty. It consists of a "memory trigger," which monitors the agent's reasoning state to decide explicit memory invocation, and a "memory weaver," which takes the agent's current state as stimulus to construct a latent token sequence as machine-native memory to enrich its reasoning. In this way, MemGen enables agents to recall and augment latent memory throughout reasoning, producing a tightly interwoven cycle of memory and cognition. Extensive experiments across eight benchmarks show that MemGen surpasses leading external memory systems such as ExpeL and AWM by up to 38.22%, exceeds GRPO by up to 13.44%, and exhibits strong cross-domain generalization ability. More importantly, we find that without explicit supervision, MemGen spontaneously evolves distinct human-like memory faculties, including planning memory, procedural memory, and working memory, suggesting an emergent trajectory toward more naturalistic forms of machine cognition.
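The trigger-then-weave control flow can be illustrated with stand-in components. In the real system both modules are learned networks operating on hidden states; here the trigger is a fixed uncertainty threshold and the "latent tokens" are placeholder strings, purely to show the interleaving of memory invocation and reasoning:

```python
from dataclasses import dataclass, field

@dataclass
class MemGenSketch:
    """Toy trigger + weaver loop; all components are hypothetical stand-ins."""
    threshold: float = 0.5
    recalled: list = field(default_factory=list)

    def trigger(self, uncertainty: float) -> bool:
        # Invoke memory only when the agent's reasoning state looks unsure.
        return uncertainty > self.threshold

    def weave(self, state: str) -> list:
        # Map the current state to a short latent token sequence (placeholder).
        return [f"<mem:{tok}>" for tok in state.split()[:3]]

    def step(self, state: str, uncertainty: float) -> str:
        if self.trigger(uncertainty):
            latent = self.weave(state)
            self.recalled.append(latent)
            return state + " " + " ".join(latent)  # enriched reasoning state
        return state  # confident steps proceed without memory

agent = MemGenSketch()
out = agent.step("solve the integral", uncertainty=0.8)
```

The point is the cycle: memory is injected mid-reasoning only when triggered, rather than prepended once at the start as in retrieval-based schemes.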
- North America > United States (0.14)
- Africa > Tanzania (0.04)
- Asia > China > Beijing > Beijing (0.04)
- (5 more...)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
- Information Technology > Artificial Intelligence > Cognitive Science (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.90)
Weaver: Interweaving SQL and LLM for Table Reasoning
Khoja, Rohit, Gupta, Devanshu, Fu, Yanjie, Roth, Dan, Gupta, Vivek
Querying tables with unstructured data is challenging due to the presence of text (or images), either embedded in the table or in external paragraphs, which traditional SQL struggles to process, especially for tasks requiring semantic reasoning. While Large Language Models (LLMs) excel at understanding context, they face limitations with long input sequences. Existing approaches that combine SQL and LLMs typically rely on rigid, predefined workflows, limiting their adaptability to complex queries. To address these issues, we introduce Weaver, a modular pipeline that dynamically integrates SQL and LLMs for table-based question answering (TableQA). Weaver generates a flexible, step-by-step plan that combines SQL for structured data retrieval with LLMs for semantic processing. By decomposing complex queries into manageable subtasks, Weaver improves accuracy and generalization. Our experiments show that Weaver consistently outperforms state-of-the-art methods across four TableQA datasets, reducing both API calls and error rates. The code, along with other associated scripts, is available at https://coral-lab-asu.github.io/weaver.
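The step-by-step plan idea reduces to a small executor: each step is either a SQL query for structured retrieval or an LLM call for semantic processing of the current rows. This sketch assumes the plan is already generated (Weaver produces it with an LLM) and uses a stand-in callable in place of a real model:

```python
import sqlite3

def run_plan(conn, plan, llm):
    """Execute an interleaved SQL/LLM plan against a database connection.

    plan: list of ("sql", query) or ("llm", instruction) steps.
    llm: stand-in callable (instruction, rows) -> rows; a real pipeline
    would prompt a model here.
    """
    rows = []
    for kind, payload in plan:
        if kind == "sql":
            rows = conn.execute(payload).fetchall()  # structured retrieval
        else:
            rows = llm(payload, rows)  # semantic processing of current rows
    return rows

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE papers (title TEXT, abstract TEXT)")
conn.execute("INSERT INTO papers VALUES ('Weaver', 'table reasoning with SQL and LLMs')")
conn.execute("INSERT INTO papers VALUES ('Other', 'a survey of looms')")
plan = [
    ("sql", "SELECT title, abstract FROM papers"),
    ("llm", "keep rows whose abstract mentions SQL"),
]
# Fake "LLM" that filters rows literally, to keep the sketch runnable.
result = run_plan(conn, plan, llm=lambda instr, rows: [r for r in rows if "SQL" in r[1]])
```

Decomposing the query this way keeps the SQL engine doing what it is good at (retrieval over structured columns) and hands only the small residual row set to the model, which is also how the pipeline reduces API calls.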
- Asia > Russia (0.14)
- North America > United States > New York (0.04)
- Europe > Italy (0.04)
- (19 more...)
- Workflow (1.00)
- Research Report > New Finding (0.67)
Spacer: Towards Engineered Scientific Inspiration
Lee, Minhyeong, Hwang, Suyoung, Moon, Seunghyun, Nah, Geonho, Koh, Donghyun, Cho, Youngjun, Park, Johyun, Yoo, Hojin, Park, Jiho, Choi, Haneul, Moon, Sungbin, Hwang, Taehoon, Kim, Seungwon, Kim, Jaeyeong, Kim, Seongjun, Jung, Juneau
Recent advances in LLMs have made automated scientific research the next frontier on the path to artificial superintelligence. However, these systems are bound either to tasks of narrow scope or to the limited creative capabilities of LLMs. We propose Spacer, a scientific discovery system that develops creative and factually grounded concepts without external intervention. Spacer attempts to achieve this via 'deliberate decontextualization,' an approach that disassembles information into atomic units - keywords - and draws creativity from unexplored connections between them. Spacer consists of (i) Nuri, an inspiration engine that builds keyword sets, and (ii) the Manifesting Pipeline, which refines these sets into elaborate scientific statements. Nuri extracts novel, high-potential keyword sets from a keyword graph built with 180,000 academic publications in biological fields. The Manifesting Pipeline finds links between keywords, analyzes their logical structure, validates their plausibility, and ultimately drafts original scientific concepts. According to our experiments, Nuri's evaluation metric accurately classifies high-impact publications with an AUROC score of 0.737. Our Manifesting Pipeline also successfully reconstructs core concepts from the latest top-journal articles solely from their keyword sets. An LLM-based scoring system estimates that this reconstruction was sound in over 85% of cases. Finally, our embedding-space analysis shows that outputs from Spacer are significantly more similar to leading publications than those from SOTA LLMs.
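The core of 'deliberate decontextualization' is searching a co-occurrence graph for keyword pairs that have never appeared together. A minimal sketch of that search, assuming a set of observed co-occurrences (Nuri's actual scoring of which unexplored pairs are high-potential is omitted, and the example keywords are invented):

```python
from itertools import combinations

def unexplored_pairs(cooccurrence, keywords):
    """Return keyword pairs never seen together in the corpus.

    cooccurrence: set of frozensets of keywords observed together.
    Pairs absent from it are the 'unexplored connections' the abstract
    describes as the source of creativity.
    """
    return [
        (a, b) for a, b in combinations(sorted(keywords), 2)
        if frozenset((a, b)) not in cooccurrence
    ]

seen = {frozenset(("CRISPR", "gene drive")), frozenset(("gene drive", "malaria"))}
pairs = unexplored_pairs(seen, {"CRISPR", "gene drive", "malaria"})
# Only ("CRISPR", "malaria") has never co-occurred: a candidate inspiration.
```

At the scale of 180,000 publications the candidate set is huge, which is why a learned metric to rank pairs is the substantive contribution rather than the enumeration itself.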
- Europe (0.04)
- North America > United States > Virginia > Alexandria County > Alexandria (0.04)
- Asia > China (0.04)
- Research Report > Promising Solution (0.93)
- Research Report > Experimental Study (0.67)
How to Survive the A.I. Revolution
In the early hours of April 12, 1812, a crowd of men approached Rawfolds Mill, a four-story stone building on the banks of the River Spen, in West Yorkshire. This was Brontë country--a landscape of bleak moors, steep valleys, and small towns nestled in the hollows. The men, who'd assembled on the moors hours earlier, were armed with muskets, sticks, hatchets, and heavy blacksmith's hammers. When they reached the mill, those at the front broke windows to gain entry, and some fired shots into the darkened factory. But the mill's owner, William Cartwright, had been preparing for trouble.
- Law (0.71)
- Government (0.48)
- Banking & Finance (0.48)
South of Midnight review – beautiful surfaces can't hide thin gameplay
Soaring development costs; protracted production cycles; cautious C-suites looking to deliver reliable returns for shareholders: for many reasons, there is a dearth of original programming in big-budget video games. Already this year we have seen the arrival of the seventh mainline Civilization game, the 14th entry in the Assassin's Creed franchise, and, most brain-melting of all, the 27th Monster Hunter title. But look: here's a magical-realist tale set in a moody, hurricane-ravaged imagining of the American deep south, whose title, crucially, bears no numerical suffix. South of Midnight makes a brilliantly atmospheric first impression. Winds bludgeon flimsy abodes; rain lashes down on tin roofs; the world is rendered with the macabre and crooked details of a Tim Burton film.
Weaver: Foundation Models for Creative Writing
Wang, Tiannan, Chen, Jiamin, Jia, Qingrui, Wang, Shuai, Fang, Ruoyu, Wang, Huilin, Gao, Zhaowei, Xie, Chunzhao, Xu, Chuou, Dai, Jihong, Liu, Yibin, Wu, Jialong, Ding, Shengwei, Li, Long, Huang, Zhiwei, Deng, Xinle, Yu, Teng, Ma, Gangan, Xiao, Han, Chen, Zixin, Xiang, Danjun, Wang, Yunxia, Zhu, Yuanyuan, Xiao, Yi, Wang, Jing, Wang, Yiru, Ding, Siran, Huang, Jiayang, Xu, Jiayi, Tayier, Yilihamu, Hu, Zhenyu, Gao, Yuan, Zheng, Chengfeng, Ye, Yueshu, Li, Yihang, Wan, Lei, Jiang, Xinyue, Wang, Yujie, Cheng, Siyu, Song, Zhule, Tang, Xiangru, Xu, Xiaohua, Zhang, Ningyu, Chen, Huajun, Jiang, Yuchen Eleanor, Zhou, Wangchunshu
This work introduces Weaver, our first family of large language models (LLMs) dedicated to content creation. Weaver is pre-trained on a carefully selected corpus that focuses on improving the writing capabilities of large language models. We then fine-tune Weaver for creative and professional writing purposes and align it to the preferences of professional writers using a suite of novel methods for instruction data synthesis and LLM alignment, making it able to produce more human-like texts and follow more diverse instructions for content creation. The Weaver family consists of Weaver Mini (1.8B), Weaver Base (6B), Weaver Pro (14B), and Weaver Ultra (34B) models, suitable for different applications; queries can be dynamically dispatched among them by a routing agent according to complexity, balancing response quality and computation cost. Evaluation on a carefully curated benchmark for assessing the writing capabilities of LLMs shows that Weaver models of all sizes outperform generalist LLMs several times larger than them. Notably, our most capable Weaver Ultra model surpasses GPT-4, a state-of-the-art generalist LLM, on various writing scenarios, demonstrating the advantage of training specialized LLMs for writing purposes. Moreover, Weaver natively supports retrieval-augmented generation (RAG) and function calling (tool usage). We present various use cases of these abilities for improving AI-assisted writing systems, including integration of external knowledge bases, tools, or APIs, and providing personalized writing assistance. Furthermore, we discuss and summarize guidelines and best practices for pre-training and fine-tuning domain-specific LLMs.
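The routing-agent idea can be sketched as picking the cheapest model tier judged sufficient for a query. Only the model names and sizes come from the abstract; the complexity heuristic below (word-count buckets) is a placeholder for whatever learned router the paper actually uses:

```python
# (name, parameters in billions), cheapest first — sizes from the abstract.
MODELS = [
    ("Weaver Mini", 1.8),
    ("Weaver Base", 6),
    ("Weaver Pro", 14),
    ("Weaver Ultra", 34),
]

def route(query: str) -> str:
    """Dispatch a query to the smallest model judged adequate.

    The length-based complexity estimate is purely illustrative; the
    real routing agent would score complexity with a learned model.
    """
    n = len(query.split())
    if n <= 10:
        tier = 0
    elif n <= 30:
        tier = 1
    elif n <= 80:
        tier = 2
    else:
        tier = 3
    return MODELS[tier][0]

assert route("fix this sentence") == "Weaver Mini"
```

Routing to the cheapest adequate tier is what lets a family of four sizes behave like one service with a tunable quality/cost trade-off.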
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- (2 more...)
Weaving Pathways for Justice with GPT: LLM-driven automated drafting of interactive legal applications
Steenhuis, Quinten, Colarusso, David, Willey, Bryce
Can generative AI help us speed up the authoring of tools to help self-represented litigants? In this paper, we describe 3 approaches to automating the completion of court forms: a generative AI approach that uses GPT-3 to iteratively prompt the user to answer questions, a constrained template-driven approach that uses GPT-4-turbo to generate a draft of questions that are subject to human review, and a hybrid method. We use the open source Docassemble platform in all 3 experiments, together with a tool created at Suffolk University Law School called the Assembly Line Weaver. We conclude that the hybrid model of constrained automated drafting with human review is best suited to the task of authoring guided interviews.
- North America > United States > Massachusetts (0.05)
- North America > United States > New York > New York County > New York City (0.04)
- North America > Canada > Quebec > Montreal (0.04)
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- Law (1.00)
- Education > Educational Setting > Higher Education (0.55)
- Education > Curriculum > Subject-Specific Education (0.55)
BERT WEAVER: Using WEight AVERaging to enable lifelong learning for transformer-based models in biomedical semantic search engines
Kühnel, Lisa, Schulz, Alexander, Hammer, Barbara, Fluck, Juliane
Recent developments in transfer learning have boosted the advancements in natural language processing tasks. The performance is, however, dependent on high-quality, manually annotated training data. Especially in the biomedical domain, it has been shown that one training corpus is not enough to learn generic models that are able to efficiently predict on new data. Therefore, in order to be used in real-world applications, state-of-the-art models need the ability of lifelong learning to improve performance as soon as new data are available - without the need to re-train the whole model from scratch. We present WEAVER, a simple, yet efficient post-processing method that infuses old knowledge into the new model, thereby reducing catastrophic forgetting. We show that applying WEAVER in a sequential manner results in similar word embedding distributions as doing a combined training on all data at once, while being computationally more efficient. Because there is no need for data sharing, the presented method is also easily applicable to federated learning settings and can, for example, be beneficial for mining electronic health records from different clinics.
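Weight averaging as a post-processing step is simple enough to show directly: after fine-tuning on new data, each parameter is interpolated with its value from the previous model. This is a minimal sketch of the general technique; the paper's exact weighting scheme may differ, and the parameter names are invented:

```python
def weave(old_weights, new_weights, alpha=0.5):
    """Interpolate the old model's parameters into the newly trained one.

    old_weights / new_weights: dicts mapping parameter name to value
    (scalars here for clarity; real models hold tensors, and the same
    elementwise formula applies). alpha is the weight on the old model:
    higher alpha retains more old knowledge, reducing catastrophic
    forgetting at the cost of adapting less to the new corpus.
    """
    return {
        name: alpha * old_weights[name] + (1 - alpha) * new_weights[name]
        for name in old_weights
    }

old = {"emb.w": 1.0, "out.b": -2.0}
new = {"emb.w": 3.0, "out.b": 0.0}
merged = weave(old, new, alpha=0.5)
# merged == {"emb.w": 2.0, "out.b": -1.0}
```

Because only the two weight sets are needed, the merge requires no access to the old training data, which is what makes the method compatible with federated settings where records cannot leave a clinic.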
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Germany > North Rhine-Westphalia > Cologne Region > Cologne (0.04)
- North America > United States > New Mexico > Santa Fe County > Santa Fe (0.04)
- (6 more...)
- Instructional Material (1.00)
- Research Report > Promising Solution (0.48)
- Education > Educational Setting > Continuing Education (0.71)
- Health & Medicine > Health Care Technology > Medical Record (0.68)