MAC


Online Adaptation of Language Models with a Memory of Amortized Contexts

Neural Information Processing Systems

Due to the rapid generation and dissemination of information, large language models (LLMs) quickly run out of date despite enormous development costs. To address the crucial need to keep models updated, online learning has emerged as a critical tool when utilizing LLMs for real-world applications. However, given the ever-expanding corpus of unseen documents and the large parameter space of modern LLMs, efficient adaptation is essential. To address these challenges, we propose Memory of Amortized Contexts (MAC), an efficient and effective online adaptation framework for LLMs with strong knowledge retention. We propose a feature extraction and memory-augmentation approach to compress and extract information from new documents into compact modulations stored in a memory bank.
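The compress-and-retrieve idea described in the abstract can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the encoder matrix, the dimensions, and the attention-based retrieval rule below are all stand-in assumptions for the learned components MAC actually trains.

```python
import numpy as np

rng = np.random.default_rng(0)

D_DOC, D_MOD, N_DOCS = 64, 16, 5  # document feature dim, modulation dim, #documents

# Hypothetical amortization network: compresses pooled document features
# into a compact modulation vector (stands in for a learned encoder).
W_compress = rng.normal(size=(D_DOC, D_MOD)) / np.sqrt(D_DOC)

def compress(doc_feats: np.ndarray) -> np.ndarray:
    """Compress pooled document features into a compact modulation."""
    return np.tanh(doc_feats @ W_compress)

# Memory bank: one compact modulation stored per new document.
memory_bank = np.stack([compress(rng.normal(size=D_DOC)) for _ in range(N_DOCS)])

def retrieve(query: np.ndarray) -> np.ndarray:
    """Aggregate stored modulations by softmax attention against a query."""
    scores = memory_bank @ query              # similarity to each stored modulation
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                  # softmax attention weights
    return weights @ memory_bank              # blended modulation for the LLM

mod = retrieve(rng.normal(size=D_MOD))
print(mod.shape)  # (16,)
```

The point of the sketch is the storage pattern: each document costs one small vector rather than a gradient update to the full parameter space, and retrieval is a single attention pass over the bank.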


drawing connections to Feldman's work (L36), but we agree that the relation between the three topics should be

Neural Information Processing Systems

Thank you all for your thoughtful comments; we address your concerns below. The MDL principle formalizes Occam's razor. We will add a discussion of such relevant studies to Section 1, and we will add these results and accompanying visualizations to the appendix. Model (solver): MAC, DAFT; MAC (euler), DAFT; MAC (rk4), DAFT; MAC (dopri5; used in training), DAFT. Time (ms): 153. We found that during evaluation, rk4 solves all the dynamics generated from the CLEVR dataset.


Practical and Performant Enhancements for Maximization of Algebraic Connectivity

Jung, Leonard, Papalia, Alan, Doherty, Kevin, Everett, Michael

arXiv.org Artificial Intelligence

Abstract-- Long-term state estimation over graphs remains challenging as current graph estimation methods scale poorly on large, long-term graphs. To address this, our work advances a current state-of-the-art graph sparsification algorithm, maximizing algebraic connectivity (MAC). MAC is a sparsification method that preserves estimation performance by maximizing the algebraic connectivity, a spectral graph property that is directly connected to the estimation error. Unfortunately, MAC remains computationally prohibitive for online use and requires users to manually pre-specify a connectivity-preserving edge set. Our contributions close these gaps along three complementary fronts: we develop a specialized solver for algebraic connectivity that yields an average 2x runtime speedup; we investigate advanced step size strategies for MAC's optimization procedure to enhance both convergence speed and solution quality; and we propose automatic schemes that guarantee graph connectivity without requiring manual specification of edges. Together, these contributions make MAC more scalable, reliable, and suitable for real-time estimation applications. The scalability of state estimation and perception remains a critical challenge for long-term autonomous robotic systems.
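The spectral quantity MAC maximizes, algebraic connectivity, is the Fiedler value: the second-smallest eigenvalue of the graph Laplacian. A minimal sketch for a toy graph is below; the dense eigensolver is for illustration only, and a real sparsification pipeline on large pose graphs would use sparse or specialized solvers such as those the paper develops.

```python
import numpy as np

def algebraic_connectivity(edges, n):
    """Second-smallest eigenvalue of the graph Laplacian (Fiedler value)."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1.0; L[j, j] += 1.0   # degree terms
        L[i, j] -= 1.0; L[j, i] -= 1.0   # adjacency terms
    return np.sort(np.linalg.eigvalsh(L))[1]

# A 4-cycle: Laplacian eigenvalues are 2 - 2*cos(2*pi*k/4) = {0, 2, 2, 4}.
cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(round(algebraic_connectivity(cycle, 4), 6))  # 2.0

# Sparsifying (dropping an edge to leave a path) lowers the Fiedler value:
path = [(0, 1), (1, 2), (2, 3)]
print(round(algebraic_connectivity(path, 4), 6))  # 0.585786 (= 2 - sqrt(2))
```

The example also shows why the quantity is a sensible sparsification objective: every removed edge can only decrease it, so maximizing it over candidate edge subsets keeps the graph as "well connected" as the sparsity budget allows.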


Attention Consistency for LLMs Explanation

Lan, Tian, Xu, Jinyuan, He, Xue, Hwang, Jenq-Neng, Li, Lei

arXiv.org Artificial Intelligence

Understanding the decision-making processes of large language models (LLMs) is essential for their trustworthy development and deployment. However, current interpretability methods often face challenges such as low resolution and high computational cost. To address these limitations, we propose the Multi-Layer Attention Consistency Score (MACS), a novel, lightweight, and easily deployable heuristic for estimating the importance of input tokens in decoder-based models. MACS measures contributions of input tokens based on the consistency of maximal attention. Empirical evaluations demonstrate that MACS achieves a favorable trade-off between interpretability quality and computational efficiency, showing faithfulness comparable to complex techniques with a 22% decrease in VRAM usage and 30% reduction in latency.
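One hypothetical reading of "consistency of maximal attention" can be sketched as follows: score each input token by how often it is the argmax of the decoder's attention across layers. The function name, the random stand-in attention maps, and the scoring rule below are assumptions for illustration, not the paper's actual definition of MACS.

```python
import numpy as np

rng = np.random.default_rng(1)
n_layers, seq_len = 6, 8

# Stand-in attention maps: one row per layer, giving the attention the
# final (decoding) position pays to each input token in that layer.
attn = rng.random((n_layers, seq_len))
attn /= attn.sum(axis=1, keepdims=True)  # normalize rows like softmax outputs

def macs_like_score(attn_rows: np.ndarray) -> np.ndarray:
    """Score tokens by how consistently they receive the maximal attention
    across layers (a hypothetical reading of 'attention consistency')."""
    argmax_per_layer = attn_rows.argmax(axis=1)
    counts = np.bincount(argmax_per_layer, minlength=attn_rows.shape[1])
    return counts / attn_rows.shape[0]  # fraction of layers; sums to 1

importance = macs_like_score(attn)
print(importance.shape)  # (8,)
```

Whatever the exact formulation, the appeal claimed in the abstract is visible here: the score reuses attention maps the forward pass already produces, so it needs no gradients or repeated perturbation passes.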


How to free up space on your Mac

FOX News

Apple has made it easier than ever to merge duplicate photos. Are you tired of scrolling through your Mac's photo library only to find multiple copies of the same photo? Duplicate photos can clutter your storage and make it harder to find the memories you want to cherish. Fortunately, if you're using macOS Ventura or later, Apple has made it easier than ever to find and merge these duplicates right within the Photos app. We'll walk you through how to use the built-in Duplicates finder, as well as some alternative methods for those who need more advanced features.


Exploring the Performance Improvement of Tensor Processing Engines through Transformation in the Bit-weight Dimension of MACs

Wu, Qizhe, Liang, Huawen, Gui, Yuchen, Zeng, Zhichen, He, Zerong, Tao, Linfeng, Wang, Xiaotian, Zhao, Letian, Zeng, Zhaoxi, Yuan, Wei, Wu, Wei, Jin, Xi

arXiv.org Artificial Intelligence

General matrix-matrix multiplication (GEMM) is a cornerstone of AI computations, making tensor processing engines (TPEs) increasingly critical in GPUs and domain-specific architectures. Existing architectures primarily optimize dataflow or operand reuse strategies. However, considering the interaction between matrix multiplication and multiply-accumulators (MACs) offers greater optimization potential. This work introduces a novel hardware perspective on matrix multiplication, focusing on the bit-weight dimension of MACs. We propose a finer-grained TPE notation using matrix triple loops as an example, introducing new methods for designing and optimizing PE microarchitectures. Based on this notation and its transformations, we propose four optimization techniques that improve timing, area, and power consumption. Implementing our design in RTL using the SMIC-28nm process, we evaluate its effectiveness across four classic TPE architectures: systolic array, 3D-Cube, multiplier-adder tree, and 2D-Matrix. Our techniques achieve area efficiency improvements of 1.27x, 1.28x, 1.56x, and 1.44x, and energy efficiency gains of 1.04x, 1.56x, 1.49x, and 1.20x, respectively. Applied to a bit-slice architecture, our approach achieves a 12.10x improvement in energy efficiency and 2.85x in area efficiency compared to Laconic. Our Verilog HDL code, along with timing, area, and power reports, is available at https://github.com/wqzustc/High-Performance-Tensor-Processing-Engines
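The "matrix triple loops" the authors use as their starting notation correspond to the familiar naive GEMM loop nest, in which the innermost statement is exactly one multiply-accumulate (MAC). A Python stand-in is shown below purely to fix the notation; the paper's actual optimizations operate at the RTL level, on the bit-weight structure inside each MAC unit rather than on this software loop.

```python
import numpy as np

def gemm_triple_loop(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Naive GEMM as a triple loop of multiply-accumulate (MAC) operations;
    each innermost statement is one MAC that a TPE processing element executes."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "inner dimensions must match"
    C = np.zeros((M, N))
    for i in range(M):            # rows of A
        for j in range(N):        # columns of B
            for k in range(K):    # reduction dimension
                C[i, j] += A[i, k] * B[k, j]  # one multiply-accumulate
    return C

A = np.arange(6.0).reshape(2, 3)
B = np.arange(6.0).reshape(3, 2)
print(np.allclose(gemm_triple_loop(A, B), A @ B))  # True
```

Reordering or tiling the (i, j, k) loops is what distinguishes the classic dataflows the abstract lists (systolic array, 3D-Cube, adder tree, 2D-Matrix); the paper's contribution is to expose a further dimension of freedom inside the MAC itself.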


The life-changing benefits of Apple's Personal Voice and Live Speech

FOX News

Create a synthesized voice that sounds just like you. Imagine losing the ability to speak and communicate with your loved ones. What if you could preserve your unique voice and continue expressing yourself, even when speaking becomes challenging? Apple's Personal Voice and Live Speech features are groundbreaking accessibility tools designed to do exactly that. These innovative technologies allow you to create a synthesized voice that sounds just like you, giving individuals at risk of losing their speech a powerful way to maintain their personal communication style.


Microsoft quietly activates feature that lets AI scrape your personal info

Daily Mail - Science & tech

Microsoft has quietly rolled out an AI feature that automatically accesses your data in Word and Excel documents. The company introduced Connected Experiences on all of its Microsoft 365 apps last month. It's turned on by default if you have Windows X and above, and requires users to manually disable the feature to turn it off. Connected Experiences analyzes users' content to provide design recommendations and grammar and editing suggestions, while also offering relevant links to more information. When Connected Experiences is active, the data it uses includes information like users' search history, app usage, and location to personalize their results.