digest
2025 digest of digests
Throughout the year we've reported on some of the larger stories, and some of the lesser-covered happenings, in our regular monthly digests. Here we look back through the archives and pick out one or two stories from each. Back in January, AI startup DeepSeek released DeepSeek R1, a reasoning model designed for strong performance on logic, maths, and pattern-finding tasks. The company also launched six smaller versions of R1 that are compact enough to run locally on laptops. In Wired, Zeyi Yang reported on who is behind the startup, whilst Tongliang Liu (in The Conversation) looked at how DeepSeek achieved its results with a fraction of the cash and computing power of its competitors.
- South America > Brazil (0.06)
- North America > United States > Virginia (0.05)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.05)
- (3 more...)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
- Information Technology > Artificial Intelligence > Issues > Social & Ethical Issues (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.91)
- Information Technology > Artificial Intelligence > Natural Language > Chatbot (0.91)
ArkVale: Efficient Generative LLM Inference with Recallable Key-Value Eviction
Large Language Models (LLMs) are widely used in today's natural language processing tasks. To support applications like multi-turn chats, document understanding, and content generation, models with long context lengths are growing in importance. However, managing long contexts brings substantial challenges due to the expansion of the key-value (KV) cache. A longer KV cache requires more memory, limiting batch size and thus decreasing throughput; computing attention over a long KV cache also incurs more memory accesses, hurting end-to-end latency. Prior works find that it is sufficient to use only the recent and high-impact tokens for attention computation, allowing the eviction of less vital tokens to shrink the cache. Nonetheless, we observe a dynamic shift in token importance across decoding steps: tokens initially evicted might regain importance after certain decoding steps. To address this, we propose ArkVale, a page-based KV cache manager that can recognize and recall currently important tokens that were evicted earlier. We asynchronously copy each filled page into external memory (e.g., CPU memory) as a backup and summarize it into a much smaller digest by constructing the bounding volume of its keys. Before attention computation, we measure all pages' importance based on their digests, recall the important ones, evict the unimportant ones, and select the top-ranked pages for attention computation. Experimental results show that ArkVale performs well on various long-context tasks with negligible accuracy loss under a 2k$\sim$4k cache budget, and can improve decoding latency by up to $2.2\times$ and batching throughput by up to $4.6\times$, because it applies attention to only a small subset of pages and reduces per-sample KV cache memory usage.
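The digest-based page scoring described above can be sketched in a few lines. This is a minimal illustration, not ArkVale's implementation: it assumes each page's digest is the axis-aligned bounding box of its key vectors, and scores a page by the maximum possible dot product between the query and any point inside that box (an upper bound on the true attention score of any key on the page).

```python
import numpy as np

PAGE_SIZE = 32  # tokens per page (illustrative choice)

def make_digest(keys: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Summarize a page of key vectors (page_size, d) into a bounding box."""
    return keys.min(axis=0), keys.max(axis=0)

def page_importance(query: np.ndarray, digest) -> float:
    """Upper bound on q.k over all keys in the page: per dimension, pick
    whichever box corner maximizes the product with the query entry."""
    lo, hi = digest
    return float(np.sum(np.where(query >= 0, query * hi, query * lo)))

def select_pages(query, digests, budget):
    """Rank pages by estimated importance and keep the top `budget` pages;
    the rest can stay evicted (backed up in CPU memory)."""
    scores = [page_importance(query, d) for d in digests]
    order = np.argsort(scores)[::-1]
    return sorted(order[:budget].tolist())
```

Because the score is an upper bound, a page whose keys were evicted can still rank highly for a later query and be recalled, which is the behaviour the abstract describes.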
Comparative Analysis of Hash-based Malware Clustering via K-Means
Thein, Aink Acrie Soe, Pitropakis, Nikolaos, Papadopoulos, Pavlos, Grierson, Sam, Jan, Sana Ullah
With the adoption of multiple digital devices in everyday life, the cyber-attack surface has increased, and adversaries are continuously exploring new avenues to exploit these devices and deploy malware. Detection approaches typically employ hashing-based algorithms such as SSDeep, TLSH, and IMPHash to capture structural and behavioural similarities among binaries. This work analyses and evaluates these techniques for clustering malware samples using the K-means algorithm. More specifically, we experimented with established malware families and traits and found that TLSH and IMPHash produce more distinct, semantically meaningful clusters, whereas SSDeep is more efficient for broader classification tasks. The findings of this work can guide the development of more robust threat-detection and adaptive security mechanisms.
- Information Technology > Security & Privacy (1.00)
- Government > Military > Cyberwarfare (0.34)
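The pipeline above can be sketched roughly as follows. This is an illustrative simplification, not the paper's method: each fixed-length hex digest (e.g. a TLSH hash) is treated as a vector of nibble values and clustered with a bare-bones K-means. Real similarity hashes are normally compared with their own distance functions, so Euclidean distance over raw digest nibbles is a stand-in for demonstration only.

```python
import numpy as np

def digest_to_vector(digest_hex: str) -> np.ndarray:
    """Map a hex digest to a numeric feature vector (one value per nibble)."""
    return np.array([int(c, 16) for c in digest_hex], dtype=float)

def kmeans(X: np.ndarray, k: int, iters: int = 50, seed: int = 0):
    """Minimal Lloyd's algorithm: returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Initialize centroids from k distinct samples.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each sample to its nearest centroid.
        dists = np.linalg.norm(X[:, None] - centroids[None, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old centroid if a cluster empties.
        for j in range(k):
            if np.any(labels == j):
                centroids[j] = X[labels == j].mean(axis=0)
    return labels, centroids
```

In practice one would swap the distance metric for the hash family's own comparison (TLSH distance, SSDeep match score) or use a medoid-based variant, since centroids of hash vectors have no direct interpretation.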
Evolution of AI Agent Registry Solutions: Centralized, Enterprise, and Distributed Approaches
Singh, Aditi, Ehtesham, Abul, Lambe, Mahesh, Grogan, Jared James, Singh, Abhishek, Kumar, Saket, Muscariello, Luca, Pandey, Vijoy, Marc, Guillaume Sauvage De Saint, Chari, Pradyumna, Raskar, Ramesh
Autonomous AI agents now operate across cloud, enterprise, and decentralized domains, creating demand for registry infrastructures that enable trustworthy discovery, capability negotiation, and identity assurance. We analyze five prominent approaches: (1) MCP Registry (centralized publication of mcp.json descriptors), (2) A2A Agent Cards (decentralized self-describing JSON capability manifests), (3) AGNTCY Agent Directory Service (IPFS Kademlia DHT content routing extended for semantic taxonomy-based content discovery, OCI artifact storage, and Sigstore-backed integrity), (4) Microsoft Entra Agent ID (enterprise SaaS directory with policy and zero-trust integration), and (5) NANDA Index AgentFacts (cryptographically verifiable, privacy-preserving fact model with credentialed assertions). Using four evaluation dimensions (security, authentication, scalability, and maintainability), we surface architectural trade-offs between centralized control, enterprise governance, and distributed resilience. We conclude with design recommendations for an emerging Internet of AI Agents requiring verifiable identity, adaptive discovery flows, and interoperable capability semantics.

Autonomous AI agents are rapidly becoming foundational across domains, from cloud-native assistants and robotics to decentralized systems and edge-based IoT controllers. These agents act independently, make decisions, and collaborate at scale. As agent populations grow into the billions across heterogeneous platforms and administrative boundaries, the ability to identify, discover, and trust agents in real time has emerged as a critical infrastructure challenge. Traditional mechanisms like DNS and static service catalogs are poorly suited to agent ecosystems, which demand dynamic discovery, verifiable metadata, and privacy-preserving interactions [1].
Legacy systems assume fixed endpoints and ownership-based trust models, lacking the flexibility and cryptographic assurances needed for agents that rotate capabilities, change locations, and form ephemeral collaborations. To address these limitations, several agent frameworks have introduced discovery metadata models.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (1.00)
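The self-describing capability manifests the survey discusses can be sketched as a small data model. This is in the spirit of A2A Agent Cards but the field names here are assumptions for illustration, not the actual A2A schema, and the discovery step is a naive linear scan standing in for a real registry index.

```python
import json
from dataclasses import dataclass, field

@dataclass
class AgentCard:
    """Minimal self-describing agent manifest (illustrative fields)."""
    name: str
    endpoint: str
    capabilities: list = field(default_factory=list)

    @classmethod
    def from_json(cls, raw: str) -> "AgentCard":
        data = json.loads(raw)
        return cls(data["name"], data["endpoint"], data.get("capabilities", []))

def discover(cards, required_capability):
    """Return the cards advertising a given capability. A real registry
    would index this lookup (e.g. via a directory service or DHT) rather
    than scanning every card."""
    return [c for c in cards if required_capability in c.capabilities]
```

The trade-offs the paper evaluates (centralized vs. distributed, signed vs. unsigned metadata) all sit behind this basic publish-then-discover pattern.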
The AGNTCY Agent Directory Service: Architecture and Implementation
Muscariello, Luca, Pandey, Vijoy, Polic, Ramiz
The Agent Directory Service (ADS) is a distributed directory for the discovery of AI agent capabilities, metadata, and provenance. It leverages content-addressed storage, hierarchical taxonomies, and cryptographic signing to enable efficient, verifiable, and multi-dimensional discovery across heterogeneous Multi-Agent Systems (MAS). Built on the Open Agentic Schema Framework (OASF), ADS decouples capability indexing from content location through a two-level mapping realized over a Kademlia-based Distributed Hash Table (DHT). It reuses mature OCI / ORAS infrastructure for artifact distribution, integrates Sigstore for provenance, and supports schema-driven extensibility for emerging agent modalities (LLM prompt agents, MCP servers, A2A-enabled components). This paper formalizes the architectural model, describes storage and discovery layers, explains security and performance properties, and positions ADS within the broader landscape of emerging agent registry and interoperability initiatives.
- Information Technology (0.46)
- Commercial Services & Supplies (0.46)
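The two-level mapping ADS describes (capability index decoupled from content location) can be sketched with content addressing over canonicalized records. This is a toy, with plain dicts standing in for the Kademlia DHT and illustrative record fields; a real deployment would distribute both maps and verify Sigstore signatures on publish.

```python
import hashlib
import json

def content_address(record: dict) -> str:
    """SHA-256 digest of the canonicalized record (content addressing)."""
    canonical = json.dumps(record, sort_keys=True).encode()
    return hashlib.sha256(canonical).hexdigest()

class Directory:
    def __init__(self):
        self.by_skill = {}   # level 1: taxonomy label -> set of digests
        self.by_digest = {}  # level 2: digest -> record (or its location)

    def publish(self, record: dict) -> str:
        digest = content_address(record)
        self.by_digest[digest] = record
        for skill in record.get("skills", []):
            self.by_skill.setdefault(skill, set()).add(digest)
        return digest

    def lookup(self, skill: str) -> list:
        """Resolve a capability label to records via the two-level mapping."""
        return [self.by_digest[d] for d in self.by_skill.get(skill, ())]
```

Because the address is derived from the record's content, any tampering with a stored record changes its digest, which is what makes the discovery results verifiable.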
Pythons can devour bones thanks to unique stomach cells
Few predators swallow their prey whole, and even fewer can digest their meals bones and all. Herpetologists have spent years trying to understand not only how bones are safe and healthy for the serpents, but how their biology manages to regulate when, and how many, bones to digest. Now, researchers believe they have identified an explanation hidden inside the "crypts" of specialized cells.
LLM$\times$MapReduce-V2: Entropy-Driven Convolutional Test-Time Scaling for Generating Long-Form Articles from Extremely Long Resources
Wang, Haoyu, Fu, Yujia, Zhang, Zhu, Wang, Shuo, Ren, Zirui, Wang, Xiaorong, Li, Zhili, He, Chaoqun, An, Bo, Liu, Zhiyuan, Sun, Maosong
Long-form generation is crucial for a wide range of practical applications, typically categorized into short-to-long and long-to-long generation. While short-to-long generation has received considerable attention, generating long texts from extremely long resources remains relatively underexplored. The primary challenge in long-to-long generation lies in effectively integrating and analyzing relevant information from extensive inputs, which remains difficult for current large language models (LLMs). In this paper, we propose LLM$\times$MapReduce-V2, a novel test-time scaling strategy designed to enhance the ability of LLMs to process extremely long inputs. Drawing inspiration from convolutional neural networks, which iteratively integrate local features into higher-level global representations, LLM$\times$MapReduce-V2 utilizes stacked convolutional scaling layers to progressively expand the understanding of input materials. Both quantitative and qualitative experimental results demonstrate that our approach substantially enhances the ability of LLMs to process long inputs and generate coherent, informative long-form articles, outperforming several representative baselines. Both LLM$\times$MapReduce-V2 and SurveyEval are publicly available at https://github.com/thunlp/LLMxMapReduce.
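The convolution-inspired aggregation idea can be sketched structurally: local windows of chunk summaries are merged layer by layer until one global representation remains, so each pass grows the "receptive field" over the source material. A trivial string joiner stands in here for the LLM summarization call the paper's pipeline would make; the function names are illustrative, not the repository's API.

```python
def merge_window(window):
    """Stand-in for an LLM 'summarize these together' call."""
    return " | ".join(window)

def convolutional_reduce(chunks, window=2):
    """Iteratively merge adjacent windows of summaries, one 'layer' per
    pass, until a single global summary remains."""
    level = list(chunks)
    while len(level) > 1:
        level = [merge_window(level[i:i + window])
                 for i in range(0, len(level), window)]
    return level[0]
```

With four chunks and a window of 2, two layers suffice: the first pass produces two local summaries, the second fuses them into the global article plan, mirroring how stacked convolutional layers compose local features.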