AITopics | lsm-tree

Collaborating Authors

lsm-tree

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Learned LSM-trees: Two Approaches Using Learned Bloom Filters

Fidalgo, Nicholas, Ye, Puyuan

arXiv.org Artificial IntelligenceAug-5-2025

Modern key-value stores rely heavily on Log-Structured Merge (LSM) trees for write optimization, but this design introduces significant read amplification. Auxiliary structures like Bloom filters help, but impose memory costs that scale with tree depth and dataset size. Recent advances in learned data structures suggest that machine learning models can augment or replace these components, trading handcrafted heuristics for data-adaptive behavior. In this work, we explore two approaches for integrating learned predictions into the LSM-tree lookup path. The first uses a classifier to selectively bypass Bloom filter probes for irrelevant levels, aiming to reduce average-case query latency. The second replaces traditional Bloom filters with compact learned models and small backup filters, targeting memory footprint reduction without compromising correctness. We implement both methods atop a Monkey-style LSM-tree with leveled compaction, per-level Bloom filters, and realistic workloads. Our experiments show that the classifier reduces GET latency by up to 2.28x by skipping over 30% of Bloom filter checks with high precision, though it incurs a modest false-negative rate. The learned Bloom filter design achieves zero false negatives and retains baseline latency while cutting memory usage per level by 70-80%. Together, these designs illustrate complementary trade-offs between latency, memory, and correctness, and highlight the potential of learned index components in write-optimized storage systems.

artificial intelligence, bloom filter, machine learning, (19 more...)

arXiv.org Artificial Intelligence

2508.00882

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)

Add feedback

CAMAL: Optimizing LSM-trees via Active Learning

Yu, Weiping, Luo, Siqiang, Yu, Zihao, Cong, Gao

arXiv.org Artificial IntelligenceSep-23-2024

We use machine learning to optimize LSM-tree structure, aiming to reduce the cost of processing various read/write operations. We introduce a new approach Camal, which boasts the following features: (1) ML-Aided: Camal is the first attempt to apply active learning to tune LSM-tree based key-value stores. The learning process is coupled with traditional cost models to improve the training process; (2) Decoupled Active Learning: backed by rigorous analysis, Camal adopts active learning paradigm based on a decoupled tuning of each parameter, which further accelerates the learning process; (3) Easy Extrapolation: Camal adopts an effective mechanism to incrementally update the model with the growth of the data size; (4) Dynamic Mode: Camal is able to tune LSM-tree online under dynamically changing workloads; (5) Significant System Improvement: By integrating Camal into a full system RocksDB, the system performance improves by 28% on average and up to 8x compared to a state-of-the-art RocksDB design.

lsm-tree, proceedings, workload, (14 more...)

arXiv.org Artificial Intelligence

2409.1513

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.05)
North America > United States > Massachusetts > Hampshire County > Amherst (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Information Technology (0.67)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.93)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)

Add feedback

DumpKV: Learning based lifetime aware garbage collection for key value separation in LSM-tree

Zhuang, Zhutao, Zeng, Xinqi, Chen, Zhiguang

arXiv.org Artificial IntelligenceJun-3-2024

Key\-value separation is used in LSM\-tree to stored large value in separate log files to reduce write amplification, but requires garbage collection to garbage collect invalid values. Existing garbage collection techniques in LSM\-tree typically adopt static parameter based garbage collection to garbage collect obsolete values which struggles to achieve low write amplification and it's challenging to find proper parameter for garbage collection triggering. In this work we introduce DumpKV, which introduces learning based lifetime aware garbage collection with dynamic lifetime adjustment to do efficient garbage collection to achieve lower write amplification. DumpKV manages large values using trained lightweight model with features suitable for various application based on past write access information of keys to give lifetime prediction for each individual key to enable efficient garbage collection. To reduce interference to write throughput DumpKV conducts feature collection during L0\-L1 compaction leveraging the fact that LSM\-tree is small under KV separation. Experimental results show that DumpKV achieves lower write amplification by 38\%\-73\% compared to existing key\-value separation garbage collection LSM\-tree stores with small feature storage overhead.

garbage collection, lsm-tree, value file, (14 more...)

arXiv.org Artificial Intelligence

2406.0125

Country:

North America > United States > California > Santa Clara County > Santa Clara (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.14)
North America > United States > New York > New York County > New York City (0.05)
(17 more...)

Genre: Research Report > New Finding (0.34)

Industry: Water & Waste Management > Solid Waste Management (1.00)

Technology:

Information Technology > Information Management (0.67)
Information Technology > Data Science (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.46)

Add feedback

Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads

Mo, Dingheng, Chen, Fanchao, Luo, Siqiang, Shan, Caihua

arXiv.org Artificial IntelligenceSep-17-2023

LSM-trees are widely adopted as the storage backend of key-value stores. However, optimizing the system performance under dynamic workloads has not been sufficiently studied or evaluated in previous work. To fill the gap, we present RusKey, a key-value store with the following new features: (1) RusKey is a first attempt to orchestrate LSM-tree structures online to enable robust performance under the context of dynamic workloads; (2) RusKey is the first study to use Reinforcement Learning (RL) to guide LSM-tree transformations; (3) RusKey includes a new LSM-tree design, named FLSM-tree, for an efficient transition between different compaction policies -- the bottleneck of dynamic key-value stores. We justify the superiority of the new design with theoretical analysis; (4) RusKey requires no prior workload knowledge for system adjustment, in contrast to state-of-the-art techniques. Experiments show that RusKey exhibits strong performance robustness in diverse workloads, achieving up to 4x better end-to-end performance than the RocksDB system under various settings.

compaction policy, transition, workload, (15 more...)

arXiv.org Artificial Intelligence

2308.07013

Country:

Europe > Austria > Vienna (0.14)
Asia > Singapore (0.04)
North America > United States > New York > New York County > New York City (0.04)
(3 more...)

Genre: Research Report > New Finding (1.00)

Industry: Information Technology > Services (0.67)

Technology:

Information Technology > Information Management (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Add feedback

Algorithms Behind Modern Storage Systems

Communications of the ACMJul-23-2018, 20:35:47 GMT

Developing storage systems always presents the same challenges and factors to consider. Deciding what to optimize for has a substantial influence on the result. You can spend more time during write in order to lay out structures for more efficient reads, reserve extra space for in-place updates, facilitate faster writes, and buffer data in memory to ensure sequential write operations. It is impossible, however, to do this all at once. An ideal storage system would have the lowest read cost, lowest write cost, and no overhead.

artificial intelligence, node, sstable, (18 more...)

Communications of the ACM

Country: North America > Canada > Ontario > National Capital Region > Ottawa (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.47)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (0.47)

Add feedback