reuse distance
DMC4ML: Data Movement Complexity for Machine Learning
Ding, Chen, Kanan, Christopher, McKellips, Dylan, Ozawa, Toranosuke, Shahmirza, Arian, Smith, Wesley
The greatest demand for today's computing is machine learning. This paper analyzes three machine learning algorithms: transformers, spatial convolution, and FFT. The analysis is novel in three aspects. First, it measures the cost of memory access on an abstract memory hierarchy, instead of traditional time or space complexity. Second, the analysis is asymptotic and identifies the primary sources of the memory cost. Finally, the result is symbolic, which can be used to select algorithmic parameters such as the group size in grouped query attention for any dimension size and number of heads and the batch size for batched convolution for any image size and kernel size.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > Wisconsin > Milwaukee County > Milwaukee (0.04)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
Phoebe: Reuse-Aware Online Caching with Reinforcement Learning for Emerging Storage Models
With data durability, high access speed, low power efficiency and byte addressability, NVMe and SSD, which are acknowledged representatives of emerging storage technologies, have been applied broadly in many areas. However, one key issue with high-performance adoption of these technologies is how to properly define intelligent cache layers such that the performance gap between emerging technologies and main memory can be well bridged. To this end, we propose Phoebe, a reuse-aware reinforcement learning framework for the optimal online caching that is applicable for a wide range of emerging storage models. By continuous interacting with the cache environment and the data stream, Phoebe is capable to extract critical temporal data dependency and relative positional information from a single trace, becoming ever smarter over time. To reduce training overhead during online learning, we utilize periodical training to amortize costs. Phoebe is evaluated on a set of Microsoft cloud storage workloads. Experiment results show that Phoebe is able to close the gap of cache miss rate from LRU and a state-of-the-art online learning based cache policy to the Belady's optimal policy by 70.3% and 52.6%, respectively.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > United States > California > Santa Clara County > Santa Clara (0.04)
- (2 more...)
- Education > Educational Setting > Online (0.55)
- Information Technology (0.46)
DEAP Cache: Deep Eviction Admission and Prefetching for Cache
Mangal, Ayush, Jain, Jitesh, Guliani, Keerat Kaur, Bhalerao, Omkar
Recent approaches for learning policies to improve caching, target just one out of the prefetching, admission and eviction processes. In contrast, we propose an end to end pipeline to learn all three policies using machine learning. We also take inspiration from the success of pretraining on large corpora to learn specialized embeddings for the task. We model prefetching as a sequence prediction task based on past misses. Following previous works suggesting that frequency and recency are the two orthogonal fundamental attributes for caching, we use an online reinforcement learning technique to learn the optimal policy distribution between two orthogonal eviction strategies based on them. While previous approaches used the past as an indicator of the future, we instead explicitly model the future frequency and recency in a multi-task fashion with prefetching, leveraging the abilities of deep networks to capture futuristic trends and use them for learning eviction and admission. We also model the distribution of the data in an online fashion using Kernel Density Estimation in our approach, to deal with the problem of caching non-stationary data. We present our approach as a "proof of concept" of learning all three components of cache strategies using machine learning and leave improving practical deployment for future work.
Learning Forward Reuse Distance
Caching techniques are widely used in the era of cloud computing from applications, such as Web caches to infrastructures, Memcached and memory caches in computer architectures. Prediction of cached data can greatly help improve cache management and performance. The recent advancement of deep learning techniques enables the design of novel intelligent cache replacement policies. In this work, we propose a learning-aided approach to predict future data accesses. We find that a powerful LSTM-based recurrent neural network model can provide high prediction accuracy based on only a cache trace as input. The high accuracy results from a carefully crafted locality-driven feature design. Inspired by the high prediction accuracy, we propose a pseudo OPT policy and evaluate it upon 13 real-world storage workloads from Microsoft Research. Results demonstrate that the new cache policy improves state-of-art practical policies by up to 19.2% and incurs only 2.3% higher miss ratio than OPT on average.
- North America > United States > New York > New York County > New York City (0.04)
- North America > United States > New Jersey > Middlesex County > Piscataway (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- (5 more...)
An Imitation Learning Approach for Cache Replacement
Liu, Evan Zheran, Hashemi, Milad, Swersky, Kevin, Ranganathan, Parthasarathy, Ahn, Junwhan
Program execution speed critically depends on increasing cache hits, as cache hits are orders of magnitude faster than misses. To increase cache hits, we focus on the problem of cache replacement: choosing which cache line to evict upon inserting a new line. This is challenging because it requires planning far ahead and currently there is no known practical solution. As a result, current replacement policies typically resort to heuristics designed for specific common access patterns, which fail on more diverse and complex access patterns. In contrast, we propose an imitation learning approach to automatically learn cache access patterns by leveraging Belady's, an oracle policy that computes the optimal eviction decision given the future cache accesses. While directly applying Belady's is infeasible since the future is unknown, we train a policy conditioned only on past accesses that accurately approximates Belady's even on diverse and complex access patterns, and call this approach Parrot. When evaluated on 13 of the most memory-intensive SPEC applications, Parrot increases cache miss rates by 20% over the current state of the art. In addition, on a large-scale web search benchmark, Parrot increases cache hit rates by 61% over a conventional LRU policy. We release a Gym environment to facilitate research in this area, as data is plentiful, and further advancements can have significant real-world impact.
- Europe > Austria > Vienna (0.14)
- North America > United States > California > Santa Clara County > Sunnyvale (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
Guidelines for enhancing data locality in selected machine learning algorithms
Chakroun, Imen, Aa, Tom Vander, Ashby, Thomas J.
To deal with the complexity of the new bigger and more complex generation of data, machine learning (ML) techniques are probably the first and foremost used. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. In this paper, we analyze one of the means to increase the performances of machine learning algorithms which is exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next we document the possibilities of such reuse in some selected machine learning algorithms. Keywords: Increasing data locality, data redundancy and reuse, machine learning, supervised learners... Notice This an extended version of the paper titled "Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms" that appeared in the proceedings of the IADIS International Conference Big Data Analytics, Data Mining and Computational Intelligence 2019 (part of MCCSIS 2019)" [19] The final publication of this article is available at IOS Press through http://dx.doi.org/10.3233/IDA-184287. Because processor speed is increasing at a much faster rate than memory speed, computer architects have turned increasingly to the use of memory hierarchies with one or more levels of cache memory. This caching technique takes advantage of data locality in programs which is the property that references to the same memory location (temporal locality) or adjacent locations (spatial locality) reused within a short period of time. 1 One of the most popular ways to increase it is to rewrite the data intensive parts of the program, almost always the loops [14]. A simple example of this is to interchange the two loops in Algorithm 1 such that the code looks like Algorithm 2; note that the indices in the loop headers have changed.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)
Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms
Chakroun, Imen, Aa, Tom Vander, Ashby, Tom
Machine learning (ML) is probably the first and foremost used technique to deal with the size and complexity of the new generation of data. In this paper, we analyze one of the means to increase the performances of ML algorithms which is exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems due to the use of certain hardware techniques to improve performance. Altering the access patterns to increase locality can dramatically increase performance of a given algorithm. Besides, repeated data access can be seen as redundancy in data movement. Similarly, there can also be redundancy in the repetition of calculations. This work also identifies some of the opportunities for avoiding these redundancies by directly reusing computation results. We document the possibilities of such reuse in some selected machine learning algorithms and give initial indicative results from our first experiments on data access improvement and algorithm redesign.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- Europe > France > Île-de-France > Paris > Paris (0.04)
- Europe > Belgium > Flanders > Flemish Brabant > Leuven (0.04)