Guidelines for enhancing data locality in selected machine learning algorithms

Chakroun, Imen; Aa, Tom Vander; Ashby, Thomas J.

arXiv.org Machine Learning 

Machine learning (ML) techniques are among the most widely used tools for dealing with the new generation of larger and more complex data. For ML algorithms to produce results in a reasonable amount of time, they need to be implemented efficiently. In this paper, we analyze one means of increasing the performance of machine learning algorithms: exploiting data locality. Data locality and access patterns are often at the heart of performance issues in computing systems because of certain hardware techniques used to improve performance. Altering the access patterns to increase locality can dramatically improve the performance of a given algorithm. Moreover, repeated data access can be seen as redundancy in data movement. Similarly, there can be redundancy in the repetition of calculations. This work also identifies some opportunities for avoiding these redundancies by directly reusing computation results. We start by motivating why and how a more efficient implementation can be achieved by exploiting reuse in the memory hierarchy of modern instruction set processors. Next, we document the possibilities for such reuse in some selected machine learning algorithms.

Keywords: Increasing data locality, data redundancy and reuse, machine learning, supervised learners...

Notice: This is an extended version of the paper titled "Reviewing Data Access Patterns and Computational Redundancy for Machine Learning Algorithms" that appeared in the proceedings of the IADIS International Conference Big Data Analytics, Data Mining and Computational Intelligence 2019 (part of MCCSIS 2019) [19]. The final publication of this article is available at IOS Press through http://dx.doi.org/10.3233/IDA-184287.

Because processor speed is increasing at a much faster rate than memory speed, computer architects have turned increasingly to the use of memory hierarchies with one or more levels of cache memory.
This caching technique takes advantage of data locality in programs, which is the property that references to the same memory location (temporal locality) or to adjacent locations (spatial locality) are reused within a short period of time. One of the most popular ways to increase locality is to rewrite the data-intensive parts of the program, almost always the loops [14]. A simple example of this is to interchange the two loops in Algorithm 1 such that the code looks like Algorithm 2; note that the indices in the loop headers have changed.
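
The effect of interchanging two loops can be sketched as follows. This is an illustrative example only and not the paper's Algorithm 1 or Algorithm 2, which are not reproduced here. It assumes a matrix stored row-major in a flat list, so element (i, j) sits at offset i * cols + j; the function names and the small 4 x 8 size are hypothetical. The sketch records the sequence of offsets each loop order touches and counts how many consecutive accesses land on adjacent locations, which is a proxy for spatial locality. Python lists hold references rather than contiguous values, so this models the access pattern rather than measuring real cache behavior.

```python
# Hypothetical illustration of loop interchange on a row-major matrix.
# Element (i, j) of a rows x cols matrix is stored at offset i * cols + j.


def sum_column_order(data, rows, cols):
    """Outer loop over columns, inner loop over rows (inner stride = cols)."""
    total = 0
    offsets = []
    for j in range(cols):
        for i in range(rows):
            k = i * cols + j
            offsets.append(k)  # record which location is touched
            total += data[k]
    return total, offsets


def sum_row_order(data, rows, cols):
    """Loops interchanged: outer over rows, inner over columns (stride = 1)."""
    total = 0
    offsets = []
    for i in range(rows):
        for j in range(cols):
            k = i * cols + j
            offsets.append(k)
            total += data[k]
    return total, offsets


def adjacent_steps(offsets):
    """Count consecutive accesses that move to the next adjacent location."""
    return sum(1 for a, b in zip(offsets, offsets[1:]) if b - a == 1)


rows, cols = 4, 8
data = list(range(rows * cols))

total_col, trace_col = sum_column_order(data, rows, cols)
total_row, trace_row = sum_row_order(data, rows, cols)

# Both orders compute the same result; only the access pattern differs.
print(total_col == total_row)
print("adjacent steps, column order:", adjacent_steps(trace_col))
print("adjacent steps, row order:   ", adjacent_steps(trace_row))
```

Both traversals visit every element exactly once and return the same sum. With row-major storage, the interchanged version walks memory sequentially, whereas the column-order version jumps by a full row on each inner iteration.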
