Collaborating Authors

The Cost of Software-Based Memory Management Without Virtual Memory


Virtual memory has been a standard hardware feature for more than three decades. At the price of increased hardware complexity, it has simplified software and promised strong isolation among colocated processes. In modern computing systems, however, the costs of virtual memory have increased significantly. With large memory workloads, virtualized environments, data center computing, and chips with multiple DMA devices, virtual memory can degrade performance and increase power usage. We therefore explore the implications of building applications and operating systems without relying on hardware support for address translation. In particular, we investigate the consequences of removing the abstraction of large contiguous memory segments. Our experiments show that the overhead to remove this reliance is surprisingly small for real programs. We expect this small overhead to be worth the benefit of reducing the complexity and energy usage of address translation. In fact, in some cases, performance can even improve when address translation is avoided.
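The core idea above, enforcing memory safety in software rather than through hardware address translation, can be illustrated with a toy sketch. This is not the paper's actual system; it is a hypothetical buffer object that performs an explicit bounds check on every access, the kind of check a compiler could insert when no MMU enforces segment boundaries.

```python
# Toy illustration (not the paper's implementation): software bounds
# checking as a stand-in for hardware-enforced memory isolation.

class CheckedSegment:
    """A byte buffer whose every access is bounds-checked in software."""

    def __init__(self, size):
        self._mem = bytearray(size)
        self._size = size

    def _check(self, offset, length):
        # The check a compiler might emit before each load/store when
        # no hardware translation guards segment boundaries.
        if offset < 0 or offset + length > self._size:
            raise MemoryError(f"access [{offset}, {offset + length}) "
                              f"outside segment of {self._size} bytes")

    def store(self, offset, data):
        self._check(offset, len(data))
        self._mem[offset:offset + len(data)] = data

    def load(self, offset, length):
        self._check(offset, length)
        return bytes(self._mem[offset:offset + length])


seg = CheckedSegment(64)
seg.store(0, b"hello")
print(seg.load(0, 5))      # b'hello'
try:
    seg.load(60, 8)        # crosses the segment boundary
except MemoryError as e:
    print("trapped:", e)
```

The software check trades a few instructions per access for the hardware complexity of a TLB and page-table walks, which is the trade-off the abstract argues can be surprisingly cheap in practice.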

Samsung produces eUFS memory for cars


Samsung Electronics will produce eUFS memory for cars, the company has said. The memory was designed for advanced driver assistance systems (ADAS), dashboard systems, and infotainment usage, the firm said. It has a sequential read speed of up to 850 megabytes (MB) per second and a random read speed of 45,000 input/output operations per second. It meets the JEDEC UFS 2.1 standard, as well as the upcoming JEDEC UFS 3.0 standard, and has the requisite data refresh and temperature notification features. The temperature notification feature uses a controller to warn when the memory is crossing temperature boundaries, which is important in a tough automotive environment, Samsung said.

Memory Machine


From a technical standpoint, Apple's ability to group like photos together is impressive. Who could have imagined a phone could ever do such a thing as identify all your pets and group them together under the heading "Fluffy friends"? But it's also something your phone doesn't need to revise history to do, and the music and slideshow-panning effects are heavy-handed attempts on Apple's part to repackage your life back to you: See how much better things look with a smartphone in your hand? All the sophisticated machine learning in the world can't minimize the creepiness of big companies like Facebook and Apple trying to horn in on your personal moments. The more these services try to approximate a warm, human touch, the wider the gap between an actual memory and its simulacrum, a capital-M Memory, starts to seem.


Neural Information Processing Systems

The memory capacity is found to be much smaller than the Kosko upper bound, which is the lesser of the two dimensions of the BAM. On average, a 64x64 BAM has about 68% of the capacity of the corresponding Hopfield memory with the same number of neurons.
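For context, a bidirectional associative memory (BAM) stores bipolar pattern pairs in a single correlation matrix and recalls them by thresholded passes between its two layers; the Kosko bound referenced above is min(n, m) for an n-by-m BAM. A minimal sketch of the standard textbook formulation (not the paper's code) follows:

```python
# Minimal BAM sketch: store bipolar pairs in W = sum of outer products
# x y^T, then recall by thresholded matrix-vector products.

def sign(v):
    return 1 if v >= 0 else -1

def train(pairs, n, m):
    # Correlation (outer-product) weight matrix, n x m.
    W = [[0] * m for _ in range(n)]
    for x, y in pairs:
        for i in range(n):
            for j in range(m):
                W[i][j] += x[i] * y[j]
    return W

def recall_y(W, x):
    # Forward pass: y_j = sgn(sum_i x_i W_ij)
    return [sign(sum(x[i] * W[i][j] for i in range(len(x))))
            for j in range(len(W[0]))]

def recall_x(W, y):
    # Backward pass: x_i = sgn(sum_j W_ij y_j)
    return [sign(sum(W[i][j] * y[j] for j in range(len(y))))
            for i in range(len(W))]

x = [1, -1, 1, 1]
y = [-1, 1, -1]
W = train([(x, y)], len(x), len(y))
print(recall_y(W, x))   # recovers y: [-1, 1, -1]
print(recall_x(W, y))   # recovers x: [1, -1, 1, 1]
```

With a single stored pair recall is exact; as more pairs are stored, crosstalk between the outer products degrades recall, which is why the measured capacity falls well below the min(n, m) bound.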

Efficient Memory Management for GPU-based Deep Learning Systems (Machine Learning)

GPUs (graphics processing units) are used for many data-intensive applications, and deep learning systems are now among their most important workloads. As deep learning applications adopt deeper and larger models to achieve higher accuracy, memory management becomes an important research topic for deep learning systems, given that GPUs have limited memory. Many approaches have been proposed to address this issue, e.g., model compression and memory swapping. However, they either degrade the model accuracy or require a lot of manual intervention. In this paper, we propose two orthogonal approaches to reduce the memory cost from the system perspective. Our approaches are transparent to the models, and thus do not affect the model accuracy. They exploit the iterative nature of the deep learning training algorithm to derive the lifetime and read/write order of all variables. With the lifetime semantics, we are able to implement a memory pool with minimal fragmentation. However, the optimization problem is NP-complete, so we propose a heuristic algorithm that reduces memory use by up to 13.3% compared with Nvidia's default memory pool at equal time complexity. With the read/write semantics, variables that are not in use can be swapped out from GPU to CPU to reduce the memory footprint. We propose multiple swapping strategies to automatically decide which variable to swap and when to swap it out (and in), which reduces the memory cost by up to 34.2% without communication overhead.
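The lifetime-based pooling idea can be sketched as follows. This is a generic greedy first-fit heuristic over assumed lifetime data, not the paper's actual algorithm: once each variable's first and last use in an iteration are known, variables whose lifetimes do not overlap may share the same bytes of a single pool.

```python
# Greedy first-fit sketch of a lifetime-aware memory pool: variables whose
# lifetimes (first use, last use) do not overlap can share the same offsets.

def assign_offsets(tensors):
    """tensors: list of (name, first_use, last_use, size) tuples.
    Returns a {name: pool_offset} mapping."""
    placed = []    # (first_use, last_use, offset, size) of placed tensors
    offsets = {}
    for name, s, e, size in sorted(tensors, key=lambda t: t[1]):
        # Offset ranges occupied by tensors alive at the same time.
        busy = sorted((off, off + sz)
                      for ps, pe, off, sz in placed
                      if not (pe <= s or e <= ps))
        offset = 0
        for b0, b1 in busy:            # slide past conflicting ranges
            if offset + size <= b0:
                break
            offset = max(offset, b1)
        placed.append((s, e, offset, size))
        offsets[name] = offset
    return offsets

# Hypothetical lifetimes from one training iteration: C starts after A dies,
# so C can reuse A's bytes even while B is still live.
tensors = [("A", 0, 2, 100), ("B", 1, 3, 50), ("C", 2, 4, 100)]
offs = assign_offsets(tensors)
pool_size = max(offs[n] + sz for n, _, _, sz in tensors)
print(offs)        # {'A': 0, 'B': 100, 'C': 0}
print(pool_size)   # 150, vs 250 if every tensor got its own buffer
```

Because the underlying offset-assignment problem is NP-complete, as the abstract notes, heuristics like this first-fit pass trade optimality for linear-time placement per tensor.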