The advent of many-core processors with a greatly reduced amount of per-core memory has shifted the bottleneck in computing from FLOPs to memory. A new, complex memory/storage hierarchy is emerging, with persistent memories offering greatly expanded capacity, and augmented by DRAM/SRAM cache and scratchpads to mitigate latency.
With active memory, computation that is typically handled by a CPU is performed within the memory system. Performance is improved and energy reduced because processing is done in proximity to the data without incurring the overhead of moving the data across chip interconnects from memory to the processor. Emerging 3D memory packaging technology offers new opportunities for computing near memory. Using this technology, a separate logic layer added to the memory stack holds compute elements that operate directly on data in the memory package itself. The benefits of this approach are two-fold. First, the amount of data transferred from the memory chips to the CPU can be reduced because computation can occur in the logic layer. As a second important benefit, computing near memory exploits the orders of magnitude greater bandwidth within the 3D package than is available off-chip.
Many data-centric applications need to perform operations such as search, filter, and data reorganization across a large cross section of data. With traditional architectures, the data must be moved from storage to memory and then funneled through the CPU. Data-centric operations are ideal candidates for offload to a memory system with processing capability. These operations can be categorized into three types that can be used independently or in conjunction with the others.
LLNL has an active research program in memory-centric architectures that focuses on transforming the memory-storage interface with three complementary approaches:
- Active memory and storage in which processing is shared between CPU and in-memory/storage controllers,
- Efficient software cache and scratchpad management, enabling memory-mapped access to large, local persistent stores,
- Algorithms and applications that provide a latency-tolerant, throughput-driven, massively concurrent computation model.
As part of our ongoing work, LLNL researchers quantitatively evaluate the potential benefits of active memory that may be possible with 3D packaging of memory with logic. From this research program, LLNL has developed a new active memory data reorganization engine.
In the simplest case, data can be reorganized within the memory system to present a new view of the data. The new view may be a subset or a rearrangement of the original data. As an example, an array of structures might be more efficiently accessed by a CPU as a structure of arrays. Active memory can assemble an alternative representation within the memory package so that bytes sent to the main CPU are in a cache-friendly layout.
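The array-of-structures example can be illustrated with a small software model. The sketch below is not the LLNL engine, which would perform this work in the logic layer of the memory package. It only mimics the two kinds of view on the host: a rearrangement (structure of arrays) and a subset (a single field). The record layout and the names `RECORD`, `pack_aos`, `aos_to_soa`, and `field_view` are hypothetical and chosen for illustration.

```python
import struct
from array import array

# Hypothetical record layout (an assumption, not from the source):
# three 32-bit floats (x, y, z) followed by one 32-bit integer id.
RECORD = struct.Struct("<fffi")
FIELD_CODES = "fffi"  # struct format code of each field, in order


def pack_aos(records):
    """Pack (x, y, z, id) tuples into one contiguous array-of-structures buffer."""
    buf = bytearray(RECORD.size * len(records))
    for i, rec in enumerate(records):
        RECORD.pack_into(buf, i * RECORD.size, *rec)
    return bytes(buf)


def aos_to_soa(buf):
    """Rearrangement view: one contiguous array per field (structure of arrays)."""
    xs, ys, zs, ids = array("f"), array("f"), array("f"), array("i")
    for x, y, z, rec_id in RECORD.iter_unpack(buf):
        xs.append(x)
        ys.append(y)
        zs.append(z)
        ids.append(rec_id)
    return {"x": xs, "y": ys, "z": zs, "id": ids}


def field_view(buf, field_index):
    """Subset view: a single field of every record, packed contiguously."""
    values = [rec[field_index] for rec in RECORD.iter_unpack(buf)]
    fmt = "<%d%s" % (len(values), FIELD_CODES[field_index])
    return struct.pack(fmt, *values)
```

With 16-byte records, a consumer that needs only `x` would receive 4 bytes per record from `field_view(buf, 0)` instead of the full 16, and those bytes are contiguous rather than strided. This is the sense in which a reorganized view can be more cache-friendly for the CPU.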
Possible applications include in-memory graph traversal, efficient sparse matrix access for computations on irregular meshes, and in-memory streaming assembly of multi-resolution image windows from high-resolution imagery.