Maple: Efficient Data-Transfer-Aware Mapping Exploration for NDP-enabled Edge LLM Inference

Published in Design Automation Conference (DAC), 2026

Near-DRAM Processing (NDP) has demonstrated great potential for memory-bound operators in edge-side large language model (LLM) inference. Existing designs, which build on an xPU-NDP heterogeneous system, concentrate on optimizing the execution flow within processing units yet ignore the bottleneck arising from xPU-NDP data transfer. To tackle this challenge, we propose Maple, an efficient mapping exploration framework. Given a specific workload and NDP architecture, Maple adopts an address-mapping-based description method to construct a comprehensive search space that encompasses resource grouping, tensor partitioning, and transfer binding, thereby facilitating joint optimization of computation and data transfer. Experiments show that Maple achieves improvements of up to 3.34$\times$ in performance and 1.49$\times$ in energy compared with existing approaches on mainstream NDP architectures.
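
To make the three search dimensions named in the abstract more concrete, the following is a toy sketch of a mapping search for a single matrix-vector product y = W x. It is not Maple's algorithm or its address-mapping-based description. The `Mapping` fields, the cost model, and every numeric parameter (unit count, throughput, link bandwidth) are hypothetical and chosen only for illustration. The sketch enumerates a small space of (resource grouping, tensor partitioning, transfer binding) choices and scores each by compute time plus xPU-NDP transfer time.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Mapping:
    groups: int   # resource grouping: NDP units are split into this many groups
    axis: str     # tensor partitioning: W is split across groups by "row" or "col"
    binding: str  # transfer binding: input x is sent by "broadcast" or "unicast"


def transfer_elems(m: Mapping, rows: int, cols: int) -> int:
    """Elements moved between xPU and NDP for y = W @ x (hypothetical toy model)."""
    if m.axis == "row":
        # Every group needs the full input vector; group outputs are disjoint.
        inp = cols if m.binding == "broadcast" else m.groups * cols
        out = rows
    else:
        # Each group needs only its slice of x, so the total input is cols,
        # but every group returns a full-length partial sum to be reduced.
        inp = cols
        out = m.groups * rows
    return inp + out


def latency_us(m: Mapping, rows: int = 4096, cols: int = 4096, units: int = 8,
               macs_per_us: float = 1024.0, elems_per_us: float = 32.0) -> float:
    """Compute time plus transfer time; all rates are assumed values."""
    compute = rows * cols / (units * macs_per_us)
    transfer = transfer_elems(m, rows, cols) / elems_per_us
    return compute + transfer


def explore():
    """Exhaustively enumerate the toy search space and return the best mapping."""
    space = [Mapping(g, a, b)
             for g, a, b in product((1, 2, 4, 8),
                                    ("row", "col"),
                                    ("broadcast", "unicast"))]
    return min(space, key=latency_us), space


best, space = explore()
print(best, latency_us(best))
```

In this toy model the compute term is identical for every mapping, so the ranking is decided entirely by the transfer term. That is the kind of effect a transfer-unaware search would miss, though a realistic model would also have to capture how grouping and partitioning change compute time.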