Sitemap
A list of all the posts and pages found on the site. For you robots out there, there is an XML version available for digesting as well.
Pages
About me
Posts
Future Blog Post
Published:
This post will show up by default. To disable scheduling of future posts, edit config.yml and set future: false.
Blog Post number 4
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 3
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 2
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
Blog Post number 1
Published:
This is a sample blog post. Lorem ipsum I can’t remember the rest of lorem ipsum and don’t have an internet connection right now. Testing testing testing this blog post. Blog posts are cool.
portfolio
Portfolio item number 1
Short description of portfolio item number 1
Portfolio item number 2
Short description of portfolio item number 2
publications
GNNear: Accelerating Full-Batch Training of Graph Neural Networks with Near-Memory Processing
Published in International Conference on Parallel Architectures and Compilation Techniques (PACT), 2022
In this paper we propose GNNear, a hybrid accelerator architecture that leverages near-memory processing to accelerate full-batch GNN training on large graphs. GNNear matches the heterogeneous nature of GNN training by offloading the memory-intensive Reduce operations to in-DIMM Near-Memory-Engines (NMEs) and using a Centralized-Acceleration-Engine (CAE) to process the computation-intensive Update operations. To deal with the irregularity of graphs, we also propose several optimization strategies concerning data reuse, graph partitioning, and dataflow scheduling. Evaluations on 16 tasks demonstrate that GNNear achieves 30.8× / 2.5× geometric-mean speedup and 79.6× / 7.3× higher energy efficiency (geometric mean) compared to a Xeon E5-2698-v4 CPU and a V100 GPU, respectively.
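To make the Reduce / Update distinction concrete, here is a minimal sketch of one full-batch GNN layer split into the two phases the abstract refers to. The function names, sum aggregation, and toy graph are illustrative assumptions, not GNNear's actual interface.

```python
# Hypothetical illustration of the Reduce / Update split described above.
import numpy as np

def reduce_phase(features, neighbors):
    """Memory-intensive aggregation: sum each vertex's neighbor features.
    In GNNear this irregular, bandwidth-bound step is offloaded to the
    in-DIMM Near-Memory-Engines (NMEs)."""
    agg = np.zeros_like(features)
    for v, nbrs in enumerate(neighbors):
        if nbrs:
            agg[v] = features[nbrs].sum(axis=0)
    return agg

def update_phase(agg, weight):
    """Compute-intensive transform: a dense GEMM plus nonlinearity.
    In GNNear this regular step runs on the Centralized-Acceleration-Engine (CAE)."""
    return np.maximum(agg @ weight, 0.0)  # ReLU

# Toy full-batch forward pass for one layer on a 4-vertex graph.
features = np.random.rand(4, 8).astype(np.float32)
neighbors = [[1, 2], [0], [0, 3], [2]]        # adjacency lists
weight = np.random.rand(8, 16).astype(np.float32)
out = update_phase(reduce_phase(features, neighbors), weight)
print(out.shape)  # (4, 16)
```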
DIMM-Link: Enabling Efficient Inter-DIMM Communication for Near-Memory Processing
Published in International Symposium on High-Performance Computer Architecture (HPCA), 2023
DIMM-based near-memory processing architectures (DIMM-NMP) have received growing interest from both academia and industry. They offer large memory capacity, low manufacturing cost, high flexibility, and a compatible form factor. However, inter-DIMM communication (IDC) has become a critical obstacle for generic DIMM-NMP architectures because it involves costly forwarding transactions through the host CPU. Recent research has demonstrated that, for many applications, the overhead induced by IDC may even offset the performance and energy benefits of near-memory processing.

To tackle this problem, we propose DIMM-Link, which enables high-performance IDC in DIMM-NMP architectures and supports seamless integration with existing host memory systems. It adopts bidirectional external data links to connect DIMMs, over which point-to-point communication and inter-DIMM broadcast are efficiently supported in a packet-routing manner. We present the full-stack design of DIMM-Link, including the hardware architecture, interconnect protocol, system organization, routing mechanisms, and optimization strategies. Comprehensive experiments on typical data-intensive tasks demonstrate that a DIMM-Link-equipped NMP system achieves a 5.93× average speedup over a 16-core CPU baseline. Compared to other IDC methods, DIMM-Link outperforms MCN, AIM, and ABC-DIMM by 2.42×, 1.87×, and 1.77×, respectively. More importantly, DIMM-Link fully considers implementation feasibility and system-integration constraints, which are critical for designing NMP architectures based on modern DDR4/DDR5 DIMMs.
NMExplorer: An Efficient Exploration Framework for DIMM-based Near-Memory Tensor Reduction
Published in Design Automation Conference (DAC), 2023
Various DIMM-based near-memory processing (DIMM-NMP) architectures have been proposed to accelerate tensor reduction. With careful evaluation, we find that diverse scenarios exhibit distinct performance on DIMM-NMP architectures adopting different design configurations. However, given a tensor reduction scenario, there is a lack of a fast and accurate solution to identify a proper DIMM-NMP architecture. To tackle this problem, we propose an efficient exploration framework called NMExplorer. Given a scenario and hardware parameters, NMExplorer can generate and explore a wide range of potential design configurations. Experiments show that the recommended designs can outperform state-of-the-art DIMM-NMP accelerators by up to 1.95× in performance and 3.69× in energy.
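The kind of search NMExplorer automates can be pictured as enumerating candidate DIMM-NMP configurations and ranking them with a cost model. The sketch below is a hypothetical illustration with made-up parameters and a toy analytical model, not the framework's real exploration engine.

```python
# Hypothetical sketch of a design-space exploration loop for near-memory
# tensor reduction. All parameter names and the cost model are illustrative.
from itertools import product

def estimate(config, scenario):
    ranks, engines_per_rank, freq_ghz = config
    vectors, dim = scenario
    ops = vectors * dim                               # one add per element
    throughput = ranks * engines_per_rank * freq_ghz * 1e9
    latency = ops / throughput                        # seconds (toy model)
    energy = ops * 1e-12 * (1.0 + 0.1 * engines_per_rank)  # joules (toy model)
    return latency, energy

scenario = (1_000_000, 256)                  # 1M vectors of dimension 256
candidates = product([4, 8, 16],             # DIMM ranks
                     [1, 2, 4],              # near-memory engines per rank
                     [1.2, 1.6])             # engine frequency (GHz)
best = min(candidates, key=lambda c: estimate(c, scenario)[0])
print("lowest-latency configuration:", best)
```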
PIM-DL: Expanding the Applicability of Commodity DRAM-PIMs for Deep Learning via Algorithm-System Co-Optimization
Published in International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
DRAM-based processing-in-memory (DRAM-PIM) has gained commercial prominence in recent years. However, integrating these devices into deep learning acceleration poses inherent challenges. Existing DRAM-PIMs have limited computational capabilities and are primarily applicable to element-wise and GEMV operators. Unfortunately, these operators contribute only a small portion of the execution time in most DNN workloads, so current systems still require powerful hosts to handle a significant share of the compute-heavy operators.
SpecPIM: Accelerating Speculative Inference on PIM-Enabled System via Architecture-Dataflow Co-Exploration
Published in International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2024
Inference for generative large language models (LLMs) suffers from inefficiency because of the token dependencies introduced by autoregressive decoding. Speculative inference has recently been proposed to alleviate this problem: small language models generate draft tokens, and the original large model then verifies them. Although speculative inference can improve the efficiency of the decoding procedure, we find that it presents variable resource demands due to the distinct computation patterns of the models involved. This variability impedes the full realization of speculative inference's acceleration potential in current systems.
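As a rough illustration of the draft-then-verify loop that speculative inference relies on, the sketch below uses stand-in draft_model / target_model functions with a toy acceptance rule; it mirrors the general algorithm only, not SpecPIM's dataflow or PIM mapping.

```python
# Hypothetical sketch of speculative inference: a small draft model proposes
# tokens, the large target model verifies them in one pass.
import random
random.seed(0)

VOCAB = list(range(100))

def draft_model(context, k):
    """Cheap model: proposes k draft tokens autoregressively (stand-in)."""
    return [random.choice(VOCAB) for _ in range(k)]

def target_model(context, drafts):
    """Expensive model: checks the drafts in a single parallel pass, returns
    how many leading draft tokens it accepts plus its own next token."""
    accepted = 0
    for tok in drafts:
        if random.random() < 0.7:        # toy acceptance criterion
            accepted += 1
        else:
            break
    return accepted, random.choice(VOCAB)

context, k = [1, 2, 3], 4
while len(context) < 32:
    drafts = draft_model(context, k)
    accepted, correction = target_model(context, drafts)
    # Keep the accepted prefix; the target model supplies the next token either
    # way, so every iteration makes progress even if all drafts are rejected.
    context += drafts[:accepted] + [correction]
print(len(context), "tokens generated")
```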
talks
Talk 1 on Relevant Topic in Your Field
Published:
This is a description of your talk, which is a markdown file that can be all markdown-ified like any other post. Yay markdown!
Conference Proceeding talk 3 on Relevant Topic in Your Field
Published:
This is a description of your conference proceedings talk, note the different field in type. You can put anything in this field.
teaching
Teaching experience 1
Undergraduate course, University 1, Department, 2014
This is a description of a teaching experience. You can use markdown like any other post.
Teaching experience 2
Workshop, University 1, Department, 2015
This is a description of a teaching experience. You can use markdown like any other post.