On-chip Accelerators in 4th Gen Intel® Xeon® Scalable Processors: Features, Performance, Use Cases, and Future!

[Placeholder] - Orlando, FL

Around June 20th, 2023


Workloads are evolving, and so is computer architecture. Traditionally, adding more cores or choosing a higher-frequency CPU would improve workload performance and efficiency, but these techniques alone no longer guarantee the gains they once delivered. Modern workloads place increased demands on compute, network, and storage resources. In response, there is a growing trend toward deploying power-efficient accelerators that offload specialized functions and reserve compute cores for general-purpose tasks. Offloading specialized tasks to AI, security, HPC, networking, analytics, and storage accelerators can yield faster time to results and power savings.

As a result, Intel has integrated the broadest set of built-in accelerators in 4th Gen Intel® Xeon® Scalable processors to boost performance, reduce latency and increase power efficiency. Intel Xeon Scalable processors with Intel® Accelerator Engines can help your business solve today’s most rigorous workload challenges across cloud, networking and enterprise deployments.

This tutorial provides an overview of the latest built-in accelerators -- the Data Streaming Accelerator (DSA), In-Memory Analytics Accelerator (IAA), QuickAssist Technology (QAT), and Dynamic Load Balancer (DLB) -- and the rich functionality they bring to 4th Gen Intel Xeon Scalable processors. With several flexible programming models and software libraries, these accelerators have proven beneficial to a wide range of data center infrastructures and applications. In addition, the hands-on labs use Intel DSA as an example to give attendees a working knowledge of how to configure it, invoke it, and get the most out of it with both microbenchmarks and real applications.
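As a taste of the configuration step covered in the labs, a DSA device on Linux is set up through the idxd driver and the accel-config utility. The sketch below configures one engine and one shared user-mode work queue; the queue name and parameter values are illustrative placeholders rather than recommended settings, and running it requires DSA hardware with the idxd driver loaded:

```shell
# Bind one engine and one work queue to group 0 on device dsa0
# (queue size, priority, and threshold values here are only examples).
accel-config config-engine dsa0/engine0.0 --group-id=0
accel-config config-wq dsa0/wq0.0 --group-id=0 --mode=shared --type=user \
    --name=demo-app --priority=10 --wq-size=16 --threshold=15

# Enable the device first, then the work queue
accel-config enable-device dsa0
accel-config enable-wq dsa0/wq0.0

# Inspect the active configuration
accel-config list
```

Once the work queue is enabled, a user-space application can open its character device, map the submission portal, and submit descriptors to it, as the labs will demonstrate.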


Intel DSA supports a richer set of operations than previous DMA engines and memory-movement accelerators. The table below lists the operations supported in Intel's 4th Gen Xeon Scalable processors (Sapphire Rapids):

  Operation                      Description
  No-op                          Performs no work; useful for exercising the submission path
  Batch                          Processes multiple descriptors submitted as a single batch
  Drain                          Waits for previously submitted descriptors to complete
  Memory Move                    Copies data from a source region to a destination region
  Fill                           Fills a memory region with a fixed pattern
  Compare                        Compares two memory regions
  Compare Pattern                Compares a memory region against a pattern
  Create Delta Record            Computes the differences between two regions
  Apply Delta Record             Applies a delta record to reconstruct a region
  Memory Copy with Dualcast      Copies one source to two destination regions
  CRC Generation                 Computes a CRC checksum over a region
  Copy with CRC Generation       Copies a region and computes its CRC in one pass
  DIF Check/Insert/Strip/Update  Data Integrity Field operations for storage workloads
  Cache Flush                    Flushes a memory region from the cache hierarchy

Target Audience

The primary audience for this tutorial is those who work with, or conduct research on, large-scale systems such as data centers and cloud platforms. In particular, developers of memory-intensive applications intended for such environments should find the tutorial immediately useful and may draw inspiration from what Intel DSA offers.


This schedule is tentative for now and covers only the broad topics we aim to discuss:

  1. Introduction to Intel DSA

    1. Goals of Intel DSA
    2. Hardware Overview
    3. Software Overview
  2. Basic Usage

    1. Setup and Device Discovery
    2. Descriptor Preparation
    3. Operation Differences
  3. Example Use Cases

    1. Problem Background
    2. Rethinking the Solution for Greater Improvements
      1. Convert to Batched Offloads
      2. Asynchronous Programming Model
    3. Demonstration


Yifan Yuan (Intel) - He is a research scientist at Intel Labs. His research interests are computer architecture and systems, with a focus on emerging networking hardware and system software for modern data centers. He has published multiple papers in top-tier architecture and systems conferences and holds four US patents. Yifan received his Ph.D. in computer engineering from the University of Illinois Urbana-Champaign.

Jiayu Hu (Intel) - She is a Software Network Engineer at Intel with six years of experience working on the Data Plane Development Kit (DPDK) and open-source communities. Jiayu specializes in optimizing networked systems via Intel architecture technologies.

Ren Wang (Intel) - She is a staff research scientist at Intel Labs. Her research interests include improving performance and reducing power for processors, platforms, and distributed systems via both software and hardware architecture optimizations. Ren received her Ph.D. degree in Computer Science from UCLA. She holds more than 100 US and international patents and has published over 50 technical papers and book chapters.

Narayan Ranganathan (Intel) - He is a Principal Engineer in the Systems, Software and Architecture Lab in Intel Labs. He is responsible for architecture definition and software prototyping of current and upcoming platform features, including accelerator technologies like the Data Streaming Accelerator (DSA). Narayan has been with Intel for 24+ years in a variety of roles spanning architecture, software, and validation. He holds a Master's degree in Computer Engineering from Clemson University, USA.

Reese Kuper (UIUC) - He is a second-year Ph.D. student at the University of Illinois Urbana-Champaign. He has also been working at Intel as a part-time employee.

Ipoom Jeong (UIUC) - He is a Postdoctoral Research Associate at the University of Illinois Urbana-Champaign and a member of IEEE. He received his Ph.D. degree in electrical and electronic engineering from Yonsei University, Seoul, South Korea, in 2020. His prior experience includes serving as a Hardware Engineer in the memory business at Samsung Electronics and as a Research Professor in the School of Electrical and Electronic Engineering at Yonsei University. His current research interests include high-performance computer system design, energy-efficient CPU/GPU microarchitectures, and memory/storage system design.

Nam Sung Kim (UIUC) - He is the W.J. ‘Jerry’ Sanders III – Advanced Micro Devices, Inc. Endowed Chair Professor at the University of Illinois, Urbana-Champaign and a fellow of both ACM and IEEE. His interdisciplinary research incorporates device, circuit, architecture, and software for power-efficient computing. He is a recipient of many awards and honors including MICRO Best Paper Award, ACM/IEEE Most Influential ISCA Paper Award, and MICRO Test of Time Award. He is a hall of fame member of all three major computer architecture conferences, HPCA, MICRO, and ISCA.