Programming GPUs with SYCL

2020-04-15 GPU SYCL

Introduction to GPGPU

Why program GPUs
CPU vs GPU architecture
General GPU programming tips

SYCL for OpenCL

Overview
Features

SYCL example

Vector add

Introduction to GPGPU

Need for parallelism to gain performance

the “free lunch” provided by Moore’s law is over
adding even more CPU cores is showing diminishing returns
GPUs are extremely efficient for

data-parallel tasks
arithmetic-heavy computations
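
To make "data parallel" concrete, here is a minimal C++ sketch. The function names `saxpy` and `running_sum` are illustrative and do not come from the original slides. In the first loop every output element depends only on the inputs at the same index, so the iterations could in principle run at the same time on separate processing elements. In the second loop each step needs the previous result, so the loop as written does not map directly onto a GPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Data parallel: out[i] depends only on x[i] and y[i], so every iteration
// is independent of the others.
std::vector<float> saxpy(float a, const std::vector<float>& x,
                         const std::vector<float>& y) {
    assert(x.size() == y.size());
    std::vector<float> out(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        out[i] = a * x[i] + y[i];
    }
    return out;
}

// Not data parallel as written: iteration i needs the accumulator value
// produced by iteration i - 1 (a loop-carried dependency).
std::vector<float> running_sum(const std::vector<float>& x) {
    std::vector<float> out(x.size());
    float acc = 0.0f;
    for (std::size_t i = 0; i < x.size(); ++i) {
        acc += x[i];
        out[i] = acc;
    }
    return out;
}
```

For example, `saxpy(2.0f, {1, 2, 3}, {10, 20, 30})` yields `{12, 24, 36}`, and each of the three results could be computed by a different processing element.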

CPU vs GPU architecture

[Figures omitted: "cpugpu" (CPU and GPU panels), "gpugpu", "commarch", "lockstep", "accmem"]

General GPU programming tips

ensure the task is suitable: GPUs are most efficient for data-parallel tasks, and the performance gain from computing on the GPU must exceed the cost of moving the data
avoid branching: waves of processing elements execute in lock-step, so when a wave diverges both sides of the branch are executed, each with the other side's elements masked
avoid non-coalesced memory access: GPUs access memory more efficiently when it is accessed as contiguous blocks
avoid expensive data movement: the usual bottleneck in GPU programming is data movement between CPU and GPU memory, so it is important to keep data as close to the processing as possible
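
The tips above can be illustrated with a small toy model in plain C++. This is only a sketch under stated assumptions: the function names, the abstract cost units, the wave width, and the memory segment size are all hypothetical and do not describe any particular GPU. `offload_pays_off` expresses the break-even condition for moving work to the GPU, `wave_branch_cost` models a wave that pays for both sides of a branch once its elements diverge, and `segments_touched` counts how many memory segments a wave touches for a given access stride.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Offloading only helps if GPU compute time plus transfer time beats the CPU.
bool offload_pays_off(double cpu_seconds, double gpu_seconds,
                      double transfer_seconds) {
    return gpu_seconds + transfer_seconds < cpu_seconds;
}

// Toy model of one wave executing `if (cond) A else B` in lock-step.
// A path costs its full length as soon as at least one element takes it,
// because the remaining elements are masked while it runs.
std::size_t wave_branch_cost(const std::vector<bool>& takes_if,
                             std::size_t if_cost, std::size_t else_cost) {
    bool any_if = false;
    bool any_else = false;
    for (bool taken : takes_if) {
        if (taken) {
            any_if = true;
        } else {
            any_else = true;
        }
    }
    return (any_if ? if_cost : 0) + (any_else ? else_cost : 0);
}

// Counts the distinct memory segments touched when element `lane` of a wave
// reads array index `lane * stride`. Fewer segments stands in for fewer
// memory transactions, which is what coalesced access achieves.
std::size_t segments_touched(std::size_t lanes, std::size_t stride,
                             std::size_t elems_per_segment) {
    assert(elems_per_segment > 0);
    std::set<std::size_t> segments;
    for (std::size_t lane = 0; lane < lanes; ++lane) {
        segments.insert((lane * stride) / elems_per_segment);
    }
    return segments.size();
}
```

With an assumed wave of 32 elements and 16 array elements per segment, contiguous access (`stride = 1`) touches 2 segments, while `stride = 16` touches 32, one per element. Likewise, a wave in which a single element takes the other side of a branch pays `if_cost + else_cost` instead of just one of them.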

SYCL for OpenCL

OpenCL:

allows you to write kernels that execute on accelerators
allows you to copy data between the host CPU and accelerators
supports a wide range of devices
comes in two components: a host-side C API for enqueueing kernels and copying data, and the device-side OpenCL C language for writing kernels

SYCL aims to:

make heterogeneous programming more accessible
provide a foundation for efficient and portable template algorithms
create a C++ for OpenCL ecosystem
define an open portable standard
provide the performance and portability of OpenCL
be based only on standard C++
provide a high-level shared source model
provide a high-level abstraction over OpenCL boilerplate code
allow C++ template libraries to target OpenCL
allow type safety across host and device
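
As an illustration of the shared source model and the reduced boilerplate, here is a minimal vector add sketch. It assumes a SYCL 2020 implementation (`sycl::queue`, `sycl::buffer`, `sycl::accessor`, and `parallel_for` from `<sycl/sycl.hpp>`). Implementations contemporary with this post may instead expose the older `cl::sycl` namespace through `<CL/sycl.hpp>`. The macro `USE_SYCL` and the function name `vector_add` are hypothetical, introduced here only so that the same file also builds with an ordinary C++17 compiler, in which case a plain host loop is used.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>
#ifdef USE_SYCL  // hypothetical opt-in macro, not part of the SYCL standard
#include <sycl/sycl.hpp>
#endif

// Element-wise c[i] = a[i] + b[i]. Host and device code live in one C++ file,
// and the same float element type is checked on both sides.
std::vector<float> vector_add(const std::vector<float>& a,
                              const std::vector<float>& b) {
    assert(a.size() == b.size());
    const std::size_t n = a.size();
    std::vector<float> c(n);
    if (n == 0) {
        return c;  // nothing to compute, avoid creating empty buffers
    }
#ifdef USE_SYCL
    {
        sycl::queue q;  // picks a default device
        // Buffers wrap the host memory; the runtime moves data as needed.
        sycl::buffer<float, 1> buf_a(a.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> buf_b(b.data(), sycl::range<1>(n));
        sycl::buffer<float, 1> buf_c(c.data(), sycl::range<1>(n));
        q.submit([&](sycl::handler& h) {
            // Accessors declare how the kernel uses each buffer.
            sycl::accessor in_a(buf_a, h, sycl::read_only);
            sycl::accessor in_b(buf_b, h, sycl::read_only);
            sycl::accessor out_c(buf_c, h, sycl::write_only, sycl::no_init);
            // One work-item per element; the lambda body is the kernel.
            h.parallel_for(sycl::range<1>(n), [=](sycl::id<1> i) {
                out_c[i] = in_a[i] + in_b[i];
            });
        });
    }  // buf_c's destructor waits for the kernel and writes results into c
#else
    // Host fallback so the sketch runs without a SYCL compiler.
    for (std::size_t i = 0; i < n; ++i) {
        c[i] = a[i] + b[i];
    }
#endif
    return c;
}
```

Calling `vector_add({1, 2, 3}, {10, 20, 30})` returns `{11, 22, 33}` on either path. Note that no kernel source strings, explicit copies, or argument-setting calls appear in the SYCL branch; the buffers and accessors carry that information.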
