SYCL vector add example

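After installing the SYCL and ComputeCpp environment, I followed the Hello SYCL steps published by Codeplay and tried to check my local development environment with the simplest possible test, a vector addition. The build always completed, but the program then failed with a runtime error; see Issue #230 for details. I have to say the community was quite efficient: within a short time they replied, pointing out not only the cause of the error but also a problem in the current Hello SYCL sample. Kudos to the programmers who maintain the community!
The issue is summarized below: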

Local development environment

(base) huang@mlt:~/compute$ /usr/local/computecpp/bin/computecpp_info
********************************************************************************

ComputeCpp Info (CE 1.0.3)

SYCL 1.2.1 revision 3

********************************************************************************

Toolchain information:

GLIBC version: 2.23
GLIBCXX: 20160609
This version of libstdc++ is supported.

********************************************************************************


Device Info:

Discovered 2 devices matching:
platform : <any>
device type : <any>

--------------------------------------------------------------------------------
Device 0:

Device is supported : UNTESTED - Device not tested on this OS
CL_DEVICE_NAME : Intel(R) Gen9 HD Graphics NEO
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 19.20.13008
CL_DEVICE_TYPE : CL_DEVICE_TYPE_GPU
--------------------------------------------------------------------------------
Device 1:

Device is supported : YES - Tested internally by Codeplay Software Ltd.
CL_DEVICE_NAME : Intel(R) Core(TM) i7-8700K CPU @ 3.70GHz
CL_DEVICE_VENDOR : Intel(R) Corporation
CL_DRIVER_VERSION : 1.2.0.10
CL_DEVICE_TYPE : CL_DEVICE_TYPE_CPU

If you encounter problems when using any of these OpenCL devices, please consult
this website for known issues:
https://computecpp.codeplay.com/releases/v1.0.3/platform-support-notes

********************************************************************************

Source Code

https://developer.codeplay.com/products/computecpp/ce/guides/sycl-guide/hello-sycl

#include <iostream>
#include <CL/sycl.hpp>

class vector_addition;
namespace sycl = cl::sycl;

int main() {

    sycl::float4 a = {1.0, 2.0, 3.0, 4.0};
    sycl::float4 b = {4.0, 3.0, 2.0, 3.0};
    sycl::float4 c = {0.0, 0.0, 0.0, 0.0};

    sycl::default_selector device_selector;

    sycl::queue queue(device_selector);
    std::cout << "Running on "
              << queue.get_device().get_info<sycl::info::device::name>()
              << "\n";
    {   // the buffers live only inside this scope
        sycl::buffer<sycl::float4, 1> a_sycl(&a, sycl::range<1>(1));
        sycl::buffer<sycl::float4, 1> b_sycl(&b, sycl::range<1>(1));
        sycl::buffer<sycl::float4, 1> c_sycl(&c, sycl::range<1>(1));

        queue.submit([&] (sycl::handler& cgh) {
            auto a_acc = a_sycl.get_access<sycl::access::mode::read>(cgh);
            auto b_acc = b_sycl.get_access<sycl::access::mode::read>(cgh);
            auto c_acc = c_sycl.get_access<sycl::access::mode::discard_write>(cgh);

            cgh.single_task<class vector_addition>([=] () {
                c_acc[0] = a_acc[0] + b_acc[0];
            });
        });
    }   // the buffer destructors run here and copy the result back into c
    std::cout << " A { " << a.x() << ", " << a.y() << ", " << a.z() << ", " << a.w() << " }\n"
              << "+ B { " << b.x() << ", " << b.y() << ", " << b.z() << ", " << b.w() << " }\n"
              << "------------------\n"
              << "= C { " << c.x() << ", " << c.y() << ", " << c.z() << ", " << c.w() << " }"
              << std::endl;

    return 0;
}

How to build

build.sh

#!/bin/bash
set -e  # stop on the first error so a failed compile is not reported as success

SOUR_FILE=$1
OBJ_FILE=$2

compute++ -g -I/usr/local/computecpp/include "${SOUR_FILE}" -sycl-driver -no-serial-memop -L/usr/local/computecpp/lib -lComputeCpp -o "${OBJ_FILE}"

echo "Build Done!"
echo "${OBJ_FILE} was generated successfully!"

Note:

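  • The error was caused by forgetting the -sycl-driver option, so the kernel was built for the host rather than for the device.
  • A potential problem in the sample code is that the computed result (the assignment to c_acc[0] on line 30) is not written back to the host. A simple way to avoid this is to enclose the buffers and the code that writes to them in a pair of braces (lines 19 and 33), so that when the scope is left the buffer destructors automatically trigger the copy back to the host.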
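
The write-back-on-scope-exit behaviour described in the second note can be illustrated without a SYCL toolchain. What follows is only a plain C++ analogy under stated assumptions: ScopedBuffer, Float4, Observation and scoped_add are hypothetical names made up for this sketch, not part of SYCL or ComputeCpp, and the "kernel" is an ordinary host loop. It models a single aspect, namely that the host variable only receives the result when the buffer object is destroyed.

```cpp
#include <array>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Hypothetical stand-in for a device buffer (NOT the SYCL API): it works on a
// private copy of the host data and copies it back only in its destructor.
class ScopedBuffer {
public:
    explicit ScopedBuffer(Float4& host) : host_(host), device_copy_(host) {}
    ~ScopedBuffer() { host_ = device_copy_; }  // write-back on scope exit

    ScopedBuffer(const ScopedBuffer&) = delete;
    ScopedBuffer& operator=(const ScopedBuffer&) = delete;

    Float4& data() { return device_copy_; }

private:
    Float4& host_;        // the host variable the buffer was created from
    Float4 device_copy_;  // stands in for device-side storage
};

// What the host variable c contained before and after the inner scope ended.
struct Observation {
    Float4 before_scope_exit;
    Float4 after_scope_exit;
};

Observation scoped_add(const Float4& a, const Float4& b) {
    Float4 c = {0.0f, 0.0f, 0.0f, 0.0f};
    Observation seen{};
    {
        ScopedBuffer c_buf(c);
        for (std::size_t i = 0; i < c.size(); ++i) {
            c_buf.data()[i] = a[i] + b[i];  // the "kernel" writes to the copy only
        }
        seen.before_scope_exit = c;  // c on the host is still all zeros here
    }   // c_buf is destroyed here, which copies the result back into c
    seen.after_scope_exit = c;
    return seen;
}
```

Calling scoped_add with the A and B values from the sample returns an Observation whose first member is still all zeros and whose second member holds the sums, which mirrors why the sample reads c only after the closing brace.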

Run

(base) huang@mlt:~/compute$ ./build.sh hello_sycl.cpp hello
Build Done!
hello was generated successfully!
(base) huang@mlt:~/compute$ ./hello
Running on Intel(R) Gen9 HD Graphics NEO
A { 1, 2, 3, 4 }
+ B { 4, 3, 2, 3 }
------------------
= C { 5, 5, 5, 7 }
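
One way to make this kind of check less dependent on eyeballing the printout is to compare the device result against a host-side reference. The following is a minimal sketch in plain C++ with no SYCL dependency; Float4, host_add and matches_host are hypothetical helper names introduced here, and the tolerance value is an arbitrary assumption.

```cpp
#include <array>
#include <cmath>
#include <cstddef>

using Float4 = std::array<float, 4>;

// Element-wise sum computed on the host, mirroring c_acc[0] = a_acc[0] + b_acc[0].
Float4 host_add(const Float4& a, const Float4& b) {
    Float4 c{};
    for (std::size_t i = 0; i < c.size(); ++i) {
        c[i] = a[i] + b[i];
    }
    return c;
}

// True when every component of device_result is within tol of the host reference.
bool matches_host(const Float4& device_result, const Float4& a, const Float4& b,
                  float tol = 1e-6f) {
    const Float4 reference = host_add(a, b);
    for (std::size_t i = 0; i < reference.size(); ++i) {
        if (std::fabs(device_result[i] - reference[i]) > tol) {
            return false;
        }
    }
    return true;
}
```

With the A and B values from the run above, a device result of { 5, 5, 5, 7 } passes the check, while an all-zero C (what the host would see if the result were never copied back) fails it.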