error == cudaSuccess (209 vs. 0) no kernel image is available for execution on the device

2019-10-18 03:03:17 Errors and exceptions

Matching SM architectures (CUDA arch and CUDA gencode) for various NVIDIA cards

I’ve seen some confusion regarding NVIDIA’s nvcc sm flags and what they’re used for. When compiling with nvcc, the arch flag (‘-arch’) specifies the name of the NVIDIA GPU architecture that the CUDA files will be compiled for. The gencode flag (‘-gencode’) allows for more PTX generations, and can be repeated many times for different architectures.

When should different ‘gencodes’ or ‘cuda arch’ be used?

When you compile CUDA code, you should always pass exactly one ‘-arch’ flag, matching the GPU cards you use most. This avoids a delay at startup, because machine code for that architecture is generated during compilation. If the binary contains only PTX for the GPU it runs on, and no matching machine code, the CUDA driver’s JIT compiler has to generate the GPU code when the program loads.

When you want to speed up CUDA compilation, reduce the number of irrelevant ‘-gencode’ flags. However, sometimes you may wish to have better CUDA backwards compatibility by adding a more comprehensive set of ‘-gencode’ flags.

First, find out which GPU and which CUDA version you have. (http://arnon.dk/check-cuda-installed/)
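Once you know the compute capability of your card (for example 7.5 for Turing), the names used in the flags follow mechanically: drop the dot and prefix sm_ or compute_. A minimal shell sketch of that mapping; the function names cc_digits, cc_to_sm and cc_to_gencode are made up for this illustration and are not part of the CUDA toolkit:

```shell
#!/bin/sh
# Hypothetical helpers (not part of CUDA): turn a compute capability
# such as "7.5" into the names nvcc expects.

# "7.5" -> "75"
cc_digits() {
    printf '%s\n' "$1" | tr -d '.'
}

# "7.5" -> "sm_75" (real architecture name)
cc_to_sm() {
    printf 'sm_%s\n' "$(cc_digits "$1")"
}

# "7.5" -> "-gencode=arch=compute_75,code=sm_75"
cc_to_gencode() {
    d=$(cc_digits "$1")
    printf '%s\n' "-gencode=arch=compute_${d},code=sm_${d}"
}

cc_to_sm 6.1        # prints sm_61
cc_to_gencode 7.5   # prints -gencode=arch=compute_75,code=sm_75
```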

Sample Flags

According to NVIDIA:

The arch= clause of the -gencode= command-line option to nvcc specifies the front-end compilation target and must always be a PTX version. The code= clause specifies the back-end compilation target and can either be cubin or PTX or both. Only the back-end target version(s) specified by the code= clause will be retained in the resulting binary; at least one must be PTX to provide Volta compatibility.
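The error in the title is what you get when nothing retained in the binary fits the GPU. The sketch below checks a flag list against a device, assuming the usual compatibility rules: a cubin (code=sm_XY) runs on devices with the same major version and an equal or higher minor version, and PTX (code=compute_XY) can be JIT-compiled for any device of version XY or newer. The function name covers is hypothetical, and it only understands the simple one-target form of -gencode used in this article:

```shell
#!/bin/sh
# Sketch: would a binary built with the given -gencode flags load on a GPU
# whose compute capability is $1 (digits only, e.g. 61)? Hypothetical helper.
covers() {
    dev=$1; shift
    # First pass: look for usable machine code (cubin).
    for flag in "$@"; do
        case $flag in
            -gencode=*code=sm_*)
                n=${flag##*code=sm_}
                # same major version, minor version not newer than the device
                if [ "$((n / 10))" -eq "$((dev / 10))" ] && [ "$n" -le "$dev" ]; then
                    echo "cubin sm_$n"; return 0
                fi ;;
        esac
    done
    # Second pass: fall back to PTX that the driver can JIT-compile.
    for flag in "$@"; do
        case $flag in
            -gencode=*code=compute_*)
                n=${flag##*code=compute_}
                if [ "$n" -le "$dev" ]; then
                    echo "PTX compute_$n (JIT)"; return 0
                fi ;;
        esac
    done
    echo "no kernel image"; return 1
}

# A Turing card (75) with a Pascal-era flag list falls back to PTX:
covers 75 -gencode=arch=compute_61,code=sm_61 -gencode=arch=compute_61,code=compute_61
# A Kepler card (30) with Maxwell-only flags has nothing it can run:
covers 30 -gencode=arch=compute_50,code=sm_50 -gencode=arch=compute_50,code=compute_50 || true
```

The first call prints "PTX compute_61 (JIT)" and the second prints "no kernel image", which is the situation behind error 209.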

Sample flags for generation on CUDA 7 for maximum compatibility:

-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_52,code=compute_52

Sample flags for generation on CUDA 8 for maximum compatibility:

-arch=sm_30 \
 -gencode=arch=compute_20,code=sm_20 \
 -gencode=arch=compute_30,code=sm_30 \
 -gencode=arch=compute_50,code=sm_50 \
 -gencode=arch=compute_52,code=sm_52 \
 -gencode=arch=compute_60,code=sm_60 \
 -gencode=arch=compute_61,code=sm_61 \
 -gencode=arch=compute_61,code=compute_61

Sample flags for generation on CUDA 9 for maximum compatibility with Volta cards. Note the removed SM_20:

-arch=sm_50 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_70,code=compute_70

Sample flags for generation on CUDA 10 for maximum compatibility with Turing cards:

-arch=sm_50 \
-gencode=arch=compute_50,code=sm_50 \
-gencode=arch=compute_52,code=sm_52 \
-gencode=arch=compute_60,code=sm_60 \
-gencode=arch=compute_61,code=sm_61 \
-gencode=arch=compute_70,code=sm_70 \
-gencode=arch=compute_75,code=sm_75 \
-gencode=arch=compute_75,code=compute_75
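All the sample lists follow one pattern: a cubin for every architecture you care about, plus PTX for the newest one. A small sketch that generates the -gencode lines from a list of architecture numbers; gencode_flags is a hypothetical helper, not an nvcc feature:

```shell
#!/bin/sh
# Hypothetical helper: build -gencode flags in the same pattern as the samples.
# Arguments are architecture numbers in ascending order, e.g. 50 52 60.
gencode_flags() {
    last=
    for a in "$@"; do
        # machine code (cubin) for this architecture
        printf '%s\n' "-gencode=arch=compute_${a},code=sm_${a}"
        last=$a
    done
    # PTX for the newest architecture, so the driver can JIT for later GPUs
    if [ -n "$last" ]; then
        printf '%s\n' "-gencode=arch=compute_${last},code=compute_${last}"
    fi
}

# Prints the -gencode lines of the CUDA 10 sample, one per line
gencode_flags 50 52 60 61 70 75
```

The output can be passed straight to the compiler, for example nvcc $(gencode_flags 50 52 60 61 70 75) -o app app.cu, where app.cu stands in for your own source file.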
