
Notes about GPU Programming

Workflow

Always prepare a prototype as a pure CPU algorithm (and ensure correctness)
Prepare a correctness test (ideally with intermediate variables as checkpoints) that runs very quickly (say, within a few seconds)
Profile the code to find hot spots to be offloaded to GPU
Incrementally offload the CPU algorithm to the GPU, one hot spot at a time

GPU offloading frameworks

OpenACC: better supported (reuse examples from the official NVIDIA documentation to avoid hitting compiler bugs)
OpenMP
Standard language parallelism (Fortran do concurrent)
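As a rough illustration of the OpenACC option, here is a minimal sketch in C. It assumes the hot spot is a simple SAXPY-style loop; the function names (`saxpy_cpu`, `saxpy_acc`, `max_abs_diff`) and the loop itself are hypothetical and only stand in for whatever the profiler actually points at.

```c
/* CPU reference prototype: y[i] = a * x[i] + y[i]. */
void saxpy_cpu(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Same loop with an OpenACC directive (hypothetical offloaded variant).
 * copyin: x is only read on the device.
 * copy:   y is read and written, so it is copied in and back out.
 * A compiler without OpenACC enabled ignores the pragma and the loop
 * simply runs serially on the CPU. */
void saxpy_acc(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Largest absolute difference between two arrays, used to compare the
 * offloaded result against the CPU reference. */
float max_abs_diff(int n, const float *p, const float *q)
{
    float worst = 0.0f;
    for (int i = 0; i < n; ++i) {
        float d = p[i] - q[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}
```

To actually offload, the file would be built with OpenACC enabled, for example `nvc -acc` (NVIDIA HPC SDK) or `gcc -fopenacc`. Keeping the CPU version next to the offloaded one makes it easy to compare both after every change.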

Test correctness, reprofile, then offload the next hot spot; repeat
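The checkpoint idea from the workflow can be sketched as follows, again in C. This is only an illustration under stated assumptions: the two-stage algorithm, the names `stage1_cpu`, `stage1_acc`, `stage2_cpu`, `check_checkpoint`, and `run_correctness_test`, and the absolute tolerance of `1e-12` are all hypothetical. The point is that the intermediate array `mid` is compared as well as the final output, so a mismatch can be traced to the stage that was just offloaded.

```c
#include <stdio.h>

#define N 8

/* Stage 1, CPU reference (hypothetical): square each input. */
void stage1_cpu(int n, const double *in, double *mid)
{
    for (int i = 0; i < n; ++i)
        mid[i] = in[i] * in[i];
}

/* Stage 1, offloaded candidate. Without OpenACC enabled the pragma is
 * ignored and this runs on the CPU. */
void stage1_acc(int n, const double *restrict in, double *restrict mid)
{
    #pragma acc parallel loop copyin(in[0:n]) copyout(mid[0:n])
    for (int i = 0; i < n; ++i)
        mid[i] = in[i] * in[i];
}

/* Stage 2, not offloaded yet (hypothetical): scale and shift. */
void stage2_cpu(int n, const double *mid, double *out)
{
    for (int i = 0; i < n; ++i)
        out[i] = 0.5 * mid[i] + 1.0;
}

/* Compare one checkpoint against the reference.
 * Returns 1 if every element is within tol, 0 otherwise. */
int check_checkpoint(const char *name, int n,
                     const double *ref, const double *got, double tol)
{
    for (int i = 0; i < n; ++i) {
        double d = ref[i] - got[i];
        if (d < 0.0)
            d = -d;
        if (d > tol) {
            printf("FAIL %s[%d]: ref=%g got=%g\n", name, i, ref[i], got[i]);
            return 0;
        }
    }
    return 1;
}

/* Run the reference and the partially offloaded pipeline on a small
 * input and compare both the intermediate and the final checkpoint.
 * Returns 1 if all checkpoints match. */
int run_correctness_test(void)
{
    double in[N], mid_ref[N], out_ref[N], mid_new[N], out_new[N];
    for (int i = 0; i < N; ++i)
        in[i] = 0.25 * i;

    stage1_cpu(N, in, mid_ref);
    stage2_cpu(N, mid_ref, out_ref);

    stage1_acc(N, in, mid_new);      /* the stage under test */
    stage2_cpu(N, mid_new, out_new);

    int ok = check_checkpoint("mid", N, mid_ref, mid_new, 1e-12);
    ok = check_checkpoint("out", N, out_ref, out_new, 1e-12) && ok;
    return ok;
}
```

A tolerance is used instead of exact equality because GPU and CPU results can differ in the last bits (for example through fused multiply-add or a different order of operations). The input is kept tiny so the test finishes almost instantly and can be rerun after every offloading step.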