Notes about GPU Programming
Workflow
- Always start with a pure CPU prototype of the algorithm and make sure it is correct
- Prepare a correctness test that runs very quickly (within one or a few seconds), ideally checking intermediate variables as checkpoints
- Profile the code to find the hot spots to offload to the GPU
- Offload the CPU algorithm to the GPU incrementally, one hot spot at a time
GPU offloading frameworks
- OpenACC: better supported (start from the examples in NVIDIA's official documentation to avoid hitting compiler bugs)
- OpenMP
- Standard language parallelism (Fortran do concurrent)
After each offloading step, test correctness, reprofile, and offload the next hot spot.