Notes about GPU Programming

workflow
  • Always prepare a prototype as a pure CPU algorithm (and ensure correctness)
  • Prepare a correctness test (ideally with intermediate variables as check points) that runs very quickly (say, within a few seconds)
  • Profile the code to find hot spots to be offloaded to GPU
  • Incrementally cover the CPU algorithm with GPU offloading
    GPU offloading framework
      • OpenACC: better supported (reuse examples from the official NVIDIA documentation to avoid hitting compiler bugs)
      • OpenMP
      • standard language parallelization (Fortran do concurrent)
  • Test correctness, reprofile, then offload the next hot spot
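
The workflow above can be sketched in C as follows. This is only a minimal illustration under stated assumptions: the SAXPY loop stands in for a hot spot found by profiling, and the function names (saxpy_cpu, saxpy_acc, max_abs_diff, saxpy_versions_agree) are hypothetical, not taken from any library. The offloaded version uses an OpenACC parallel loop directive with explicit data clauses. When the code is compiled without OpenACC enabled, the pragma is ignored and the loop runs serially on the CPU, so the same check works at every stage of the port.

```c
#include <stddef.h>

/* Step 1: CPU reference for the (hypothetical) hot spot, y = a*x + y. */
void saxpy_cpu(size_t n, float a, const float *x, float *y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Step 4: the same loop annotated for OpenACC offloading.
 * copyin: x is only read on the device; copy: y is read and written back.
 * Without OpenACC support the pragma is ignored and this runs on the CPU. */
void saxpy_acc(size_t n, float a, const float *x, float *y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (size_t i = 0; i < n; ++i)
        y[i] = a * x[i] + y[i];
}

/* Check point: largest absolute difference between two arrays. */
float max_abs_diff(size_t n, const float *u, const float *v)
{
    float worst = 0.0f;
    for (size_t i = 0; i < n; ++i) {
        float d = u[i] - v[i];
        if (d < 0.0f)
            d = -d;
        if (d > worst)
            worst = d;
    }
    return worst;
}

/* Steps 2 and 5: a fast correctness test on a small fixed input.
 * Returns 1 if the reference and offloaded results agree within tol. */
int saxpy_versions_agree(float tol)
{
    enum { N = 1024 };
    static float x[N], y_ref[N], y_acc[N];

    for (size_t i = 0; i < N; ++i) {
        x[i] = (float)i * 0.5f;
        y_ref[i] = 1.0f;
        y_acc[i] = 1.0f;
    }

    saxpy_cpu(N, 2.0f, x, y_ref);
    saxpy_acc(N, 2.0f, x, y_acc);

    return max_abs_diff(N, y_ref, y_acc) <= tol;
}
```

Calling saxpy_versions_agree(1e-6f) from a small test driver after each newly offloaded loop keeps the port incremental: one hot spot is moved, the check is rerun, and only then is the next one touched. A tolerance is used instead of exact equality because a GPU build may round floating-point operations differently from the CPU reference.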