Because of the massively parallel nature of GPUs, some calculations can be accelerated by executing them as a kernel on the GPU instead of the CPU. Often, these calculations are subroutines of some larger algorithm that contains many more substeps. However, it is not always clear in advance which routines are best executed on the CPU and which on the GPU. Switching between devices is nontrivial, because the data must be up to date on, and accessible to, the device in question. These difficulties call for a mechanism that lets the programmer try different combinations (paths) of devices with little effort, in order to choose the best option.
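As a concrete illustration of why switching devices takes effort, the sketch below shows a single subroutine offloaded to the GPU in plain CUDA. It is not part of HyCuda; the names (vecAdd, addOnGpu) are purely illustrative. Note the explicit transfers needed to keep the data up to date on whichever device reads or writes it.

    // Illustrative sketch: offloading one subroutine as a CUDA kernel.
    #include <cuda_runtime.h>
    #include <vector>

    __global__ void vecAdd(float const *a, float const *b, float *c, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            c[i] = a[i] + b[i];
    }

    void addOnGpu(std::vector<float> const &a, std::vector<float> const &b,
                  std::vector<float> &c)
    {
        int n = static_cast<int>(a.size());
        size_t bytes = n * sizeof(float);

        float *dA, *dB, *dC;
        cudaMalloc(&dA, bytes);
        cudaMalloc(&dB, bytes);
        cudaMalloc(&dC, bytes);

        // Host -> device: the kernel can only read data that is up to date on the GPU.
        cudaMemcpy(dA, a.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemcpy(dB, b.data(), bytes, cudaMemcpyHostToDevice);

        vecAdd<<<(n + 255) / 256, 256>>>(dA, dB, dC, n);

        // Device -> host: results must be copied back before the CPU can use them.
        cudaMemcpy(c.data(), dC, bytes, cudaMemcpyDeviceToHost);

        cudaFree(dA);
        cudaFree(dB);
        cudaFree(dC);
    }

Every such transfer is bookkeeping the programmer would otherwise have to redo for each CPU/GPU combination they want to try, which is exactly what the generated framework is meant to take care of.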
The HyCuda code generator produces such a framework from a specification file that describes some properties of the algorithm. It generates C++11 header and source files, only a few of which have to be modified by the programmer in order to implement the algorithm itself. This document describes the structure of the specification file, the options accepted by HyCuda, and the generated template mechanism.
Joren Heit 2013-12-17