Optimizing SplashTool_GPU

Model Adjustments

For about a month now, I’ve been working on SplashTool_GPU. Thanks to the modular separation of the configuration file, the manager, and the model, a large part of the code could remain unchanged; the model itself, however, required numerous changes. In particular, CuPy does not support masked arrays, so the iteration algorithm had to be rewritten to work without them while retaining the same functionality.
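One common way to drop masked arrays is to keep a plain value array plus an explicit boolean "active" mask and confine updates with a `where` selection. The sketch below shows this pattern with NumPy, whose array API CuPy mirrors, so it runs without a GPU; the update rule and function name are hypothetical, not SplashTool_GPU's actual iteration.

```python
import numpy as np  # cupy would be imported as cp in the GPU build

def masked_update(values, active, delta):
    # Equivalent of updating a masked array: only cells where the
    # boolean mask is True receive the update, inactive cells keep
    # their previous value.
    return np.where(active, values + delta, values)

values = np.array([1.0, 2.0, 3.0, 4.0])
active = np.array([True, False, True, False])
print(masked_update(values, active, 0.5))  # inactive cells stay unchanged
```

Because `np.where` exists identically in CuPy, the same function body works on both backends.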

CuPy arrays

Another consideration was whether individual datasets should be stored in RAM or in GPU memory. I decided to implement only the tiles as CuPy arrays, so the untiled data continues to be held in RAM. Data is read back from GPU memory only at an “iteration checkpoint,” which by default is every 1000 iteration steps. This approach costs hardly any speed, but the required GPU memory is massively reduced, enabling the calculation of larger areas.
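The tile-and-checkpoint scheme can be sketched roughly as follows. NumPy stands in for CuPy here so the sketch runs without a GPU; in the real setup the tile copies would be `cp.asarray(...)` uploads and the read-back would use `cp.asnumpy(...)`. The class and method names are illustrative, not SplashTool_GPU's actual API.

```python
import numpy as np

class TileCheckpointer:
    def __init__(self, grid, tile_slices, checkpoint_every=1000):
        self.grid = grid                # full, untiled data stays in RAM
        self.slices = tile_slices
        self.every = checkpoint_every
        # upload tiles to device memory (cp.asarray(...) in the GPU build)
        self.tiles = [grid[s].copy() for s in tile_slices]

    def step(self, iteration, update):
        for tile in self.tiles:
            update(tile)                # iterate on the device copies only
        if iteration % self.every == 0:
            self.sync()                 # read back only at the checkpoint

    def sync(self):
        for s, tile in zip(self.slices, self.tiles):
            self.grid[s] = tile         # cp.asnumpy(tile) in the GPU build

grid = np.arange(8, dtype=float)
ckpt = TileCheckpointer(grid, [slice(0, 4), slice(4, 8)], checkpoint_every=2)

def add_one(tile):
    tile += 1.0

for it in range(1, 5):
    ckpt.step(it, add_one)
ckpt.sync()
```

Keeping host-device transfers out of the inner loop is what makes the checkpoint interval nearly free: transfers happen once per 1000 steps instead of once per step.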

CUDA Kernels

The speed increase is already significant, and further optimization is possible by defining custom CUDA kernels. Reprogramming part of the iteration as a kernel yielded another speedup by a factor of 2.
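Custom kernels in CuPy can be defined from raw CUDA C source with `cupy.RawKernel`. The fused update below is a hypothetical example of the pattern, not one of SplashTool_GPU's actual kernels; the launch helper imports CuPy lazily because compiling and running it requires a GPU.

```python
# CUDA C source for a fused, mask-aware update step (illustrative only).
KERNEL_SRC = r'''
extern "C" __global__
void fused_update(const float* h, const float* dh, float* out,
                  const bool* active, int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n && active[i]) {
        out[i] = h[i] + dh[i];   // single fused pass, skips inactive cells
    } else if (i < n) {
        out[i] = h[i];           // inactive cells keep their value
    }
}
'''

def launch(h, dh, active):
    """Compile and launch the kernel via cupy.RawKernel (GPU required)."""
    import cupy as cp
    kernel = cp.RawKernel(KERNEL_SRC, "fused_update")
    n = h.size
    out = cp.empty_like(h)
    threads = 256
    blocks = (n + threads - 1) // threads
    kernel((blocks,), (threads,), (h, dh, out, active, n))
    return out
```

Fusing several array operations into one kernel avoids launching a separate kernel per elementwise step, which is a typical source of the factor-of-2 gains mentioned above.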

Comparisons and Speed Tests

SplashTool_GPU was tested on several areas and option sets, and both the iteration speed and the results were compared with the CPU variant. On the test system, the iteration speed was about 10 times higher, with identical results. The iteration core of SplashTool_GPU has thus reached a solid state.
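A comparison of this kind can be organized as a small harness that runs the same iteration with two step functions, times both, and checks the results with `np.allclose`. This is a minimal sketch with hypothetical names, not the actual test setup; in the real comparison the GPU step would operate on CuPy arrays and be converted back with `cp.asnumpy` before the result check.

```python
import time
import numpy as np

def compare_backends(step_cpu, step_gpu, state, n_iter=100):
    # Run the identical iteration with each backend's step function.
    results = {}
    for name, step in (("cpu", step_cpu), ("gpu", step_gpu)):
        s = state.copy()
        t0 = time.perf_counter()
        for _ in range(n_iter):
            s = step(s)
        results[name] = (time.perf_counter() - t0, s)
    t_cpu, r_cpu = results["cpu"]
    t_gpu, r_gpu = results["gpu"]
    # Return the speedup factor and whether the results agree.
    return t_cpu / t_gpu, np.allclose(r_cpu, r_gpu)
```

Checking numerical agreement alongside the timing guards against a "fast but wrong" port.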
