Feature #605674

Optimization: asynchronous kernel launches and copies

Added by Daniel Kirchner 12 months ago.

Status: New
Priority: Low
Target version: -
Start date:
Due date:
% Done: 0%


Description

CUDA kernels and copies between host and GPU memory could be launched asynchronously. This would reduce the time the CPU spends blocked waiting for the GPU, and it would hide the overhead of individual kernel launches behind work that is already running.
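A minimal sketch of what an asynchronous launch sequence looks like with the CUDA runtime API. It assumes a hypothetical `scale` kernel and a host buffer allocated as pinned (page-locked) memory, which `cudaMemcpyAsync` needs in order to overlap with host work. The copy-in, kernel and copy-out are only enqueued on one stream, and the CPU blocks once, at `cudaStreamSynchronize`. The function `scale_async` is illustrative and not part of the existing code base.

```cuda
#include <cuda_runtime.h>

// Hypothetical example kernel: multiplies every element by `factor` in place.
__global__ void scale(float* data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Enqueues copy-in, kernel and copy-out on one stream without blocking the
// CPU, then waits once. `host_pinned` is assumed to come from cudaMallocHost;
// with pageable memory the copies may not overlap with host work.
bool scale_async(float* host_pinned, int n, float factor) {
    if (n <= 0) return true;
    const size_t bytes = static_cast<size_t>(n) * sizeof(float);
    cudaStream_t stream;
    if (cudaStreamCreate(&stream) != cudaSuccess) return false;
    float* dev = nullptr;
    if (cudaMalloc((void**)&dev, bytes) != cudaSuccess) {
        cudaStreamDestroy(stream);
        return false;
    }

    // Each call below only enqueues work and returns immediately;
    // operations on the same stream still execute in order.
    cudaError_t err = cudaMemcpyAsync(dev, host_pinned, bytes,
                                      cudaMemcpyHostToDevice, stream);
    if (err == cudaSuccess) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        scale<<<blocks, threads, 0, stream>>>(dev, n, factor);
        err = cudaGetLastError();  // reports launch-configuration errors
    }
    if (err == cudaSuccess)
        err = cudaMemcpyAsync(host_pinned, dev, bytes,
                              cudaMemcpyDeviceToHost, stream);

    // ... the CPU is free to do unrelated work here ...

    // Single synchronization point before the host touches the results.
    cudaError_t sync_err = cudaStreamSynchronize(stream);
    if (err == cudaSuccess) err = sync_err;
    cudaFree(dev);
    cudaStreamDestroy(stream);
    return err == cudaSuccess;
}
```

Enqueuing independent batches on separate streams would additionally let copies and kernels from different batches overlap on the GPU, which is where most of the potential gain lies.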

This would require rigorous, fine-grained synchronization. One option is to extend the caching mechanism with a "modification_finished" routine that creates a CUDA fence/synchronization object (for example a CUDA event); the next call to "read_cache" would then wait on that object until the GPU has finished the modifications.
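The proposed fence could be sketched with a CUDA event, assuming a hypothetical `CacheEntry` whose host copy lives in pinned memory and is filled by asynchronous device-to-host copies. Only the names `modification_finished` and `read_cache` come from the description above; their signatures, the struct, its fields and `wait_on_stream` are illustrative assumptions, and the `pending` flag assumes single-threaded access.

```cuda
#include <cuda_runtime.h>

// Hypothetical cache entry: a pinned host copy that asynchronous
// device-to-host copies write into, plus an event acting as the fence.
struct CacheEntry {
    float* host_data = nullptr;   // assumed to be allocated with cudaMallocHost
    cudaEvent_t ready = nullptr;  // completes when the recorded GPU work is done
    bool pending = false;         // not thread-safe: single-threaded use assumed
};

// Call right after enqueuing the work that modifies `e` on `stream`.
// Records an event that completes once everything enqueued on `stream`
// so far has finished.
cudaError_t modification_finished(CacheEntry& e, cudaStream_t stream) {
    if (e.ready == nullptr) {
        // Timing disabled: the event only serves as a synchronization object.
        cudaError_t err = cudaEventCreateWithFlags(&e.ready, cudaEventDisableTiming);
        if (err != cudaSuccess) return err;
    }
    cudaError_t err = cudaEventRecord(e.ready, stream);
    if (err == cudaSuccess) e.pending = true;
    return err;
}

// Host-side read: blocks only until this entry's modifications have finished,
// rather than waiting for the whole device as cudaDeviceSynchronize would.
// Returns nullptr if the GPU work failed.
const float* read_cache(CacheEntry& e) {
    if (e.pending) {
        if (cudaEventSynchronize(e.ready) != cudaSuccess) return nullptr;
        e.pending = false;
    }
    return e.host_data;
}

// GPU-side consumer: makes `consumer` wait for the fence without blocking
// the CPU. Waiting on an already completed event is a no-op.
cudaError_t wait_on_stream(const CacheEntry& e, cudaStream_t consumer) {
    if (e.ready == nullptr) return cudaSuccess;
    return cudaStreamWaitEvent(consumer, e.ready, 0);
}
```

Because `read_cache` waits on a per-entry event instead of a device-wide synchronization, unrelated work queued on other streams keeps running; `wait_on_stream` goes a step further and lets a GPU-side reader wait without involving the CPU at all.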

It has to be determined whether such modifications are feasible and worthwhile: How many full synchronizations would still be necessary? How many kernel launches could actually run asynchronously or simultaneously? Etc.
