HALMD does not start with a huge number of particles
The construction of mdsim::gpu::particle fails if the number of particles exceeds 8388480 = 65535 * 128, i.e. the maximum number of blocks times a block size of 128. These compute dimensions are calculated upon member initialisation. While it may be sufficient to limit the number of blocks to 65535, it should be possible for the block size to go up to its maximum value (1024). On the other hand, a certain number of blocks is required for good device occupancy. (How many? The number of SMs times a small factor?) So fixing the block size to its maximum would be counterproductive for smaller systems.
1) device::validate() did not detect the mismatch.
2) For a given particle number, we may start with the maximum block size and lower it until the number of blocks is sufficiently large. The challenge is to determine the number of SMs of a given device.
The deviceQuery sample from the CUDA SDK reports it (8 SMs for the GTX 960):
Device 0: "GeForce GTX 960"
  CUDA Driver Version / Runtime Version          8.0 / 7.5
  CUDA Capability Major/Minor version number:    5.2
  Total amount of global memory:                 4038 MBytes (4233691136 bytes)
  ( 8) Multiprocessors, (128) CUDA Cores/MP:     1024 CUDA Cores
Perhaps a useful link: