Rather than forcing all dispatchers to be templated on a device adapter,
use a TryExecute internally within the invoke to select a device
adapter.
Because this removes the need to declare a device when invoking a
worklet, this commit also removes the need to declare a device in
several other areas of the code.
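
The idea can be illustrated with a minimal, self-contained sketch. It is
not the actual VTK-m code: DeviceList, TryExecuteSketch, DispatcherSketch,
and the two device tags are hypothetical names invented for this example.
It only shows the pattern of trying devices in order inside Invoke so the
caller never names one.

```cpp
#include <cassert>
#include <string>

// Hypothetical device tags (assumption: not the real adapter tags).
struct CudaTag
{
  static const char* Name() { return "cuda"; }
  static bool Available() { return false; } // pretend no GPU is present
};
struct SerialTag
{
  static const char* Name() { return "serial"; }
  static bool Available() { return true; }
};

// Compile-time list of devices to try, in priority order.
template <typename... Devices>
struct DeviceList
{
};

// Base case: every device was tried and none succeeded.
template <typename Functor>
bool TryExecuteSketch(Functor&, DeviceList<>)
{
  return false;
}

// Try the first device; fall through to the rest if it is unavailable
// or the functor reports failure.
template <typename Functor, typename First, typename... Rest>
bool TryExecuteSketch(Functor& functor, DeviceList<First, Rest...>)
{
  if (First::Available() && functor(First{}))
  {
    return true;
  }
  return TryExecuteSketch(functor, DeviceList<Rest...>{});
}

// The dispatcher is not templated on a device; Invoke picks one at runtime.
struct DispatcherSketch
{
  std::string UsedDevice;

  template <typename Worklet>
  bool Invoke(Worklet worklet, int& value)
  {
    auto run = [&](auto device) {
      worklet(value);
      this->UsedDevice = decltype(device)::Name();
      return true;
    };
    return TryExecuteSketch(run, DeviceList<CudaTag, SerialTag>{});
  }
};

// Runs a trivial worklet and reports which device ended up executing it.
inline std::string RunDispatchDemo()
{
  DispatcherSketch dispatcher;
  int value = 20;
  const bool ok = dispatcher.Invoke([](int& x) { x += 1; }, value);
  if (ok && value == 21)
  {
    return dispatcher.UsedDevice;
  }
  return "failed";
}
```

In this sketch the CUDA tag reports itself as unavailable, so the
invocation falls back to the serial tag without the caller being involved.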
Also add a throwFailedRuntimeDeviceTransfer that throws a nicely
detailed message explaining why something couldn't be transferred to
the requested device adapter.
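
A rough sketch of what such a helper might look like follows. The names
BadDeviceSketch and ThrowFailedTransferSketch, the parameters, and the
message wording are all assumptions made for illustration; they are not
the signature or text used by this commit.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical error type standing in for the library's device error.
struct BadDeviceSketch : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

// Builds one detailed message in a single place so every call site
// reports a failed transfer the same way (assumed parameters).
inline void ThrowFailedTransferSketch(const std::string& className,
                                      const std::string& deviceName)
{
  throw BadDeviceSketch("Unable to transfer " + className +
                        " to device adapter '" + deviceName + "'.");
}

// Triggers the helper and returns the message carried by the exception.
inline std::string FailedTransferMessageDemo()
{
  try
  {
    ThrowFailedTransferSketch("BoundingIntervalHierarchy", "cuda");
  }
  catch (const BadDeviceSketch& error)
  {
    return error.what();
  }
  return "";
}
```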
- For the BoundingIntervalHierarchy, CUDA had failures when a .cxx file
  was used to implement the virtual methods
- Moved the contents to the .hxx file after discussing with Rob over
  email
- Still need to work on the .cxx implementation after the merge