The ArrayHandle classes all exclusively work in the control environment.
However, CUDA likes to add __device__ to constructors, destructors, and
assignment operators it automatically adds. This in turn causes warnings
about the __device__ function using host-only classes (like
boost::shared_ptr). Solve this problem by adding explicit methods for
all of these.
Implemented this by wrapping up all these default objects in a macro.
This also solved the problem of other constructors that are necessary
for array handles such as a constructor that takes the base array
handle.
Under CUDA, the default constructors and destructors created are exported
as __host__ and __device__, which causes problems because they used a boost
pointer that only works on the host. The explicit copy constructors and
destructors do the same thing as the default ones except declared to only
work on the host.
Also found a problem with ArrayHandle that manifests itself with derived
types when you first do a PrepareForInput and then a PrepareForInPlace.
The ArrayHandle assumes the data is already moved to the device and
skips the in place call to the array transfer. However, this means the
transfer of the derived array handle does not have a chance to set up
for in place.
I think the appropriate solution may be to move the appropriate logic
from ArrayHandle to ArrayTransfer. I will look into that next.