c08727ebab
Gives a few percent memory improvement for the regular feature set kernel and could give a significant memory improvement for the Experimental kernel. It could also give some performance improvement, but I haven't measured this reliably yet. The code is ifdef-ed for now, since it only works on Linux and requires the CUDA toolkit to be installed (other platforms only use precompiled kernels). This is just an experiment for now and a base for proper feature support in the future (with runtime compilation using CUDA 7?). |
||
---|---|---|
CMakeLists.txt | ||
device_cpu.cpp | ||
device_cuda.cpp | ||
device_intern.h | ||
device_memory.h | ||
device_multi.cpp | ||
device_network.cpp | ||
device_network.h | ||
device_opencl.cpp | ||
device_task.cpp | ||
device_task.h | ||
device.cpp | ||
device.h |