c1560e2d3f
One of the causes of the large library size and slow compile times has been that vtkm creates unnecessary copies of objects. When the objects being copied use shared_ptr, this bloats the library. I presume the bloat is caused by the atomic increment/decrement that shared_ptr performs whenever it is copied or destroyed.

For testing I used the following example:

```
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/DynamicArrayHandle.h>
#include <vtkm/worklet/DispatcherMapField.h>
#include <vtkm/worklet/WorkletMapField.h>

#include <iostream>
#include <vector>

struct ExampleFieldWorklet : public vtkm::worklet::WorkletMapField
{
  typedef void ControlSignature( FieldIn<>, FieldIn<>, FieldIn<>,
                                 FieldOut<>, FieldOut<>, FieldOut<> );
  typedef void ExecutionSignature( _1, _2, _3, _4, _5, _6 );

  template<typename T, typename U, typename V>
  VTKM_EXEC_EXPORT
  void operator()( const vtkm::Vec< T, 3 > & vec,
                   const U & scalar1,
                   const V& scalar2,
                   vtkm::Vec<T, 3>& out_vec,
                   U& out_scalar1,
                   V& out_scalar2 ) const
  {
    out_vec = vec * scalar1;
    out_scalar1 = scalar1 + scalar2;
    out_scalar2 = scalar2;
  }

  // Fallback for argument type combinations the overload above does not accept.
  template<typename T, typename U, typename V, typename W, typename X, typename Y>
  VTKM_EXEC_EXPORT
  void operator()( const T &, const U &, const V&, W&, X&, Y& ) const
  {
    //no-op
  }
};

int main()
{
  std::vector< vtkm::Vec<vtkm::Float32, 3> > inputVec;
  std::vector< vtkm::Int32 > inputScalar1;
  std::vector< vtkm::Float64 > inputScalar2;

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleV =
      vtkm::cont::make_ArrayHandle(inputVec);
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS1 =
      vtkm::cont::make_ArrayHandle(inputVec);
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleS2 =
      vtkm::cont::make_ArrayHandle(inputVec);

  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOV;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS1;
  vtkm::cont::ArrayHandle< vtkm::Vec<vtkm::Float32, 3> > handleOS2;

  std::cout << "Making 3 output DynamicArrayHandles " << std::endl;
  vtkm::cont::DynamicArrayHandle out1(handleOV), out2(handleOS1), out3(handleOS2);

  typedef vtkm::worklet::DispatcherMapField<ExampleFieldWorklet> DispatcherType;

  std::cout << "Invoking ExampleFieldWorklet" << std::endl;
  DispatcherType dispatcher;
  dispatcher.Invoke(handleV, handleS1, handleS2, out1, out2, out3);
}
```

With the original vtkm, this example produced a 4684 KB binary and performed 91 ArrayHandle copies or assignments. With this branch, the binary shrinks to 2392 KB and the example performs 36 copies or assignments.
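The cost described above comes from copying a handle whose storage is held in a shared_ptr. The sketch below illustrates this in plain standard C++ and is not the change made on this branch. It uses a hypothetical `SharedHandle` class as a stand-in for a shared_ptr-backed type such as `vtkm::cont::ArrayHandle`, along with hypothetical helper functions. Passing the handle by value copies it, which triggers an atomic reference-count increment (and a matching decrement when the copy dies). Passing by const reference, or moving an rvalue into a by-value "sink" parameter, avoids the copy entirely.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for a handle that, like vtkm::cont::ArrayHandle,
// shares its storage through a std::shared_ptr. This is NOT the VTK-m class;
// it only counts how often the handle is copied.
class SharedHandle
{
public:
  SharedHandle() : Storage(std::make_shared<std::vector<float>>()) {}

  // Copies share the storage: copying the shared_ptr does an atomic increment.
  SharedHandle(const SharedHandle& other) : Storage(other.Storage) { ++CopyCount; }
  SharedHandle& operator=(const SharedHandle& other)
  {
    Storage = other.Storage; // atomic increment (new owner), decrement (old)
    ++CopyCount;
    return *this;
  }

  // Moves hand over ownership without touching the reference count.
  SharedHandle(SharedHandle&&) noexcept = default;
  SharedHandle& operator=(SharedHandle&&) noexcept = default;

  long UseCount() const
  {
    assert(Storage && "handle was moved from");
    return Storage.use_count();
  }

  static int CopyCount; // copy constructions plus copy assignments so far

private:
  std::shared_ptr<std::vector<float>> Storage;
};

int SharedHandle::CopyCount = 0;

// By value: the caller's handle is copied, so the use count is briefly 2.
long UseCountByValue(SharedHandle h) { return h.UseCount(); }

// By const reference: no copy and no atomic traffic; the use count stays 1.
long UseCountByConstRef(const SharedHandle& h) { return h.UseCount(); }

// Sink parameter: take by value, then move into place. An rvalue argument
// is moved (or the copy is elided) all the way in; only an lvalue is copied.
void Store(std::vector<SharedHandle>& out, SharedHandle h)
{
  out.push_back(std::move(h));
}
```

For example, `UseCountByValue(a)` returns 2 and increments `SharedHandle::CopyCount`, while `UseCountByConstRef(a)` returns 1 and leaves the counter unchanged. Counting copies this way is one possible method of measuring the effect; the commit message does not say how its copy counts were collected.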
VTK-m
One of the biggest recent changes in high-performance computing is the increasing use of accelerators. Accelerators contain processing cores that are individually less capable than a core in a typical CPU, but these cores are replicated and grouped so that their aggregate execution provides a very high computation rate at much lower power. Current and future CPUs also require much more explicit parallelism. Each successive hardware generation packs more cores into each processor, and technologies like hyperthreading and vector operations require even more parallel processing to leverage each core’s full potential.
VTK-m is a toolkit of scientific visualization algorithms for emerging processor architectures. VTK-m supports the fine-grained concurrency for data analysis and visualization algorithms required to drive extreme-scale computing by providing abstract models for data and execution that can be applied to a variety of algorithms across many different processor architectures.
Getting VTK-m
The VTK-m repository is located at https://gitlab.kitware.com/vtk/vtk-m
VTK-m dependencies are:
- CMake 3.0
- Boost 1.52.0 or greater
- CUDA Toolkit 6+ or Thrust 1.7+
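To build, clone the repository, create a separate build directory, and configure it with CMake:

```
git clone https://gitlab.kitware.com/vtk/vtk-m.git vtkm
mkdir vtkm-build
cd vtkm-build
cmake-gui ../vtkm
```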
A detailed walk-through of installing and building VTK-m can be found on our Contributing page (CONTRIBUTING.md).