From fb6235e0e95377deeab138972d5300afa83cb8ee Mon Sep 17 00:00:00 2001
From: Robert Maynard
Date: Mon, 24 Jun 2019 12:30:44 -0400
Subject: [PATCH 1/8] VTK-m and DIY now properly export MPI requirements.

Previously an installed version of VTK-m wasn't relocatable, as it
hard-coded system MPI paths. Additionally, the installed vtkm_diy target
would depend on MPI but would not call `find_package(MPI)`.
---
 CMake/FindMPI.cmake                | 1756 ++++++++++++++++++++++++++++
 CMake/VTKmMPI.cmake                |   21 +
 CMake/VTKmWrappers.cmake           |    1 +
 CMakeLists.txt                     |   10 +-
 vtkm/thirdparty/diy/CMakeLists.txt |   18 +-
 5 files changed, 1781 insertions(+), 25 deletions(-)
 create mode 100644 CMake/FindMPI.cmake
 create mode 100644 CMake/VTKmMPI.cmake

diff --git a/CMake/FindMPI.cmake b/CMake/FindMPI.cmake
new file mode 100644
index 000000000..3c7fe377c
--- /dev/null
+++ b/CMake/FindMPI.cmake
@@ -0,0 +1,1756 @@
+# Distributed under the OSI-approved BSD 3-Clause License. See accompanying
+# file Copyright.txt or https://cmake.org/licensing for details.
+
+#[=======================================================================[.rst:
+FindMPI
+-------
+
+Find a Message Passing Interface (MPI) implementation.
+
+The Message Passing Interface (MPI) is a library used to write
+high-performance distributed-memory parallel applications, and is
+typically deployed on a cluster. MPI is a standard interface (defined
+by the MPI forum) for which many implementations are available.
+
+Variables for using MPI
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The module exposes the components ``C``, ``CXX``, ``MPICXX`` and ``Fortran``.
+Each of these controls the various MPI languages to search for.
+The difference between ``CXX`` and ``MPICXX`` is that ``CXX`` refers to the
+MPI C API being usable from C++, whereas ``MPICXX`` refers to the MPI-2 C++ API
+that was removed again in MPI-3.
+
+Depending on the enabled components the following variables will be set:
+
+``MPI_FOUND``
+  Variable indicating that MPI settings for all requested languages have been found.
+  If no components are specified, this is true if MPI settings for all enabled languages
+  were detected. Note that the ``MPICXX`` component does not affect this variable.
+``MPI_VERSION``
+  Minimal version of MPI detected among the requested languages, or all enabled languages
+  if no components were specified.
+
+This module will set the following variables per language in your
+project, where ``<lang>`` is one of C, CXX, or Fortran:
+
+``MPI_<lang>_FOUND``
+  Variable indicating the MPI settings for ``<lang>`` were found and that
+  simple MPI test programs compile with the provided settings.
+``MPI_<lang>_COMPILER``
+  MPI compiler for ``<lang>`` if such a program exists.
+``MPI_<lang>_COMPILE_OPTIONS``
+  Compilation options for MPI programs in ``<lang>``, given as a :ref:`;-list <CMake Language Lists>`.
+``MPI_<lang>_COMPILE_DEFINITIONS``
+  Compilation definitions for MPI programs in ``<lang>``, given as a :ref:`;-list <CMake Language Lists>`.
+``MPI_<lang>_INCLUDE_DIRS``
+  Include path(s) for MPI header.
+``MPI_<lang>_LINK_FLAGS``
+  Linker flags for MPI programs.
+``MPI_<lang>_LIBRARIES``
+  All libraries to link MPI programs against.
+
+Additionally, the following :prop_tgt:`IMPORTED` targets are defined:
+
+``MPI::MPI_<lang>``
+  Target for using MPI from ``<lang>``.
+
+The following variables indicating which bindings are present will be defined:
+
+``MPI_MPICXX_FOUND``
+  Variable indicating whether the MPI-2 C++ bindings are present (introduced in MPI-2, removed with MPI-3).
+``MPI_Fortran_HAVE_F77_HEADER``
+  True if the Fortran 77 header ``mpif.h`` is available.
+``MPI_Fortran_HAVE_F90_MODULE``
+  True if the Fortran 90 module ``mpi`` can be used for accessing MPI (MPI-2 and higher only).
+``MPI_Fortran_HAVE_F08_MODULE``
+  True if the Fortran 2008 ``mpi_f08`` is available to MPI programs (MPI-3 and higher only).
+
+If possible, the MPI version will be determined by this module. The facilities to detect the MPI version
+were introduced with MPI-1.2, and therefore cannot be found for older MPI versions.
+
+``MPI_<lang>_VERSION_MAJOR``
+  Major version of MPI implemented for ``<lang>`` by the MPI distribution.
+``MPI_<lang>_VERSION_MINOR``
+  Minor version of MPI implemented for ``<lang>`` by the MPI distribution.
+``MPI_<lang>_VERSION``
+  MPI version implemented for ``<lang>`` by the MPI distribution.
+
+Note that there's no variable for the C bindings being accessible through ``mpi.h``, since the MPI standards
+always have required this binding to work in both C and C++ code.
+
+For running MPI programs, the module sets the following variables
+
+``MPIEXEC_EXECUTABLE``
+  Executable for running MPI programs, if such exists.
+``MPIEXEC_NUMPROC_FLAG``
+  Flag to pass to ``mpiexec`` before giving it the number of processors to run on.
+``MPIEXEC_MAX_NUMPROCS``
+  Number of MPI processors to utilize. Defaults to the number
+  of processors detected on the host system.
+``MPIEXEC_PREFLAGS``
+  Flags to pass to ``mpiexec`` directly before the executable to run.
+``MPIEXEC_POSTFLAGS``
+  Flags to pass to ``mpiexec`` after other flags.
+
+Variables for locating MPI
+^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+This module performs a three step search for an MPI implementation:
+
+1. Check if the compiler has MPI support built-in. This is the case if the user passed a
+   compiler wrapper as ``CMAKE_<LANG>_COMPILER`` or if they're on a Cray system.
+2. Attempt to find an MPI compiler wrapper and determine the compiler information from it.
+3. Try to find an MPI implementation that does not ship such a wrapper by guessing settings.
+   Currently, only Microsoft MPI and MPICH2 on Windows are supported.
+
+For controlling the second step, the following variables may be set:
+
+``MPI_<lang>_COMPILER``
+  Search for the specified compiler wrapper and use it.
+``MPI_<lang>_COMPILER_FLAGS``
+  Flags to pass to the MPI compiler wrapper during interrogation. Some compiler wrappers
+  support linking debug or tracing libraries if a specific flag is passed and this variable
+  may be used to obtain them.
+``MPI_COMPILER_FLAGS``
+  Used to initialize ``MPI_<lang>_COMPILER_FLAGS`` if no language specific flag has been given.
+  Empty by default.
+``MPI_EXECUTABLE_SUFFIX``
+  A suffix which is appended to all names that are being looked for. For instance you may set this
+  to ``.mpich`` or ``.openmpi`` to prefer the one or the other on Debian and its derivatives.
+
+In order to control the guessing step, the following variable may be set:
+
+``MPI_GUESS_LIBRARY_NAME``
+  Valid values are ``MSMPI`` and ``MPICH2``. If set, only the given library will be searched for.
+  By default, ``MSMPI`` will be preferred over ``MPICH2`` if both are available.
+  This also sets ``MPI_SKIP_COMPILER_WRAPPER`` to ``true``, which may be overridden.
+
+Each of the search steps may be skipped with the following control variables:
+
+``MPI_ASSUME_NO_BUILTIN_MPI``
+  If true, the module assumes that the compiler itself does not provide an MPI implementation and
+  skips to step 2.
+``MPI_SKIP_COMPILER_WRAPPER``
+  If true, no compiler wrapper will be searched for.
+``MPI_SKIP_GUESSING``
+  If true, the guessing step will be skipped.
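+
+For example, on a Debian-style system with several MPI stacks installed, a configure
+command that skips both the built-in check and the guessing step and prefers the
+MPICH-suffixed wrappers could look like the following sketch (the ``.mpich`` suffix
+and the source path are only illustrative):
+
+::
+
+   cmake -DMPI_ASSUME_NO_BUILTIN_MPI=ON -DMPI_SKIP_GUESSING=ON -DMPI_EXECUTABLE_SUFFIX=.mpich path/to/source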
+
+Additionally, the following control variable is available to change search behavior:
+
+``MPI_CXX_SKIP_MPICXX``
+  Add some definitions that will disable the MPI-2 C++ bindings.
+  Currently supported are MPICH, Open MPI, Platform MPI and derivatives thereof,
+  for example MVAPICH or Intel MPI.
+
+If the find procedure fails for a variable ``MPI_<lang>_WORKS``, then the settings detected by or passed to
+the module did not work and even a simple MPI test program failed to compile.
+
+If all of these parameters were not sufficient to find the right MPI implementation, a user may
+disable the entire autodetection process by specifying both a list of libraries in ``MPI_<lang>_LIBRARIES``
+and a list of include directories in ``MPI_<lang>_ADDITIONAL_INCLUDE_DIRS``.
+Any other variable may be set in addition to these two. The module will then validate the MPI settings and store the
+settings in the cache.
+
+Cache variables for MPI
+^^^^^^^^^^^^^^^^^^^^^^^
+
+The variable ``MPI_<lang>_INCLUDE_DIRS`` will be assembled from the following variables.
+For C and CXX:
+
+``MPI_<lang>_HEADER_DIR``
+  Location of the ``mpi.h`` header on disk.
+
+For Fortran:
+
+``MPI_Fortran_F77_HEADER_DIR``
+  Location of the Fortran 77 header ``mpif.h``, if it exists.
+``MPI_Fortran_MODULE_DIR``
+  Location of the ``mpi`` or ``mpi_f08`` modules, if available.
+
+For all languages the following variables are additionally considered:
+
+``MPI_<lang>_ADDITIONAL_INCLUDE_DIRS``
+  A :ref:`;-list <CMake Language Lists>` of paths needed in addition to the normal include directories.
+``MPI_<include_name>_INCLUDE_DIR``
+  Path variables for include folders referred to by ``<include_name>``.
+``MPI_<lang>_ADDITIONAL_INCLUDE_VARS``
+  A :ref:`;-list <CMake Language Lists>` of ``<include_name>`` that will be added to the include locations of ``<lang>``.
+
+The variable ``MPI_<lang>_LIBRARIES`` will be assembled from the following variables:
+
+``MPI_<lib_name>_LIBRARY``
+  The location of a library called ``<lib_name>`` for use with MPI.
+``MPI_<lang>_LIB_NAMES``
+  A :ref:`;-list <CMake Language Lists>` of ``<lib_name>`` that will be added to the libraries linked for ``<lang>``.
+
+Usage of mpiexec
+^^^^^^^^^^^^^^^^
+
+When using ``MPIEXEC_EXECUTABLE`` to execute MPI applications, you should typically
+use all of the ``MPIEXEC_EXECUTABLE`` flags as follows:
+
+::
+
+   ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} ${MPIEXEC_MAX_NUMPROCS}
+     ${MPIEXEC_PREFLAGS} EXECUTABLE ${MPIEXEC_POSTFLAGS} ARGS
+
+where ``EXECUTABLE`` is the MPI program, and ``ARGS`` are the arguments to
+pass to the MPI program.
+
+Advanced variables for using MPI
+^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
+
+The module can perform some advanced feature detections upon explicit request.
+
+**Important notice:** The following checks cannot be performed without *executing* an MPI test program.
+Consider the special considerations for the behavior of :command:`try_run` during cross compilation.
+Moreover, running an MPI program can cause additional issues, like a firewall notification on some systems.
+You should only enable these detections if you absolutely need the information.
+
+If the following variables are set to true, the respective search will be performed:
+
+``MPI_DETERMINE_Fortran_CAPABILITIES``
+  Determine for all available Fortran bindings what the values of ``MPI_SUBARRAYS_SUPPORTED`` and
+  ``MPI_ASYNC_PROTECTS_NONBLOCKING`` are and make their values available as ``MPI_Fortran_<binding>_SUBARRAYS``
+  and ``MPI_Fortran_<binding>_ASYNCPROT``, where ``<binding>`` is one of ``F77_HEADER``, ``F90_MODULE`` and
+  ``F08_MODULE``.
+``MPI_DETERMINE_LIBRARY_VERSION``
+  For each language, find the output of ``MPI_Get_library_version`` and make it available as ``MPI_<lang>_LIBRARY_VERSION_STRING``.
+  This information is usually tied to the runtime component of an MPI implementation and might differ depending on ``<lang>``.
+  Note that the return value is entirely implementation defined. This information might be used to identify
+  the MPI vendor and for example pick the correct one of multiple third party binaries that matches the MPI vendor.
+
+Backward Compatibility
+^^^^^^^^^^^^^^^^^^^^^^
+
+For backward compatibility with older versions of FindMPI, these
+variables are set, but deprecated:
+
+::
+
+   MPI_COMPILER        MPI_LIBRARY        MPI_EXTRA_LIBRARY
+   MPI_COMPILE_FLAGS   MPI_INCLUDE_PATH   MPI_LINK_FLAGS
+   MPI_LIBRARIES
+
+In new projects, please use the ``MPI_<lang>_XXX`` equivalents.
+Additionally, the following variables are deprecated:
+
+``MPI_<lang>_COMPILE_FLAGS``
+  Use ``MPI_<lang>_COMPILE_OPTIONS`` and ``MPI_<lang>_COMPILE_DEFINITIONS`` instead.
+``MPI_<lang>_INCLUDE_PATH``
+  For consumption use ``MPI_<lang>_INCLUDE_DIRS`` and for specifying folders use ``MPI_<lang>_ADDITIONAL_INCLUDE_DIRS`` instead.
+``MPIEXEC``
+  Use ``MPIEXEC_EXECUTABLE`` instead.
+#]=======================================================================]
+
+cmake_policy(PUSH)
+cmake_policy(SET CMP0057 NEW) # if IN_LIST
+
+include(FindPackageHandleStandardArgs)
+
+# Generic compiler names
+set(_MPI_C_GENERIC_COMPILER_NAMES          mpicc    mpcc      mpicc_r mpcc_r)
+set(_MPI_CXX_GENERIC_COMPILER_NAMES        mpicxx   mpiCC     mpcxx   mpCC    mpic++   mpc++
+                                           mpicxx_r mpiCC_r   mpcxx_r mpCC_r  mpic++_r mpc++_r)
+set(_MPI_Fortran_GENERIC_COMPILER_NAMES    mpif95   mpif95_r  mpf95   mpf95_r
+                                           mpif90   mpif90_r  mpf90   mpf90_r
+                                           mpif77   mpif77_r  mpf77   mpf77_r
+                                           mpifc)
+
+# GNU compiler names
+set(_MPI_GNU_C_COMPILER_NAMES              mpigcc mpgcc mpigcc_r mpgcc_r)
+set(_MPI_GNU_CXX_COMPILER_NAMES            mpig++ mpg++ mpig++_r mpg++_r mpigxx)
+set(_MPI_GNU_Fortran_COMPILER_NAMES        mpigfortran mpgfortran mpigfortran_r mpgfortran_r
+                                           mpig77 mpig77_r mpg77 mpg77_r)
+
+# Intel MPI compiler names on Windows
+if(WIN32)
+  list(APPEND _MPI_C_GENERIC_COMPILER_NAMES       mpicc.bat)
+  list(APPEND _MPI_CXX_GENERIC_COMPILER_NAMES     mpicxx.bat)
+  list(APPEND _MPI_Fortran_GENERIC_COMPILER_NAMES mpifc.bat)
+
+  # Intel MPI compiler names
+  set(_MPI_Intel_C_COMPILER_NAMES            mpiicc.bat)
+  set(_MPI_Intel_CXX_COMPILER_NAMES          mpiicpc.bat)
+  set(_MPI_Intel_Fortran_COMPILER_NAMES      mpiifort.bat mpif77.bat mpif90.bat)
+
+  # Intel MPI compiler names for MSMPI
+  set(_MPI_MSVC_C_COMPILER_NAMES             mpicl.bat)
+  set(_MPI_MSVC_CXX_COMPILER_NAMES           mpicl.bat)
+else()
+  # Intel compiler names
+  set(_MPI_Intel_C_COMPILER_NAMES            mpiicc)
+  set(_MPI_Intel_CXX_COMPILER_NAMES          mpiicpc mpiicxx mpiic++)
+  set(_MPI_Intel_Fortran_COMPILER_NAMES      mpiifort mpiif95 mpiif90 mpiif77)
+endif()
+
+# PGI compiler names
+set(_MPI_PGI_C_COMPILER_NAMES              mpipgcc mppgcc)
+set(_MPI_PGI_CXX_COMPILER_NAMES            mpipgCC mppgCC)
+set(_MPI_PGI_Fortran_COMPILER_NAMES        mpipgf95 mpipgf90 mppgf95 mppgf90 mpipgf77 mppgf77)
+
+# XLC MPI Compiler names
+set(_MPI_XL_C_COMPILER_NAMES               mpxlc mpxlc_r mpixlc mpixlc_r)
+set(_MPI_XL_CXX_COMPILER_NAMES             mpixlcxx mpixlC mpixlc++ mpxlcxx mpxlc++ mpixlc++ mpxlCC
+                                           mpixlcxx_r mpixlC_r mpixlc++_r mpxlcxx_r mpxlc++_r mpixlc++_r mpxlCC_r)
+set(_MPI_XL_Fortran_COMPILER_NAMES         mpixlf95 mpixlf95_r mpxlf95 mpxlf95_r
+                                           mpixlf90 mpixlf90_r mpxlf90 mpxlf90_r
+                                           mpixlf77 mpixlf77_r mpxlf77 mpxlf77_r
+                                           mpixlf mpixlf_r mpxlf mpxlf_r)
+
+# Prepend vendor-specific compiler wrappers to the list. If we don't know the compiler,
+# attempt all of them.
+# By attempting vendor-specific compiler names first, we should avoid situations where the compiler wrapper +# stems from a proprietary MPI and won't know which compiler it's being used for. For instance, Intel MPI +# controls its settings via the I_MPI_CC environment variables if the generic name is being used. +# If we know which compiler we're working with, we can use the most specialized wrapper there is in order to +# pick up the right settings for it. +foreach (LANG IN ITEMS C CXX Fortran) + set(_MPI_${LANG}_COMPILER_NAMES "") + foreach (id IN ITEMS GNU Intel MSVC PGI XL) + if (NOT CMAKE_${LANG}_COMPILER_ID OR CMAKE_${LANG}_COMPILER_ID STREQUAL id) + list(APPEND _MPI_${LANG}_COMPILER_NAMES ${_MPI_${id}_${LANG}_COMPILER_NAMES}${MPI_EXECUTABLE_SUFFIX}) + endif() + unset(_MPI_${id}_${LANG}_COMPILER_NAMES) + endforeach() + list(APPEND _MPI_${LANG}_COMPILER_NAMES ${_MPI_${LANG}_GENERIC_COMPILER_NAMES}${MPI_EXECUTABLE_SUFFIX}) + unset(_MPI_${LANG}_GENERIC_COMPILER_NAMES) +endforeach() + +# Names to try for mpiexec +# Only mpiexec commands are guaranteed to behave as described in the standard, +# mpirun commands are not covered by the standard in any way whatsoever. +# lamexec is the executable for LAM/MPI, srun is for SLURM or Open MPI with SLURM support. +# srun -n X is however a valid command, so it behaves 'like' mpiexec. +set(_MPIEXEC_NAMES_BASE mpiexec mpiexec.hydra mpiexec.mpd mpirun lamexec srun) + +unset(_MPIEXEC_NAMES) +foreach(_MPIEXEC_NAME IN LISTS _MPIEXEC_NAMES_BASE) + list(APPEND _MPIEXEC_NAMES "${_MPIEXEC_NAME}${MPI_EXECUTABLE_SUFFIX}") +endforeach() +unset(_MPIEXEC_NAMES_BASE) + +function (_MPI_check_compiler LANG QUERY_FLAG OUTPUT_VARIABLE RESULT_VARIABLE) + if(DEFINED MPI_${LANG}_COMPILER_FLAGS) + separate_arguments(_MPI_COMPILER_WRAPPER_OPTIONS NATIVE_COMMAND "${MPI_${LANG}_COMPILER_FLAGS}") + else() + separate_arguments(_MPI_COMPILER_WRAPPER_OPTIONS NATIVE_COMMAND "${MPI_COMPILER_FLAGS}") + endif() + execute_process( + COMMAND ${MPI_${LANG}_COMPILER} ${_MPI_COMPILER_WRAPPER_OPTIONS} ${QUERY_FLAG} + OUTPUT_VARIABLE WRAPPER_OUTPUT OUTPUT_STRIP_TRAILING_WHITESPACE + ERROR_VARIABLE WRAPPER_OUTPUT ERROR_STRIP_TRAILING_WHITESPACE + RESULT_VARIABLE WRAPPER_RETURN) + # Some compiler wrappers will yield spurious zero return values, for example + # Intel MPI tolerates unknown arguments and if the MPI wrappers loads a shared + # library that has invalid or missing version information there would be warning + # messages emitted by ld.so in the compiler output. In either case, we'll treat + # the output as invalid. + if("${WRAPPER_OUTPUT}" MATCHES "undefined reference|unrecognized|need to set|no version information available|command not found") + set(WRAPPER_RETURN 255) + endif() + # Ensure that no error output might be passed upwards. 
+ if(NOT WRAPPER_RETURN EQUAL 0) + unset(WRAPPER_OUTPUT) + else() + # Strip leading whitespace + string(REGEX REPLACE "^ +" "" WRAPPER_OUTPUT "${WRAPPER_OUTPUT}") + endif() + set(${OUTPUT_VARIABLE} "${WRAPPER_OUTPUT}" PARENT_SCOPE) + set(${RESULT_VARIABLE} "${WRAPPER_RETURN}" PARENT_SCOPE) +endfunction() + +macro(_MPI_env_set_ifnot VAR VALUE) + if(NOT DEFINED ENV{${VAR}}) + set(_MPI_${VAR}_WAS_SET FALSE) + set(ENV{${VAR}} ${${VALUE}}) + else() + set(_MPI_${VAR}_WAS_SET TRUE) + endif() +endmacro() + +macro(_MPI_env_unset_ifnot VAR) + if(NOT _MPI_${VAR}_WAS_SET) + unset(ENV{${VAR}}) + endif() +endmacro() + +function (_MPI_interrogate_compiler LANG) + unset(MPI_COMPILE_CMDLINE) + unset(MPI_LINK_CMDLINE) + + unset(MPI_COMPILE_OPTIONS_WORK) + unset(MPI_COMPILE_DEFINITIONS_WORK) + unset(MPI_INCLUDE_DIRS_WORK) + unset(MPI_LINK_FLAGS_WORK) + unset(MPI_LIB_NAMES_WORK) + unset(MPI_LIB_FULLPATHS_WORK) + + # Define the MPICH and Intel MPI compiler variables to the compilers set in CMake. + # It's possible to have a per-compiler configuration in these MPI implementations and + # a particular MPICH derivate might check compiler interoperability. + # Intel MPI in particular does this with I_MPI_CHECK_COMPILER. + file(TO_NATIVE_PATH "${CMAKE_${LANG}_COMPILER}" _MPI_UNDERLAYING_COMPILER) + # On Windows, the Intel MPI batch scripts can only work with filnames - Full paths will break them. + # Due to the lack of other MPICH-based wrappers for Visual C++, we may treat this as default. + if(MSVC) + get_filename_component(_MPI_UNDERLAYING_COMPILER "${_MPI_UNDERLAYING_COMPILER}" NAME) + endif() + if("${LANG}" STREQUAL "C") + _MPI_env_set_ifnot(I_MPI_CC _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(MPICH_CC _MPI_UNDERLAYING_COMPILER) + elseif("${LANG}" STREQUAL "CXX") + _MPI_env_set_ifnot(I_MPI_CXX _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(MPICH_CXX _MPI_UNDERLAYING_COMPILER) + elseif("${LANG}" STREQUAL "Fortran") + _MPI_env_set_ifnot(I_MPI_FC _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(MPICH_FC _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(I_MPI_F77 _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(MPICH_F77 _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(I_MPI_F90 _MPI_UNDERLAYING_COMPILER) + _MPI_env_set_ifnot(MPICH_F90 _MPI_UNDERLAYING_COMPILER) + endif() + + # Set these two variables for Intel MPI: + # - I_MPI_DEBUG_INFO_STRIP: It adds 'objcopy' lines to the compiler output. We support stripping them + # (see below), but if we can avoid them in the first place, we should. + # - I_MPI_FORT_BIND: By default Intel MPI makes the C/C++ compiler wrappers link Fortran bindings. + # This is so that mixed-language code doesn't require additional libraries when linking with mpicc. + # For our purposes, this makes little sense, since correct MPI usage from CMake already circumvenes this. + set(_MPI_ENV_VALUE "disable") + _MPI_env_set_ifnot(I_MPI_DEBUG_INFO_STRIP _MPI_ENV_VALUE) + _MPI_env_set_ifnot(I_MPI_FORT_BIND _MPI_ENV_VALUE) + + # Check whether the -showme:compile option works. This indicates that we have either Open MPI + # or a newer version of LAM/MPI, and implies that -showme:link will also work. 
+ # Open MPI also supports -show, but separates linker and compiler information + _MPI_check_compiler(${LANG} "-showme:compile" MPI_COMPILE_CMDLINE MPI_COMPILER_RETURN) + if (MPI_COMPILER_RETURN EQUAL 0) + _MPI_check_compiler(${LANG} "-showme:link" MPI_LINK_CMDLINE MPI_COMPILER_RETURN) + + if (NOT MPI_COMPILER_RETURN EQUAL 0) + unset(MPI_COMPILE_CMDLINE) + endif() + endif() + + # MPICH and MVAPICH offer -compile-info and -link-info. + # For modern versions, both do the same as -show. However, for old versions, they do differ + # when called for mpicxx and mpif90 and it's necessary to use them over -show in order to find the + # removed MPI C++ bindings. + if (NOT MPI_COMPILER_RETURN EQUAL 0) + _MPI_check_compiler(${LANG} "-compile-info" MPI_COMPILE_CMDLINE MPI_COMPILER_RETURN) + + if (MPI_COMPILER_RETURN EQUAL 0) + _MPI_check_compiler(${LANG} "-link-info" MPI_LINK_CMDLINE MPI_COMPILER_RETURN) + + if (NOT MPI_COMPILER_RETURN EQUAL 0) + unset(MPI_COMPILE_CMDLINE) + endif() + endif() + endif() + + # MPICH, MVAPICH2 and Intel MPI just use "-show". Open MPI also offers this, but the + # -showme commands are more specialized. + if (NOT MPI_COMPILER_RETURN EQUAL 0) + _MPI_check_compiler(${LANG} "-show" MPI_COMPILE_CMDLINE MPI_COMPILER_RETURN) + endif() + + # Older versions of LAM/MPI have "-showme". Open MPI also supports this. + # Unknown to MPICH, MVAPICH and Intel MPI. + if (NOT MPI_COMPILER_RETURN EQUAL 0) + _MPI_check_compiler(${LANG} "-showme" MPI_COMPILE_CMDLINE MPI_COMPILER_RETURN) + endif() + + if (MPI_COMPILER_RETURN EQUAL 0 AND DEFINED MPI_COMPILE_CMDLINE) + # Intel MPI can be run with -compchk or I_MPI_CHECK_COMPILER set to 1. + # In this case, -show will be prepended with a line to the compiler checker. This is a script that performs + # compatibility checks and returns a non-zero exit code together with an error if something fails. + # It has to be called as "compchk.sh ". Here, is one out of 32 (i686), 64 (ia64) or 32e (x86_64). + # The compiler is identified by filename, and can be either the MPI compiler or the underlying compiler. + # NOTE: It is vital to run this script while the environment variables are set up, otherwise it can check the wrong compiler. + if("${MPI_COMPILE_CMDLINE}" MATCHES "^([^\" ]+/compchk.sh|\"[^\"]+/compchk.sh\") +([^ ]+)") + # Now CMAKE_MATCH_1 contains the path to the compchk.sh file and CMAKE_MATCH_2 the architecture flag. + unset(COMPILER_CHECKER_OUTPUT) + execute_process( + COMMAND ${CMAKE_MATCH_1} ${CMAKE_MATCH_2} ${MPI_${LANG}_COMPILER} + OUTPUT_VARIABLE COMPILER_CHECKER_OUTPUT OUTPUT_STRIP_TRAILING_WHITESPACE + ERROR_VARIABLE COMPILER_CHECKER_OUTPUT ERROR_STRIP_TRAILING_WHITESPACE + RESULT_VARIABLE MPI_COMPILER_RETURN) + # If it returned a non-zero value, the check below will fail and cause the interrogation to be aborted. + if(NOT MPI_COMPILER_RETURN EQUAL 0) + if(NOT MPI_FIND_QUIETLY) + message(STATUS "Intel MPI compiler check failed: ${COMPILER_CHECKER_OUTPUT}") + endif() + else() + # Since the check passed, we can remove the compchk.sh script. 
+ string(REGEX REPLACE "^([^\" ]+|\"[^\"]+\")/compchk.sh.*\n" "" MPI_COMPILE_CMDLINE "${MPI_COMPILE_CMDLINE}") + endif() + endif() + endif() + + # Revert changes to the environment made previously + if("${LANG}" STREQUAL "C") + _MPI_env_unset_ifnot(I_MPI_CC) + _MPI_env_unset_ifnot(MPICH_CC) + elseif("${LANG}" STREQUAL "CXX") + _MPI_env_unset_ifnot(I_MPI_CXX) + _MPI_env_unset_ifnot(MPICH_CXX) + elseif("${LANG}" STREQUAL "Fortran") + _MPI_env_unset_ifnot(I_MPI_FC) + _MPI_env_unset_ifnot(MPICH_FC) + _MPI_env_unset_ifnot(I_MPI_F77) + _MPI_env_unset_ifnot(MPICH_F77) + _MPI_env_unset_ifnot(I_MPI_F90) + _MPI_env_unset_ifnot(MPICH_F90) + endif() + + _MPI_env_unset_ifnot(I_MPI_DEBUG_INFO_STRIP) + _MPI_env_unset_ifnot(I_MPI_FORT_BIND) + + if (NOT (MPI_COMPILER_RETURN EQUAL 0) OR NOT (DEFINED MPI_COMPILE_CMDLINE)) + # Cannot interrogate this compiler, so exit. + set(MPI_${LANG}_WRAPPER_FOUND FALSE PARENT_SCOPE) + return() + endif() + unset(MPI_COMPILER_RETURN) + + # We have our command lines, but we might need to copy MPI_COMPILE_CMDLINE + # into MPI_LINK_CMDLINE, if we didn't find the link line. + if (NOT DEFINED MPI_LINK_CMDLINE) + set(MPI_LINK_CMDLINE "${MPI_COMPILE_CMDLINE}") + endif() + + # Visual Studio parsers permit each flag prefixed by either / or -. + # We'll normalize this to the - syntax we use for CMake purposes anyways. + if(MSVC) + foreach(_MPI_VARIABLE IN ITEMS COMPILE LINK) + # The Intel MPI wrappers on Windows prefix their output with some copyright boilerplate. + # To prevent possible problems, we discard this text before proceeding with any further matching. + string(REGEX REPLACE "^[^ ]+ for the Intel\\(R\\) MPI Library [^\n]+ for Windows\\*\nCopyright\\(C\\) [^\n]+, Intel Corporation\\. All rights reserved\\.\n\n" "" + MPI_${_MPI_VARIABLE}_CMDLINE "${MPI_${_MPI_VARIABLE}_CMDLINE}") + string(REGEX REPLACE "(^| )/" "\\1-" MPI_${_MPI_VARIABLE}_CMDLINE "${MPI_${_MPI_VARIABLE}_CMDLINE}") + string(REPLACE "-libpath:" "-LIBPATH:" MPI_${_MPI_VARIABLE}_CMDLINE "${MPI_${_MPI_VARIABLE}_CMDLINE}") + endforeach() + endif() + + # For MSVC and cl-compatible compilers, the keyword /link indicates a point after which + # everything following is passed to the linker. In this case, we drop all prior information + # from the link line and treat any unknown extra flags as linker flags. + set(_MPI_FILTERED_LINK_INFORMATION FALSE) + if(MSVC) + if(MPI_LINK_CMDLINE MATCHES " -(link|LINK) ") + string(REGEX REPLACE ".+-(link|LINK) +" "" MPI_LINK_CMDLINE "${MPI_LINK_CMDLINE}") + set(_MPI_FILTERED_LINK_INFORMATION TRUE) + endif() + string(REGEX REPLACE " +-(link|LINK) .+" "" MPI_COMPILE_CMDLINE "${MPI_COMPILE_CMDLINE}") + endif() + + if(UNIX) + # At this point, we obtained some output from a compiler wrapper that works. + # We'll now try to parse it into variables with meaning to us. + if("${LANG}" STREQUAL "Fortran") + # If MPICH (and derivates) didn't recognize the Fortran compiler include flag during configuration, + # they'll return a set of three commands, consisting out of a symlink command for mpif.h, + # the actual compiler command and deletion of the created symlink. + # Especially with M(VA)PICH-1, this appears to happen erroneously, and therefore we should translate + # this output into an additional include directory and then drop it from the output. 
+ if("${MPI_COMPILE_CMDLINE}" MATCHES "^ln -s ([^\" ]+|\"[^\"]+\") mpif.h") + get_filename_component(MPI_INCLUDE_DIRS_WORK "${CMAKE_MATCH_1}" DIRECTORY) + string(REGEX REPLACE "^ln -s ([^\" ]+|\"[^\"]+\") mpif.h\n" "" MPI_COMPILE_CMDLINE "${MPI_COMPILE_CMDLINE}") + string(REGEX REPLACE "^ln -s ([^\" ]+|\"[^\"]+\") mpif.h\n" "" MPI_LINK_CMDLINE "${MPI_LINK_CMDLINE}") + string(REGEX REPLACE "\nrm -f mpif.h$" "" MPI_COMPILE_CMDLINE "${MPI_COMPILE_CMDLINE}") + string(REGEX REPLACE "\nrm -f mpif.h$" "" MPI_LINK_CMDLINE "${MPI_LINK_CMDLINE}") + endif() + endif() + + # If Intel MPI was configured for static linkage with -static_mpi, the wrapper will by default strip + # debug information from resulting binaries (see I_MPI_DEBUG_INFO_STRIP). + # Since we cannot process this information into CMake logic, we need to discard the resulting objcopy + # commands from the output. + string(REGEX REPLACE "(^|\n)objcopy[^\n]+(\n|$)" "" MPI_COMPILE_CMDLINE "${MPI_COMPILE_CMDLINE}") + string(REGEX REPLACE "(^|\n)objcopy[^\n]+(\n|$)" "" MPI_LINK_CMDLINE "${MPI_LINK_CMDLINE}") + endif() + + # For Visual C++, extracting compiler options in a generic fashion isn't easy. However, no MPI implementation + # on Windows seems to require any specific ones, either. + if(NOT MSVC) + # Extract compile options from the compile command line. + string(REGEX MATCHALL "(^| )-f([^\" ]+|\"[^\"]+\")" MPI_ALL_COMPILE_OPTIONS "${MPI_COMPILE_CMDLINE}") + + foreach(_MPI_COMPILE_OPTION IN LISTS MPI_ALL_COMPILE_OPTIONS) + string(REGEX REPLACE "^ " "" _MPI_COMPILE_OPTION "${_MPI_COMPILE_OPTION}") + + # Ignore -fstack-protector directives: These occur on MPICH and MVAPICH when the libraries + # themselves were built with this flag. However, this flag is unrelated to using MPI, and + # we won't match the accompanying --param-ssp-size and -Wp,-D_FORTIFY_SOURCE flags and therefore + # produce inconsistent results with the regularly flags. + # Similarly, aliasing flags do not belong into our flag array. + if(NOT "${_MPI_COMPILE_OPTION}" MATCHES "^-f((no-|)(stack-protector|strict-aliasing)|PI[CE]|pi[ce])") + list(APPEND MPI_COMPILE_OPTIONS_WORK "${_MPI_COMPILE_OPTION}") + endif() + endforeach() + endif() + + # For GNU-style compilers, it's possible to prefix includes and definitions with certain flags to pass them + # only to the preprocessor. For CMake purposes, we need to treat, but ignore such scopings. + # Note that we do not support spaces between the arguments, i.e. -Wp,-I -Wp,/opt/mympi will not be parsed + # correctly. This form does not seem to occur in any common MPI implementation, however. + if(NOT MSVC) + set(_MPI_PREPROCESSOR_FLAG_REGEX "(-Wp,|-Xpreprocessor )?") + else() + set(_MPI_PREPROCESSOR_FLAG_REGEX "") + endif() + + # Same deal as above, for the definitions. 
+ string(REGEX MATCHALL "(^| )${_MPI_PREPROCESSOR_FLAG_REGEX}-D *([^\" ]+|\"[^\"]+\")" MPI_ALL_COMPILE_DEFINITIONS "${MPI_COMPILE_CMDLINE}") + + foreach(_MPI_COMPILE_DEFINITION IN LISTS MPI_ALL_COMPILE_DEFINITIONS) + string(REGEX REPLACE "^ ?${_MPI_PREPROCESSOR_FLAG_REGEX}-D *" "" _MPI_COMPILE_DEFINITION "${_MPI_COMPILE_DEFINITION}") + string(REPLACE "\"" "" _MPI_COMPILE_DEFINITION "${_MPI_COMPILE_DEFINITION}") + if(NOT "${_MPI_COMPILE_DEFINITION}" MATCHES "^_FORTIFY_SOURCE.*") + list(APPEND MPI_COMPILE_DEFINITIONS_WORK "${_MPI_COMPILE_DEFINITION}") + endif() + endforeach() + + # Extract include paths from compile command line + string(REGEX MATCHALL "(^| )${_MPI_PREPROCESSOR_FLAG_REGEX}${CMAKE_INCLUDE_FLAG_${LANG}} *([^\" ]+|\"[^\"]+\")" + MPI_ALL_INCLUDE_PATHS "${MPI_COMPILE_CMDLINE}") + + # If extracting failed to work, we'll try using -showme:incdirs. + # Unlike before, we do this without the environment variables set up, but since only MPICH derivates are affected by any of them, and + # -showme:... is only supported by Open MPI and LAM/MPI, this isn't a concern. + if (NOT MPI_ALL_INCLUDE_PATHS) + _MPI_check_compiler(${LANG} "-showme:incdirs" MPI_INCDIRS_CMDLINE MPI_INCDIRS_COMPILER_RETURN) + if(MPI_INCDIRS_COMPILER_RETURN) + separate_arguments(MPI_ALL_INCLUDE_PATHS NATIVE_COMMAND "${MPI_INCDIRS_CMDLINE}") + endif() + endif() + + foreach(_MPI_INCLUDE_PATH IN LISTS MPI_ALL_INCLUDE_PATHS) + string(REGEX REPLACE "^ ?${_MPI_PREPROCESSOR_FLAG_REGEX}${CMAKE_INCLUDE_FLAG_${LANG}} *" "" _MPI_INCLUDE_PATH "${_MPI_INCLUDE_PATH}") + string(REPLACE "\"" "" _MPI_INCLUDE_PATH "${_MPI_INCLUDE_PATH}") + get_filename_component(_MPI_INCLUDE_PATH "${_MPI_INCLUDE_PATH}" REALPATH) + list(APPEND MPI_INCLUDE_DIRS_WORK "${_MPI_INCLUDE_PATH}") + endforeach() + + # The next step are linker flags and library directories. Here, we first take the flags given in raw -L or -LIBPATH: syntax. + string(REGEX MATCHALL "(^| )${CMAKE_LIBRARY_PATH_FLAG} *([^\" ]+|\"[^\"]+\")" MPI_DIRECT_LINK_PATHS "${MPI_LINK_CMDLINE}") + foreach(_MPI_LPATH IN LISTS MPI_DIRECT_LINK_PATHS) + string(REGEX REPLACE "(^| )${CMAKE_LIBRARY_PATH_FLAG} *" "" _MPI_LPATH "${_MPI_LPATH}") + list(APPEND MPI_ALL_LINK_PATHS "${_MPI_LPATH}") + endforeach() + + # If the link commandline hasn't been filtered (e.g. when using MSVC and /link), we need to extract the relevant parts first. + if(NOT _MPI_FILTERED_LINK_INFORMATION) + string(REGEX MATCHALL "(^| )(-Wl,|-Xlinker +)([^\" ]+|\"[^\"]+\")" MPI_LINK_FLAGS "${MPI_LINK_CMDLINE}") + + # In this case, we could also find some indirectly given linker paths, e.g. prefixed by -Xlinker or -Wl, + # Since syntaxes like -Wl,-L -Wl,/my/path/to/lib are also valid, we parse these paths by first removing -Wl, and -Xlinker + # from the list of filtered flags and then parse the remainder of the output. + string(REGEX REPLACE "(-Wl,|-Xlinker +)" "" MPI_LINK_FLAGS_RAW "${MPI_LINK_FLAGS}") + + # Now we can parse the leftover output. Note that spaces can now be handled since the above example would reduce to + # -L /my/path/to/lib and can be extracted correctly. + string(REGEX MATCHALL "^(${CMAKE_LIBRARY_PATH_FLAG},? *|--library-path=)([^\" ]+|\"[^\"]+\")" + MPI_INDIRECT_LINK_PATHS "${MPI_LINK_FLAGS_RAW}") + + foreach(_MPI_LPATH IN LISTS MPI_INDIRECT_LINK_PATHS) + string(REGEX REPLACE "^(${CMAKE_LIBRARY_PATH_FLAG},? *|--library-path=)" "" _MPI_LPATH "${_MPI_LPATH}") + list(APPEND MPI_ALL_LINK_PATHS "${_MPI_LPATH}") + endforeach() + + # We need to remove the flags we extracted from the linker flag list now. 
+ string(REGEX REPLACE "(^| )(-Wl,|-Xlinker +)(${CMAKE_LIBRARY_PATH_FLAG},? *(-Wl,|-Xlinker +)?|--library-path=)([^\" ]+|\"[^\"]+\")" "" + MPI_LINK_CMDLINE_FILTERED "${MPI_LINK_CMDLINE}") + + # Some MPI implementations pass on options they themselves were built with. Since -z,noexecstack is a common + # hardening, we should strip it. In general, the -z options should be undesirable. + string(REGEX REPLACE "(^| )-Wl,-z(,[^ ]+| +-Wl,[^ ]+)" "" MPI_LINK_CMDLINE_FILTERED "${MPI_LINK_CMDLINE_FILTERED}") + string(REGEX REPLACE "(^| )-Xlinker +-z +-Xlinker +[^ ]+" "" MPI_LINK_CMDLINE_FILTERED "${MPI_LINK_CMDLINE_FILTERED}") + + # We only consider options of the form -Wl or -Xlinker: + string(REGEX MATCHALL "(^| )(-Wl,|-Xlinker +)([^\" ]+|\"[^\"]+\")" MPI_ALL_LINK_FLAGS "${MPI_LINK_CMDLINE_FILTERED}") + + # As a next step, we assemble the linker flags extracted in a preliminary flags string + foreach(_MPI_LINK_FLAG IN LISTS MPI_ALL_LINK_FLAGS) + string(STRIP "${_MPI_LINK_FLAG}" _MPI_LINK_FLAG) + if (MPI_LINK_FLAGS_WORK) + string(APPEND MPI_LINK_FLAGS_WORK " ${_MPI_LINK_FLAG}") + else() + set(MPI_LINK_FLAGS_WORK "${_MPI_LINK_FLAG}") + endif() + endforeach() + else() + # In the filtered case, we obtain the link time flags by just stripping the library paths. + string(REGEX REPLACE "(^| )${CMAKE_LIBRARY_PATH_FLAG} *([^\" ]+|\"[^\"]+\")" "" MPI_LINK_CMDLINE_FILTERED "${MPI_LINK_CMDLINE}") + endif() + + # If we failed to extract any linker paths, we'll try using the -showme:libdirs option with the MPI compiler. + # This will return a list of folders, not a set of flags! + if (NOT MPI_ALL_LINK_PATHS) + _MPI_check_compiler(${LANG} "-showme:libdirs" MPI_LIBDIRS_CMDLINE MPI_LIBDIRS_COMPILER_RETURN) + if(MPI_LIBDIRS_COMPILER_RETURN) + separate_arguments(MPI_ALL_LINK_PATHS NATIVE_COMMAND "${MPI_LIBDIRS_CMDLINE}") + endif() + endif() + + # We need to remove potential quotes and convert the paths to CMake syntax while resolving them, too. + foreach(_MPI_LPATH IN LISTS MPI_ALL_LINK_PATHS) + string(REPLACE "\"" "" _MPI_LPATH "${_MPI_LPATH}") + get_filename_component(_MPI_LPATH "${_MPI_LPATH}" REALPATH) + list(APPEND MPI_LINK_DIRECTORIES_WORK "${_MPI_LPATH}") + endforeach() + + # Extract the set of libraries to link against from the link command line + # This only makes sense if CMAKE_LINK_LIBRARY_FLAG is defined, i.e. a -lxxxx syntax is supported by the compiler. + if(CMAKE_LINK_LIBRARY_FLAG) + string(REGEX MATCHALL "(^| )${CMAKE_LINK_LIBRARY_FLAG}([^\" ]+|\"[^\"]+\")" + MPI_LIBNAMES "${MPI_LINK_CMDLINE}") + + foreach(_MPI_LIB_NAME IN LISTS MPI_LIBNAMES) + string(REGEX REPLACE "^ ?${CMAKE_LINK_LIBRARY_FLAG}" "" _MPI_LIB_NAME "${_MPI_LIB_NAME}") + string(REPLACE "\"" "" _MPI_LIB_NAME "${_MPI_LIB_NAME}") + list(APPEND MPI_LIB_NAMES_WORK "${_MPI_LIB_NAME}") + endforeach() + endif() + + # Treat linker objects given by full path, for example static libraries, import libraries + # or shared libraries if there aren't any import libraries in use on the system. + # Note that we do not consider CMAKE__LIBRARY_PREFIX intentionally here: The linker will for a given file + # decide how to link it based on file type, not based on a prefix like 'lib'. 
+ set(_MPI_LIB_SUFFIX_REGEX "${CMAKE_STATIC_LIBRARY_SUFFIX}") + if(DEFINED CMAKE_IMPORT_LIBRARY_SUFFIX) + if(NOT ("${CMAKE_IMPORT_LIBRARY_SUFFIX}" STREQUAL "${CMAKE_STATIC_LIBRARY_SUFFIX}")) + string(APPEND _MPI_SUFFIX_REGEX "|${CMAKE_IMPORT_LIBRARY_SUFFIX}") + endif() + else() + string(APPEND _MPI_LIB_SUFFIX_REGEX "|${CMAKE_SHARED_LIBRARY_SUFFIX}") + endif() + set(_MPI_LIB_NAME_REGEX "(([^\" ]+(${_MPI_LIB_SUFFIX_REGEX}))|(\"[^\"]+(${_MPI_LIB_SUFFIX_REGEX})\"))( +|$)") + string(REPLACE "." "\\." _MPI_LIB_NAME_REGEX "${_MPI_LIB_NAME_REGEX}") + + string(REGEX MATCHALL "${_MPI_LIB_NAME_REGEX}" MPI_LIBNAMES "${MPI_LINK_CMDLINE}") + foreach(_MPI_LIB_NAME IN LISTS MPI_LIBNAMES) + string(REGEX REPLACE "^ +\"?|\"? +$" "" _MPI_LIB_NAME "${_MPI_LIB_NAME}") + get_filename_component(_MPI_LIB_PATH "${_MPI_LIB_NAME}" DIRECTORY) + if(NOT "${_MPI_LIB_PATH}" STREQUAL "") + list(APPEND MPI_LIB_FULLPATHS_WORK "${_MPI_LIB_NAME}") + else() + list(APPEND MPI_LIB_NAMES_WORK "${_MPI_LIB_NAME}") + endif() + endforeach() + + # Save the explicitly given link directories + set(MPI_LINK_DIRECTORIES_LEFTOVER "${MPI_LINK_DIRECTORIES_WORK}") + + # An MPI compiler wrapper could have its MPI libraries in the implictly + # linked directories of the compiler itself. + if(DEFINED CMAKE_${LANG}_IMPLICIT_LINK_DIRECTORIES) + list(APPEND MPI_LINK_DIRECTORIES_WORK "${CMAKE_${LANG}_IMPLICIT_LINK_DIRECTORIES}") + endif() + + # Determine full path names for all of the libraries that one needs + # to link against in an MPI program + unset(MPI_PLAIN_LIB_NAMES_WORK) + foreach(_MPI_LIB_NAME IN LISTS MPI_LIB_NAMES_WORK) + get_filename_component(_MPI_PLAIN_LIB_NAME "${_MPI_LIB_NAME}" NAME_WE) + list(APPEND MPI_PLAIN_LIB_NAMES_WORK "${_MPI_PLAIN_LIB_NAME}") + find_library(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY + NAMES "${_MPI_LIB_NAME}" "lib${_MPI_LIB_NAME}" + HINTS ${MPI_LINK_DIRECTORIES_WORK} + DOC "Location of the ${_MPI_PLAIN_LIB_NAME} library for MPI" + ) + mark_as_advanced(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY) + # Remove the directory from the remainder list. + if(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY) + get_filename_component(_MPI_TAKEN_DIRECTORY "${MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY}" DIRECTORY) + list(REMOVE_ITEM MPI_LINK_DIRECTORIES_LEFTOVER "${_MPI_TAKEN_DIRECTORY}") + endif() + endforeach() + + # Add the link directories given explicitly that we haven't used back as linker directories. 
+ if(NOT WIN32) + foreach(_MPI_LINK_DIRECTORY IN LISTS MPI_LINK_DIRECTORIES_LEFTOVER) + file(TO_NATIVE_PATH "${_MPI_LINK_DIRECTORY}" _MPI_LINK_DIRECTORY_ACTUAL) + string(FIND "${_MPI_LINK_DIRECTORY_ACTUAL}" " " _MPI_LINK_DIRECTORY_CONTAINS_SPACE) + if(NOT _MPI_LINK_DIRECTORY_CONTAINS_SPACE EQUAL -1) + set(_MPI_LINK_DIRECTORY_ACTUAL "\"${_MPI_LINK_DIRECTORY_ACTUAL}\"") + endif() + if(MPI_LINK_FLAGS_WORK) + string(APPEND MPI_LINK_FLAGS_WORK " ${CMAKE_LIBRARY_PATH_FLAG}${_MPI_LINK_DIRECTORY_ACTUAL}") + else() + set(MPI_LINK_FLAGS_WORK "${CMAKE_LIBRARY_PATH_FLAG}${_MPI_LINK_DIRECTORY_ACTUAL}") + endif() + endforeach() + endif() + + # Deal with the libraries given with full path next + unset(MPI_DIRECT_LIB_NAMES_WORK) + foreach(_MPI_LIB_FULLPATH IN LISTS MPI_LIB_FULLPATHS_WORK) + get_filename_component(_MPI_PLAIN_LIB_NAME "${_MPI_LIB_FULLPATH}" NAME_WE) + list(APPEND MPI_DIRECT_LIB_NAMES_WORK "${_MPI_PLAIN_LIB_NAME}") + set(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY "${_MPI_LIB_FULLPATH}" CACHE FILEPATH "Location of the ${_MPI_PLAIN_LIB_NAME} library for MPI") + mark_as_advanced(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY) + endforeach() + # Directly linked objects should be linked first in case some generic linker flags are needed for them. + if(MPI_DIRECT_LIB_NAMES_WORK) + set(MPI_PLAIN_LIB_NAMES_WORK "${MPI_DIRECT_LIB_NAMES_WORK};${MPI_PLAIN_LIB_NAMES_WORK}") + endif() + + # MPI might require pthread to work. The above mechanism wouldn't detect it, but we need to + # link it in that case. -lpthread is covered by the normal library treatment on the other hand. + if("${MPI_COMPILE_CMDLINE}" MATCHES "-pthread") + list(APPEND MPI_COMPILE_OPTIONS_WORK "-pthread") + if(MPI_LINK_FLAGS_WORK) + string(APPEND MPI_LINK_FLAGS_WORK " -pthread") + else() + set(MPI_LINK_FLAGS_WORK "-pthread") + endif() + endif() + + if(MPI_${LANG}_EXTRA_COMPILE_DEFINITIONS) + list(APPEND MPI_COMPILE_DEFINITIONS_WORK "${MPI_${LANG}_EXTRA_COMPILE_DEFINITIONS}") + endif() + if(MPI_${LANG}_EXTRA_COMPILE_OPTIONS) + list(APPEND MPI_COMPILE_OPTIONS_WORK "${MPI_${LANG}_EXTRA_COMPILE_OPTIONS}") + endif() + if(MPI_${LANG}_EXTRA_LIB_NAMES) + list(APPEND MPI_PLAIN_LIB_NAMES_WORK "${MPI_${LANG}_EXTRA_LIB_NAMES}") + endif() + + # If we found MPI, set up all of the appropriate cache entries + if(NOT MPI_${LANG}_COMPILE_OPTIONS) + set(MPI_${LANG}_COMPILE_OPTIONS ${MPI_COMPILE_OPTIONS_WORK} CACHE STRING "MPI ${LANG} compilation options" FORCE) + endif() + if(NOT MPI_${LANG}_COMPILE_DEFINITIONS) + set(MPI_${LANG}_COMPILE_DEFINITIONS ${MPI_COMPILE_DEFINITIONS_WORK} CACHE STRING "MPI ${LANG} compilation definitions" FORCE) + endif() + if(NOT MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS ${MPI_INCLUDE_DIRS_WORK} CACHE STRING "MPI ${LANG} additional include directories" FORCE) + endif() + if(NOT MPI_${LANG}_LINK_FLAGS) + set(MPI_${LANG}_LINK_FLAGS ${MPI_LINK_FLAGS_WORK} CACHE STRING "MPI ${LANG} linker flags" FORCE) + endif() + if(NOT MPI_${LANG}_LIB_NAMES) + set(MPI_${LANG}_LIB_NAMES ${MPI_PLAIN_LIB_NAMES_WORK} CACHE STRING "MPI ${LANG} libraries to link against" FORCE) + endif() + set(MPI_${LANG}_WRAPPER_FOUND TRUE PARENT_SCOPE) +endfunction() + +function(_MPI_guess_settings LANG) + set(MPI_GUESS_FOUND FALSE) + # Currently only MSMPI and MPICH2 on Windows are supported, so we can skip this search if we're not targeting that. + if(WIN32) + # MSMPI + + # The environment variables MSMPI_INC and MSMPILIB32/64 are the only ways of locating the MSMPI_SDK, + # which is installed separately from the runtime. 
Thus it's possible to have mpiexec but not MPI headers + # or import libraries and vice versa. + if(NOT MPI_GUESS_LIBRARY_NAME OR "${MPI_GUESS_LIBRARY_NAME}" STREQUAL "MSMPI") + # We first attempt to locate the msmpi.lib. Should be find it, we'll assume that the MPI present is indeed + # Microsoft MPI. + if("${CMAKE_SIZEOF_VOID_P}" EQUAL 8) + set(MPI_MSMPI_LIB_PATH "$ENV{MSMPI_LIB64}") + set(MPI_MSMPI_INC_PATH_EXTRA "$ENV{MSMPI_INC}/x64") + else() + set(MPI_MSMPI_LIB_PATH "$ENV{MSMPI_LIB32}") + set(MPI_MSMPI_INC_PATH_EXTRA "$ENV{MSMPI_INC}/x86") + endif() + + find_library(MPI_msmpi_LIBRARY + NAMES msmpi + HINTS ${MPI_MSMPI_LIB_PATH} + DOC "Location of the msmpi library for Microsoft MPI") + mark_as_advanced(MPI_msmpi_LIBRARY) + + if(MPI_msmpi_LIBRARY) + # Next, we attempt to locate the MPI header. Note that for Fortran we know that mpif.h is a way + # MSMPI can be used and therefore that header has to be present. + if(NOT MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + get_filename_component(MPI_MSMPI_INC_DIR "$ENV{MSMPI_INC}" REALPATH) + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS "${MPI_MSMPI_INC_DIR}" CACHE STRING "MPI ${LANG} additional include directories" FORCE) + unset(MPI_MSMPI_INC_DIR) + endif() + + # For MSMPI, one can compile the MPI module by building the mpi.f90 shipped with the MSMPI SDK, + # thus it might be present or provided by the user. Figuring out which is supported is done later on. + # The PGI Fortran compiler for instance ships a prebuilt set of modules in its own include folder. + # Should a user be employing PGI or have built its own set and provided it via cache variables, the + # splitting routine would have located the module files. + + # For C and C++, we're done here (MSMPI does not ship the MPI-2 C++ bindings) - however, for Fortran + # we need some extra library to glue Fortran support together: + # MSMPI ships 2-4 Fortran libraries, each for different Fortran compiler behaviors. The library names + # ending with a c are using the cdecl calling convention, whereas those ending with an s are for Fortran + # implementations using stdcall. Therefore, the 64-bit MSMPI only ships those ending in 'c', whereas the 32-bit + # has both variants available. + # The second difference is the last but one letter, if it's an e(nd), the length of a string argument is + # passed by the Fortran compiler after all other arguments on the parameter list, if it's an m(ixed), + # it's passed immediately after the string address. + + # To summarize: + # - msmpifec: CHARACTER length passed after the parameter list and using cdecl calling convention + # - msmpifmc: CHARACTER length passed directly after string address and using cdecl calling convention + # - msmpifes: CHARACTER length passed after the parameter list and using stdcall calling convention + # - msmpifms: CHARACTER length passed directly after string address and using stdcall calling convention + # 32-bit MSMPI ships all four libraries, 64-bit MSMPI ships only the first two. + + # As is, Intel Fortran and PGI Fortran both use the 'ec' variant of the calling convention, whereas + # the old Compaq Visual Fortran compiler defaulted to the 'ms' version. It's possible to make Intel Fortran + # use the CVF calling convention using /iface:cvf, but we assume - and this is also assumed in FortranCInterface - + # this isn't the case. It's also possible to make CVF use the 'ec' variant, using /iface=(cref,nomixed_str_len_arg). + + # Our strategy is now to locate all libraries, but enter msmpifec into the LIB_NAMES array. 
+ # Should this not be adequate it's a straightforward way for a user to change the LIB_NAMES array and + # have his library found. Still, this should not be necessary outside of exceptional cases, as reasoned. + if ("${LANG}" STREQUAL "Fortran") + set(MPI_MSMPI_CALLINGCONVS c) + if("${CMAKE_SIZEOF_VOID_P}" EQUAL 4) + list(APPEND MPI_MSMPI_CALLINGCONVS s) + endif() + foreach(mpistrlenpos IN ITEMS e m) + foreach(mpicallingconv IN LISTS MPI_MSMPI_CALLINGCONVS) + find_library(MPI_msmpif${mpistrlenpos}${mpicallingconv}_LIBRARY + NAMES msmpif${mpistrlenpos}${mpicallingconv} + HINTS "${MPI_MSMPI_LIB_PATH}" + DOC "Location of the msmpi${mpistrlenpos}${mpicallingconv} library for Microsoft MPI") + mark_as_advanced(MPI_msmpif${mpistrlenpos}${mpicallingconv}_LIBRARY) + endforeach() + endforeach() + if(NOT MPI_${LANG}_LIB_NAMES) + set(MPI_${LANG}_LIB_NAMES "msmpi;msmpifec" CACHE STRING "MPI ${LANG} libraries to link against" FORCE) + endif() + + # At this point we're *not* done. MSMPI requires an additional include file for Fortran giving the value + # of MPI_AINT. This file is called mpifptr.h located in the x64 and x86 subfolders, respectively. + find_path(MPI_mpifptr_INCLUDE_DIR + NAMES "mpifptr.h" + HINTS "${MPI_MSMPI_INC_PATH_EXTRA}" + DOC "Location of the mpifptr.h extra header for Microsoft MPI") + if(NOT MPI_${LANG}_ADDITIONAL_INCLUDE_VARS) + set(MPI_${LANG}_ADDITIONAL_INCLUDE_VARS "mpifptr" CACHE STRING "MPI ${LANG} additional include directory variables, given in the form MPI__INCLUDE_DIR." FORCE) + endif() + mark_as_advanced(MPI_${LANG}_ADDITIONAL_INCLUDE_VARS MPI_mpifptr_INCLUDE_DIR) + else() + if(NOT MPI_${LANG}_LIB_NAMES) + set(MPI_${LANG}_LIB_NAMES "msmpi" CACHE STRING "MPI ${LANG} libraries to link against" FORCE) + endif() + endif() + mark_as_advanced(MPI_${LANG}_LIB_NAMES) + set(MPI_GUESS_FOUND TRUE) + + if(_MPIEXEC_NOT_GIVEN) + unset(MPIEXEC_EXECUTABLE CACHE) + endif() + + find_program(MPIEXEC_EXECUTABLE + NAMES mpiexec + HINTS $ENV{MSMPI_BIN} "[HKEY_LOCAL_MACHINE\\SOFTWARE\\Microsoft\\MPI;InstallRoot]/Bin" + DOC "Executable for running MPI programs.") + endif() + endif() + + # At this point there's not many MPIs that we could still consider. + # OpenMPI 1.6.x and below supported Windows, but these ship compiler wrappers that still work. + # The only other relevant MPI implementation without a wrapper is MPICH2, which had Windows support in 1.4.1p1 and older. + if(NOT MPI_GUESS_FOUND AND (NOT MPI_GUESS_LIBRARY_NAME OR "${MPI_GUESS_LIBRARY_NAME}" STREQUAL "MPICH2")) + set(MPI_MPICH_PREFIX_PATHS + "$ENV{ProgramW6432}/MPICH2/lib" + "[HKEY_LOCAL_MACHINE\\SOFTWARE\\MPICH\\SMPD;binary]/../lib" + "[HKEY_LOCAL_MACHINE\\SOFTWARE\\MPICH2;Path]/lib" + ) + + # All of C, C++ and Fortran will need mpi.lib, so we'll look for this first + find_library(MPI_mpi_LIBRARY + NAMES mpi + HINTS ${MPI_MPICH_PREFIX_PATHS}) + mark_as_advanced(MPI_mpi_LIBRARY) + # If we found mpi.lib, we detect the rest of MPICH2 + if(MPI_mpi_LIBRARY) + set(MPI_MPICH_LIB_NAMES "mpi") + # If MPI-2 C++ bindings are requested, we need to locate cxx.lib as well. + # Otherwise, MPICH_SKIP_MPICXX will be defined and these bindings aren't needed. 
+ if("${LANG}" STREQUAL "CXX" AND NOT MPI_CXX_SKIP_MPICXX) + find_library(MPI_cxx_LIBRARY + NAMES cxx + HINTS ${MPI_MPICH_PREFIX_PATHS}) + mark_as_advanced(MPI_cxx_LIBRARY) + list(APPEND MPI_MPICH_LIB_NAMES "cxx") + # For Fortran, MPICH2 provides three different libraries: + # fmpich2.lib which uses uppercase symbols and cdecl, + # fmpich2s.lib which uses uppercase symbols and stdcall (32-bit only), + # fmpich2g.lib which uses lowercase symbols with double underscores and cdecl. + # fmpich2s.lib would be useful for Compaq Visual Fortran, fmpich2g.lib has to be used with GNU g77 and is also + # provided in the form of an .a archive for MinGW and Cygwin. From our perspective, fmpich2.lib is the only one + # we need to try, and if it doesn't work with the given Fortran compiler we'd find out later on during validation + elseif("${LANG}" STREQUAL "Fortran") + find_library(MPI_fmpich2_LIBRARY + NAMES fmpich2 + HINTS ${MPI_MPICH_PREFIX_PATHS}) + find_library(MPI_fmpich2s_LIBRARY + NAMES fmpich2s + HINTS ${MPI_MPICH_PREFIX_PATHS}) + find_library(MPI_fmpich2g_LIBRARY + NAMES fmpich2g + HINTS ${MPI_MPICH_PREFIX_PATHS}) + mark_as_advanced(MPI_fmpich2_LIBRARY MPI_fmpich2s_LIBRARY MPI_fmpich2g_LIBRARY) + list(APPEND MPI_MPICH_LIB_NAMES "fmpich2") + endif() + + if(NOT MPI_${LANG}_LIB_NAMES) + set(MPI_${LANG}_LIB_NAMES "${MPI_MPICH_LIB_NAMES}" CACHE STRING "MPI ${LANG} libraries to link against" FORCE) + endif() + unset(MPI_MPICH_LIB_NAMES) + + if(NOT MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + # For MPICH2, the include folder would be in ../include relative to the library folder. + get_filename_component(MPI_MPICH_ROOT_DIR "${MPI_mpi_LIBRARY}" DIRECTORY) + get_filename_component(MPI_MPICH_ROOT_DIR "${MPI_MPICH_ROOT_DIR}" DIRECTORY) + if(IS_DIRECTORY "${MPI_MPICH_ROOT_DIR}/include") + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS "${MPI_MPICH_ROOT_DIR}/include" CACHE STRING "MPI ${LANG} additional include directory variables, given in the form MPI__INCLUDE_DIR." FORCE) + endif() + unset(MPI_MPICH_ROOT_DIR) + endif() + set(MPI_GUESS_FOUND TRUE) + + if(_MPIEXEC_NOT_GIVEN) + unset(MPIEXEC_EXECUTABLE CACHE) + endif() + + find_program(MPIEXEC_EXECUTABLE + NAMES ${_MPIEXEC_NAMES} + HINTS "$ENV{ProgramW6432}/MPICH2/bin" + "[HKEY_LOCAL_MACHINE\\SOFTWARE\\MPICH\\SMPD;binary]" + "[HKEY_LOCAL_MACHINE\\SOFTWARE\\MPICH2;Path]/bin" + DOC "Executable for running MPI programs.") + endif() + unset(MPI_MPICH_PREFIX_PATHS) + endif() + endif() + set(MPI_${LANG}_GUESS_FOUND "${MPI_GUESS_FOUND}" PARENT_SCOPE) +endfunction() + +function(_MPI_adjust_compile_definitions LANG) + if("${LANG}" STREQUAL "CXX") + # To disable the C++ bindings, we need to pass some definitions since the mpi.h header has to deal with both C and C++ + # bindings in MPI-2. + if(MPI_CXX_SKIP_MPICXX AND NOT MPI_${LANG}_COMPILE_DEFINITIONS MATCHES "SKIP_MPICXX") + # MPICH_SKIP_MPICXX is being used in MPICH and derivatives like MVAPICH or Intel MPI + # OMPI_SKIP_MPICXX is being used in Open MPI + # _MPICC_H is being used for IBM Platform MPI + list(APPEND MPI_${LANG}_COMPILE_DEFINITIONS "MPICH_SKIP_MPICXX" "OMPI_SKIP_MPICXX" "_MPICC_H") + set(MPI_${LANG}_COMPILE_DEFINITIONS "${MPI_${LANG}_COMPILE_DEFINITIONS}" CACHE STRING "MPI ${LANG} compilation definitions" FORCE) + endif() + endif() +endfunction() + +macro(_MPI_assemble_libraries LANG) + set(MPI_${LANG}_LIBRARIES "") + # Only for libraries do we need to check whether the compiler's linking stage is separate. 
+ if(NOT "${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}" OR NOT MPI_${LANG}_WORKS_IMPLICIT) + foreach(mpilib IN LISTS MPI_${LANG}_LIB_NAMES) + list(APPEND MPI_${LANG}_LIBRARIES ${MPI_${mpilib}_LIBRARY}) + endforeach() + endif() +endmacro() + +macro(_MPI_assemble_include_dirs LANG) + if("${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}") + set(MPI_${LANG}_INCLUDE_DIRS "") + else() + set(MPI_${LANG}_INCLUDE_DIRS "${MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS}") + if("${LANG}" MATCHES "(C|CXX)") + if(MPI_${LANG}_HEADER_DIR) + list(APPEND MPI_${LANG}_INCLUDE_DIRS "${MPI_${LANG}_HEADER_DIR}") + endif() + else() # Fortran + if(MPI_${LANG}_F77_HEADER_DIR) + list(APPEND MPI_${LANG}_INCLUDE_DIRS "${MPI_${LANG}_F77_HEADER_DIR}") + endif() + if(MPI_${LANG}_MODULE_DIR AND NOT "${MPI_${LANG}_MODULE_DIR}" IN_LIST MPI_${LANG}_INCLUDE_DIRS) + list(APPEND MPI_${LANG}_INCLUDE_DIRS "${MPI_${LANG}_MODULE_DIR}") + endif() + endif() + if(MPI_${LANG}_ADDITIONAL_INCLUDE_VARS) + foreach(MPI_ADDITIONAL_INC_DIR IN LISTS MPI_${LANG}_ADDITIONAL_INCLUDE_VARS) + list(APPEND MPI_${LANG}_INCLUDE_DIRS "${MPI_${MPI_ADDITIONAL_INC_DIR}_INCLUDE_DIR}") + endforeach() + endif() + endif() +endmacro() + +function(_MPI_split_include_dirs LANG) + if("${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}") + return() + endif() + # Backwards compatibility: Search INCLUDE_PATH if given. + if(MPI_${LANG}_INCLUDE_PATH) + list(APPEND MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS "${MPI_${LANG}_INCLUDE_PATH}") + endif() + + # We try to find the headers/modules among those paths (and system paths) + # For C/C++, we just need to have a look for mpi.h. + if("${LANG}" MATCHES "(C|CXX)") + find_path(MPI_${LANG}_HEADER_DIR "mpi.h" + HINTS ${MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS} + ) + mark_as_advanced(MPI_${LANG}_HEADER_DIR) + if(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + list(REMOVE_ITEM MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS "${MPI_${LANG}_HEADER_DIR}") + endif() + # Fortran is more complicated here: An implementation could provide + # any of the Fortran 77/90/2008 APIs for MPI. For example, MSMPI + # only provides Fortran 77 and - if mpi.f90 is built - potentially + # a Fortran 90 module. + elseif("${LANG}" STREQUAL "Fortran") + find_path(MPI_${LANG}_F77_HEADER_DIR "mpif.h" + HINTS ${MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS} + ) + find_path(MPI_${LANG}_MODULE_DIR + NAMES "mpi.mod" "mpi_f08.mod" + HINTS ${MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS} + ) + if(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + list(REMOVE_ITEM MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS + "${MPI_${LANG}_F77_HEADER_DIR}" + "${MPI_${LANG}_MODULE_DIR}" + ) + endif() + mark_as_advanced(MPI_${LANG}_F77_HEADER_DIR MPI_${LANG}_MODULE_DIR) + endif() + # Remove duplicates and default system directories from the list. + if(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + list(REMOVE_DUPLICATES MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS) + foreach(MPI_IMPLICIT_INC_DIR IN LISTS CMAKE_${LANG}_IMPLICIT_LINK_DIRECTORIES) + list(REMOVE_ITEM MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS ${MPI_IMPLICIT_INC_DIR}) + endforeach() + endif() + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS ${MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS} CACHE STRING "MPI ${LANG} additional include directories" FORCE) +endfunction() + +macro(_MPI_create_imported_target LANG) + if(NOT TARGET MPI::MPI_${LANG}) + add_library(MPI::MPI_${LANG} INTERFACE IMPORTED) + endif() + + # When this is consumed for compiling CUDA, use '-Xcompiler' to wrap '-pthread'. 
+ string(REPLACE "-pthread" "$<$:SHELL:-Xcompiler >-pthread" + _MPI_${LANG}_COMPILE_OPTIONS "${MPI_${LANG}_COMPILE_OPTIONS}") + set_property(TARGET MPI::MPI_${LANG} PROPERTY INTERFACE_COMPILE_OPTIONS "${_MPI_${LANG}_COMPILE_OPTIONS}") + unset(_MPI_${LANG}_COMPILE_OPTIONS) + + set_property(TARGET MPI::MPI_${LANG} PROPERTY INTERFACE_COMPILE_DEFINITIONS "${MPI_${LANG}_COMPILE_DEFINITIONS}") + + set_property(TARGET MPI::MPI_${LANG} PROPERTY INTERFACE_LINK_LIBRARIES "") + if(MPI_${LANG}_LINK_FLAGS) + separate_arguments(_MPI_${LANG}_LINK_FLAGS NATIVE_COMMAND "${MPI_${LANG}_LINK_FLAGS}") + set_property(TARGET MPI::MPI_${LANG} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${_MPI_${LANG}_LINK_FLAGS}") + unset(_MPI_${LANG}_LINK_FLAGS) + endif() + # If the compiler links MPI implicitly, no libraries will be found as they're contained within + # CMAKE__IMPLICIT_LINK_LIBRARIES already. + if(MPI_${LANG}_LIBRARIES) + set_property(TARGET MPI::MPI_${LANG} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${MPI_${LANG}_LIBRARIES}") + endif() + # Given the new design of FindMPI, INCLUDE_DIRS will always be located, even under implicit linking. + set_property(TARGET MPI::MPI_${LANG} PROPERTY INTERFACE_INCLUDE_DIRECTORIES "${MPI_${LANG}_INCLUDE_DIRS}") +endmacro() + +function(_MPI_try_staged_settings LANG MPI_TEST_FILE_NAME MODE RUN_BINARY SUPPRESS_ERRORS) + set(WORK_DIR "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/FindMPI") + set(SRC_DIR "${CMAKE_ROOT}/Modules/FindMPI") + set(BIN_FILE "${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/FindMPI/${MPI_TEST_FILE_NAME}_${LANG}.bin") + unset(MPI_TEST_COMPILE_DEFINITIONS) + if("${LANG}" STREQUAL "Fortran") + if("${MODE}" STREQUAL "F90_MODULE") + set(MPI_Fortran_INCLUDE_LINE "use mpi\n implicit none") + elseif("${MODE}" STREQUAL "F08_MODULE") + set(MPI_Fortran_INCLUDE_LINE "use mpi_f08\n implicit none") + else() # F77 header + set(MPI_Fortran_INCLUDE_LINE "implicit none\n include 'mpif.h'") + endif() + configure_file("${SRC_DIR}/${MPI_TEST_FILE_NAME}.f90.in" "${WORK_DIR}/${MPI_TEST_FILE_NAME}.f90" @ONLY) + set(MPI_TEST_SOURCE_FILE "${WORK_DIR}/${MPI_TEST_FILE_NAME}.f90") + elseif("${LANG}" STREQUAL "CXX") + configure_file("${SRC_DIR}/${MPI_TEST_FILE_NAME}.c" "${WORK_DIR}/${MPI_TEST_FILE_NAME}.cpp" COPYONLY) + set(MPI_TEST_SOURCE_FILE "${WORK_DIR}/${MPI_TEST_FILE_NAME}.cpp") + if("${MODE}" STREQUAL "TEST_MPICXX") + set(MPI_TEST_COMPILE_DEFINITIONS TEST_MPI_MPICXX) + endif() + else() # C + set(MPI_TEST_SOURCE_FILE "${SRC_DIR}/${MPI_TEST_FILE_NAME}.c") + endif() + if(RUN_BINARY) + try_run(MPI_RUN_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} MPI_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} + "${CMAKE_BINARY_DIR}" SOURCES "${MPI_TEST_SOURCE_FILE}" + COMPILE_DEFINITIONS ${MPI_TEST_COMPILE_DEFINITIONS} + LINK_LIBRARIES MPI::MPI_${LANG} + RUN_OUTPUT_VARIABLE MPI_RUN_OUTPUT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} + COMPILE_OUTPUT_VARIABLE _MPI_TRY_${MPI_TEST_FILE_NAME}_${MODE}_OUTPUT) + set(MPI_RUN_OUTPUT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} "${MPI_RUN_OUTPUT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE}}" PARENT_SCOPE) + else() + try_compile(MPI_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} + "${CMAKE_BINARY_DIR}" SOURCES "${MPI_TEST_SOURCE_FILE}" + COMPILE_DEFINITIONS ${MPI_TEST_COMPILE_DEFINITIONS} + LINK_LIBRARIES MPI::MPI_${LANG} + COPY_FILE "${BIN_FILE}" + OUTPUT_VARIABLE _MPI_TRY_${MPI_TEST_FILE_NAME}_${MODE}_OUTPUT) + endif() + if(NOT SUPPRESS_ERRORS) + if(NOT MPI_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE}) + file(APPEND ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeError.log + "The MPI 
test ${MPI_TEST_FILE_NAME} for ${LANG} in mode ${MODE} failed to compile with the following output:\n${_MPI_TRY_${MPI_TEST_FILE_NAME}_${MODE}_OUTPUT}\n\n") + elseif(DEFINED MPI_RUN_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE} AND MPI_RUN_RESULT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE}) + file(APPEND ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/CMakeError.log + "The MPI test ${MPI_TEST_FILE_NAME} for ${LANG} in mode ${MODE} failed to run with the following output:\n${MPI_RUN_OUTPUT_${LANG}_${MPI_TEST_FILE_NAME}_${MODE}}\n\n") + endif() + endif() +endfunction() + +macro(_MPI_check_lang_works LANG SUPPRESS_ERRORS) + # For Fortran we may have by the MPI-3 standard an implementation that provides: + # - the mpi_f08 module + # - *both*, the mpi module and 'mpif.h' + # Since older MPI standards (MPI-1) did not define anything but 'mpif.h', we need to check all three individually. + if( NOT MPI_${LANG}_WORKS ) + if("${LANG}" STREQUAL "Fortran") + set(MPI_Fortran_INTEGER_LINE "(kind=MPI_INTEGER_KIND)") + _MPI_try_staged_settings(${LANG} test_mpi F77_HEADER FALSE ${SUPPRESS_ERRORS}) + _MPI_try_staged_settings(${LANG} test_mpi F90_MODULE FALSE ${SUPPRESS_ERRORS}) + _MPI_try_staged_settings(${LANG} test_mpi F08_MODULE FALSE ${SUPPRESS_ERRORS}) + + set(MPI_${LANG}_WORKS FALSE) + + foreach(mpimethod IN ITEMS F77_HEADER F08_MODULE F90_MODULE) + if(MPI_RESULT_${LANG}_test_mpi_${mpimethod}) + set(MPI_${LANG}_WORKS TRUE) + set(MPI_${LANG}_HAVE_${mpimethod} TRUE) + else() + set(MPI_${LANG}_HAVE_${mpimethod} FALSE) + endif() + endforeach() + # MPI-1 versions had no MPI_INTGER_KIND defined, so we need to try without it. + # However, MPI-1 also did not define the Fortran 90 and 08 modules, so we only try the F77 header. + unset(MPI_Fortran_INTEGER_LINE) + if(NOT MPI_${LANG}_WORKS) + _MPI_try_staged_settings(${LANG} test_mpi F77_HEADER_NOKIND FALSE ${SUPPRESS_ERRORS}) + if(MPI_RESULT_${LANG}_test_mpi_F77_HEADER_NOKIND) + set(MPI_${LANG}_WORKS TRUE) + set(MPI_${LANG}_HAVE_F77_HEADER TRUE) + endif() + endif() + else() + _MPI_try_staged_settings(${LANG} test_mpi normal FALSE ${SUPPRESS_ERRORS}) + # If 'test_mpi' built correctly, we've found valid MPI settings. There might not be MPI-2 C++ support, but there can't + # be MPI-2 C++ support without the C bindings being present, so checking for them is sufficient. + set(MPI_${LANG}_WORKS "${MPI_RESULT_${LANG}_test_mpi_normal}") + endif() + endif() +endmacro() + +# Some systems install various MPI implementations in separate folders in some MPI prefix +# This macro enumerates all such subfolders and adds them to the list of hints that will be searched. 
+macro(MPI_search_mpi_prefix_folder PREFIX_FOLDER) + if(EXISTS "${PREFIX_FOLDER}") + file(GLOB _MPI_folder_children RELATIVE "${PREFIX_FOLDER}" "${PREFIX_FOLDER}/*") + foreach(_MPI_folder_child IN LISTS _MPI_folder_children) + if(IS_DIRECTORY "${PREFIX_FOLDER}/${_MPI_folder_child}") + list(APPEND MPI_HINT_DIRS "${PREFIX_FOLDER}/${_MPI_folder_child}") + endif() + endforeach() + endif() +endmacro() + +set(MPI_HINT_DIRS ${MPI_HOME} $ENV{MPI_HOME} $ENV{I_MPI_ROOT}) +if("${CMAKE_HOST_SYSTEM_NAME}" STREQUAL "Linux") + # SUSE Linux Enterprise Server stores its MPI implementations under /usr/lib64/mpi/gcc/ + # We enumerate the subfolders and append each as a prefix + MPI_search_mpi_prefix_folder("/usr/lib64/mpi/gcc") +elseif("${CMAKE_HOST_SYSTEM_NAME}" STREQUAL "FreeBSD") + # FreeBSD ships mpich under the normal system paths - but available openmpi implementations + # will be found in /usr/local/mpi/ + MPI_search_mpi_prefix_folder("/usr/local/mpi") +endif() + +# Most MPI distributions have some form of mpiexec or mpirun which gives us something we can look for. +# The MPI standard does not mandate the existence of either, but instead only makes requirements if a distribution +# ships an mpiexec program (mpirun executables are not regulated by the standard). + +# We defer searching for mpiexec binaries belonging to guesses until later. By doing so, mismatches between mpiexec +# and the MPI we found should be reduced. +if(NOT MPIEXEC_EXECUTABLE) + set(_MPIEXEC_NOT_GIVEN TRUE) +else() + set(_MPIEXEC_NOT_GIVEN FALSE) +endif() + +find_program(MPIEXEC_EXECUTABLE + NAMES ${_MPIEXEC_NAMES} + PATH_SUFFIXES bin sbin + HINTS ${MPI_HINT_DIRS} + DOC "Executable for running MPI programs.") + +# call get_filename_component twice to remove mpiexec and the directory it exists in (typically bin). +# This gives us a fairly reliable base directory to search for /bin /lib and /include from. +get_filename_component(_MPI_BASE_DIR "${MPIEXEC_EXECUTABLE}" PATH) +get_filename_component(_MPI_BASE_DIR "${_MPI_BASE_DIR}" PATH) + +# According to the MPI standard, section 8.8 -n is a guaranteed, and the only guaranteed way to +# launch an MPI process using mpiexec if such a program exists. +set(MPIEXEC_NUMPROC_FLAG "-n" CACHE STRING "Flag used by MPI to specify the number of processes for mpiexec; the next option will be the number of processes.") +set(MPIEXEC_PREFLAGS "" CACHE STRING "These flags will be directly before the executable that is being run by mpiexec.") +set(MPIEXEC_POSTFLAGS "" CACHE STRING "These flags will be placed after all flags passed to mpiexec.") + +# Set the number of processes to the physical processor count +cmake_host_system_information(RESULT _MPIEXEC_NUMPROCS QUERY NUMBER_OF_PHYSICAL_CORES) +set(MPIEXEC_MAX_NUMPROCS "${_MPIEXEC_NUMPROCS}" CACHE STRING "Maximum number of processors available to run MPI applications.") +unset(_MPIEXEC_NUMPROCS) +mark_as_advanced(MPIEXEC_EXECUTABLE MPIEXEC_NUMPROC_FLAG MPIEXEC_PREFLAGS MPIEXEC_POSTFLAGS MPIEXEC_MAX_NUMPROCS) + +#============================================================================= +# Backward compatibility input hacks. Propagate the FindMPI hints to C and +# CXX if the respective new versions are not defined. Translate the old +# MPI_LIBRARY and MPI_EXTRA_LIBRARY to respective MPI_${LANG}_LIBRARIES. +# +# Once we find the new variables, we translate them back into their old +# equivalents below. +if(NOT MPI_IGNORE_LEGACY_VARIABLES) + foreach (LANG IN ITEMS C CXX) + # Old input variables. 
+ set(_MPI_OLD_INPUT_VARS COMPILER COMPILE_FLAGS INCLUDE_PATH LINK_FLAGS) + + # Set new vars based on their old equivalents, if the new versions are not already set. + foreach (var ${_MPI_OLD_INPUT_VARS}) + if (NOT MPI_${LANG}_${var} AND MPI_${var}) + set(MPI_${LANG}_${var} "${MPI_${var}}") + endif() + endforeach() + + # Chop the old compile flags into options and definitions + + unset(MPI_${LANG}_EXTRA_COMPILE_DEFINITIONS) + unset(MPI_${LANG}_EXTRA_COMPILE_OPTIONS) + if(MPI_${LANG}_COMPILE_FLAGS) + separate_arguments(MPI_SEPARATE_FLAGS NATIVE_COMMAND "${MPI_${LANG}_COMPILE_FLAGS}") + foreach(_MPI_FLAG IN LISTS MPI_SEPARATE_FLAGS) + if("${_MPI_FLAG}" MATCHES "^ *-D([^ ]+)") + list(APPEND MPI_${LANG}_EXTRA_COMPILE_DEFINITIONS "${CMAKE_MATCH_1}") + else() + list(APPEND MPI_${LANG}_EXTRA_COMPILE_OPTIONS "${_MPI_FLAG}") + endif() + endforeach() + unset(MPI_SEPARATE_FLAGS) + endif() + + # If a list of libraries was given, we'll split it into new-style cache variables + unset(MPI_${LANG}_EXTRA_LIB_NAMES) + if(NOT MPI_${LANG}_LIB_NAMES) + foreach(_MPI_LIB IN LISTS MPI_${LANG}_LIBRARIES MPI_LIBRARY MPI_EXTRA_LIBRARY) + if(_MPI_LIB) + get_filename_component(_MPI_PLAIN_LIB_NAME "${_MPI_LIB}" NAME_WE) + get_filename_component(_MPI_LIB_NAME "${_MPI_LIB}" NAME) + get_filename_component(_MPI_LIB_DIR "${_MPI_LIB}" DIRECTORY) + list(APPEND MPI_${LANG}_EXTRA_LIB_NAMES "${_MPI_PLAIN_LIB_NAME}") + find_library(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY + NAMES "${_MPI_LIB_NAME}" "lib${_MPI_LIB_NAME}" + HINTS ${_MPI_LIB_DIR} $ENV{MPI_LIB} + DOC "Location of the ${_MPI_PLAIN_LIB_NAME} library for MPI" + ) + mark_as_advanced(MPI_${_MPI_PLAIN_LIB_NAME}_LIBRARY) + endif() + endforeach() + endif() + endforeach() +endif() +#============================================================================= + +unset(MPI_VERSION) +unset(MPI_VERSION_MAJOR) +unset(MPI_VERSION_MINOR) + +unset(_MPI_MIN_VERSION) + +# If the user specified a library name we assume they prefer that library over a wrapper. If not, they can disable skipping manually. +if(NOT DEFINED MPI_SKIP_COMPILER_WRAPPER AND MPI_GUESS_LIBRARY_NAME) + set(MPI_SKIP_COMPILER_WRAPPER TRUE) +endif() + +# This loop finds the compilers and sends them off for interrogation. +foreach(LANG IN ITEMS C CXX Fortran) + if(CMAKE_${LANG}_COMPILER_LOADED) + if(NOT MPI_FIND_COMPONENTS) + set(_MPI_FIND_${LANG} TRUE) + elseif( ${LANG} IN_LIST MPI_FIND_COMPONENTS) + set(_MPI_FIND_${LANG} TRUE) + elseif( ${LANG} STREQUAL CXX AND NOT MPI_CXX_SKIP_MPICXX AND MPICXX IN_LIST MPI_FIND_COMPONENTS ) + set(_MPI_FIND_${LANG} TRUE) + else() + set(_MPI_FIND_${LANG} FALSE) + endif() + else() + set(_MPI_FIND_${LANG} FALSE) + endif() + if(_MPI_FIND_${LANG}) + if( ${LANG} STREQUAL CXX AND NOT MPICXX IN_LIST MPI_FIND_COMPONENTS ) + set(MPI_CXX_SKIP_MPICXX FALSE CACHE BOOL "If true, the MPI-2 C++ bindings are disabled using definitions.") + mark_as_advanced(MPI_CXX_SKIP_MPICXX) + endif() + if(NOT (MPI_${LANG}_LIB_NAMES AND (MPI_${LANG}_INCLUDE_PATH OR MPI_${LANG}_INCLUDE_DIRS OR MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS))) + set(MPI_${LANG}_TRIED_IMPLICIT FALSE) + set(MPI_${LANG}_WORKS_IMPLICIT FALSE) + if(NOT MPI_${LANG}_COMPILER AND NOT MPI_ASSUME_NO_BUILTIN_MPI) + # Should the imported targets be empty, we effectively try whether the compiler supports MPI on its own, which is the case on e.g. + # Cray PrgEnv. + _MPI_create_imported_target(${LANG}) + _MPI_check_lang_works(${LANG} TRUE) + + # If the compiler can build MPI code on its own, it functions as an MPI compiler and we'll set the variable to point to it. 
+ if(MPI_${LANG}_WORKS) + set(MPI_${LANG}_COMPILER "${CMAKE_${LANG}_COMPILER}" CACHE FILEPATH "MPI compiler for ${LANG}" FORCE) + set(MPI_${LANG}_WORKS_IMPLICIT TRUE) + endif() + set(MPI_${LANG}_TRIED_IMPLICIT TRUE) + endif() + + if(NOT "${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}" OR NOT MPI_${LANG}_WORKS) + set(MPI_${LANG}_WRAPPER_FOUND FALSE) + set(MPI_PINNED_COMPILER FALSE) + + if(NOT MPI_SKIP_COMPILER_WRAPPER) + if(MPI_${LANG}_COMPILER) + # If the user supplies a compiler *name* instead of an absolute path, assume that we need to find THAT compiler. + if (NOT IS_ABSOLUTE "${MPI_${LANG}_COMPILER}") + # Get rid of our default list of names and just search for the name the user wants. + set(_MPI_${LANG}_COMPILER_NAMES "${MPI_${LANG}_COMPILER}") + unset(MPI_${LANG}_COMPILER CACHE) + endif() + # If the user specifies a compiler, we don't want to try to search libraries either. + set(MPI_PINNED_COMPILER TRUE) + endif() + + # If we have an MPI base directory, we'll try all compiler names in that one first. + # This should prevent mixing different MPI environments + if(_MPI_BASE_DIR) + find_program(MPI_${LANG}_COMPILER + NAMES ${_MPI_${LANG}_COMPILER_NAMES} + PATH_SUFFIXES bin sbin + HINTS ${_MPI_BASE_DIR} + NO_DEFAULT_PATH + DOC "MPI compiler for ${LANG}" + ) + endif() + + # If the base directory did not help (for example because the mpiexec isn't in the same directory as the compilers), + # we shall try searching in the default paths. + find_program(MPI_${LANG}_COMPILER + NAMES ${_MPI_${LANG}_COMPILER_NAMES} + PATH_SUFFIXES bin sbin + DOC "MPI compiler for ${LANG}" + ) + + if("${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}") + set(MPI_PINNED_COMPILER TRUE) + + # If we haven't made the implicit compiler test yet, perform it now. + if(NOT MPI_${LANG}_TRIED_IMPLICIT) + _MPI_create_imported_target(${LANG}) + _MPI_check_lang_works(${LANG} TRUE) + endif() + + # Should the MPI compiler not work implicitly for MPI, still interrogate it. + # Otherwise, MPI compilers for which CMake has separate linking stages, e.g. Intel MPI on Windows where link.exe is being used + # directly during linkage instead of CMAKE__COMPILER will not work. + if(NOT MPI_${LANG}_WORKS) + set(MPI_${LANG}_WORKS_IMPLICIT FALSE) + _MPI_interrogate_compiler(${LANG}) + else() + set(MPI_${LANG}_WORKS_IMPLICIT TRUE) + endif() + elseif(MPI_${LANG}_COMPILER) + _MPI_interrogate_compiler(${LANG}) + endif() + endif() + + if(NOT MPI_PINNED_COMPILER AND NOT MPI_${LANG}_WRAPPER_FOUND) + # If MPI_PINNED_COMPILER wasn't given, and the MPI compiler we potentially found didn't work, we withdraw it. + set(MPI_${LANG}_COMPILER "MPI_${LANG}_COMPILER-NOTFOUND" CACHE FILEPATH "MPI compiler for ${LANG}" FORCE) + if(NOT MPI_SKIP_GUESSING) + # For C++, we may use the settings for C. Should a given compiler wrapper for C++ not exist, but one for C does, we copy over the + # settings for C. An MPI distribution that is in this situation would be IBM Platform MPI. 
+ if("${LANG}" STREQUAL "CXX" AND MPI_C_WRAPPER_FOUND) + set(MPI_${LANG}_COMPILE_OPTIONS ${MPI_C_COMPILE_OPTIONS} CACHE STRING "MPI ${LANG} compilation options" ) + set(MPI_${LANG}_COMPILE_DEFINITIONS ${MPI_C_COMPILE_DEFINITIONS} CACHE STRING "MPI ${LANG} compilation definitions" ) + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS ${MPI_C_INCLUDE_DIRS} CACHE STRING "MPI ${LANG} additional include directories") + set(MPI_${LANG}_LINK_FLAGS ${MPI_C_LINK_FLAGS} CACHE STRING "MPI ${LANG} linker flags" ) + set(MPI_${LANG}_LIB_NAMES ${MPI_C_LIB_NAMES} CACHE STRING "MPI ${LANG} libraries to link against" ) + else() + _MPI_guess_settings(${LANG}) + endif() + endif() + endif() + endif() + endif() + + _MPI_split_include_dirs(${LANG}) + _MPI_assemble_include_dirs(${LANG}) + _MPI_assemble_libraries(${LANG}) + + _MPI_adjust_compile_definitions(${LANG}) + # We always create imported targets even if they're empty + _MPI_create_imported_target(${LANG}) + + if(NOT MPI_${LANG}_WORKS) + _MPI_check_lang_works(${LANG} FALSE) + endif() + + # Next, we'll initialize the MPI variables that have not been previously set. + set(MPI_${LANG}_COMPILE_OPTIONS "" CACHE STRING "MPI ${LANG} compilation flags" ) + set(MPI_${LANG}_COMPILE_DEFINITIONS "" CACHE STRING "MPI ${LANG} compilation definitions" ) + set(MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS "" CACHE STRING "MPI ${LANG} additional include directories") + set(MPI_${LANG}_LINK_FLAGS "" CACHE STRING "MPI ${LANG} linker flags" ) + if(NOT MPI_${LANG}_COMPILER STREQUAL CMAKE_${LANG}_COMPILER) + set(MPI_${LANG}_LIB_NAMES "" CACHE STRING "MPI ${LANG} libraries to link against" ) + endif() + mark_as_advanced(MPI_${LANG}_COMPILE_OPTIONS MPI_${LANG}_COMPILE_DEFINITIONS MPI_${LANG}_LINK_FLAGS + MPI_${LANG}_LIB_NAMES MPI_${LANG}_ADDITIONAL_INCLUDE_DIRS MPI_${LANG}_COMPILER) + + # If we've found MPI, then we'll perform additional analysis: Determine the MPI version, MPI library version, supported + # MPI APIs (i.e. MPI-2 C++ bindings). For Fortran we also need to find specific parameters if we're under MPI-3. + if(MPI_${LANG}_WORKS) + if("${LANG}" STREQUAL "CXX" AND NOT DEFINED MPI_MPICXX_FOUND) + if(NOT MPI_CXX_SKIP_MPICXX AND NOT MPI_CXX_VALIDATE_SKIP_MPICXX) + _MPI_try_staged_settings(${LANG} test_mpi MPICXX FALSE FALSE) + if(MPI_RESULT_${LANG}_test_mpi_MPICXX) + set(MPI_MPICXX_FOUND TRUE) + else() + set(MPI_MPICXX_FOUND FALSE) + endif() + else() + set(MPI_MPICXX_FOUND FALSE) + endif() + endif() + + # At this point, we know the bindings present but not the MPI version or anything else. + if(NOT DEFINED MPI_${LANG}_VERSION) + unset(MPI_${LANG}_VERSION_MAJOR) + unset(MPI_${LANG}_VERSION_MINOR) + endif() + set(MPI_BIN_FOLDER ${CMAKE_BINARY_DIR}${CMAKE_FILES_DIRECTORY}/FindMPI) + + # For Fortran, we'll want to use the most modern MPI binding to test capabilities other than the + # Fortran parameters, since those depend on the method of consumption. + # For C++, we can always use the C bindings, and should do so, since the C++ bindings do not exist in MPI-3 + # whereas the C bindings do, and the C++ bindings never offered any feature advantage over their C counterparts. + if("${LANG}" STREQUAL "Fortran") + if(MPI_${LANG}_HAVE_F08_MODULE) + set(MPI_${LANG}_HIGHEST_METHOD F08_MODULE) + elseif(MPI_${LANG}_HAVE_F90_MODULE) + set(MPI_${LANG}_HIGHEST_METHOD F90_MODULE) + else() + set(MPI_${LANG}_HIGHEST_METHOD F77_HEADER) + endif() + + # Another difference between C and Fortran is that we can't use the preprocessor to determine whether MPI_VERSION + # and MPI_SUBVERSION are provided. 
These defines did not exist in MPI 1.0 and 1.1 and therefore might not + # exist. For C/C++, test_mpi.c will handle the MPI_VERSION extraction, but for Fortran, we need mpiver.f90. + if(NOT DEFINED MPI_${LANG}_VERSION) + _MPI_try_staged_settings(${LANG} mpiver ${MPI_${LANG}_HIGHEST_METHOD} FALSE FALSE) + if(MPI_RESULT_${LANG}_mpiver_${MPI_${LANG}_HIGHEST_METHOD}) + file(STRINGS ${MPI_BIN_FOLDER}/mpiver_${LANG}.bin _MPI_VERSION_STRING LIMIT_COUNT 1 REGEX "INFO:MPI-VER") + if("${_MPI_VERSION_STRING}" MATCHES ".*INFO:MPI-VER\\[([0-9]+)\\.([0-9]+)\\].*") + set(MPI_${LANG}_VERSION_MAJOR "${CMAKE_MATCH_1}") + set(MPI_${LANG}_VERSION_MINOR "${CMAKE_MATCH_2}") + set(MPI_${LANG}_VERSION "${MPI_${LANG}_VERSION_MAJOR}.${MPI_${LANG}_VERSION_MINOR}") + endif() + endif() + endif() + + # Finally, we want to find out which capabilities a given interface supports, compare the MPI-3 standard. + # This is determined by interface specific parameters MPI_SUBARRAYS_SUPPORTED and MPI_ASYNC_PROTECTS_NONBLOCKING + # and might vary between the different methods of consumption. + if(MPI_DETERMINE_Fortran_CAPABILITIES AND NOT MPI_Fortran_CAPABILITIES_DETERMINED) + foreach(mpimethod IN ITEMS F08_MODULE F90_MODULE F77_HEADER) + if(MPI_${LANG}_HAVE_${mpimethod}) + set(MPI_${LANG}_${mpimethod}_SUBARRAYS FALSE) + set(MPI_${LANG}_${mpimethod}_ASYNCPROT FALSE) + _MPI_try_staged_settings(${LANG} fortranparam_mpi ${mpimethod} TRUE FALSE) + if(MPI_RESULT_${LANG}_fortranparam_mpi_${mpimethod} AND + NOT "${MPI_RUN_RESULT_${LANG}_fortranparam_mpi_${mpimethod}}" STREQUAL "FAILED_TO_RUN") + if("${MPI_RUN_OUTPUT_${LANG}_fortranparam_mpi_${mpimethod}}" MATCHES + ".*INFO:SUBARRAYS\\[ *([TF]) *\\]-ASYNCPROT\\[ *([TF]) *\\].*") + if("${CMAKE_MATCH_1}" STREQUAL "T") + set(MPI_${LANG}_${mpimethod}_SUBARRAYS TRUE) + endif() + if("${CMAKE_MATCH_2}" STREQUAL "T") + set(MPI_${LANG}_${mpimethod}_ASYNCPROT TRUE) + endif() + endif() + endif() + endif() + endforeach() + set(MPI_Fortran_CAPABILITIES_DETERMINED TRUE) + endif() + else() + set(MPI_${LANG}_HIGHEST_METHOD normal) + + # By the MPI-2 standard, MPI_VERSION and MPI_SUBVERSION are valid for both C and C++ bindings. + if(NOT DEFINED MPI_${LANG}_VERSION) + file(STRINGS ${MPI_BIN_FOLDER}/test_mpi_${LANG}.bin _MPI_VERSION_STRING LIMIT_COUNT 1 REGEX "INFO:MPI-VER") + if("${_MPI_VERSION_STRING}" MATCHES ".*INFO:MPI-VER\\[([0-9]+)\\.([0-9]+)\\].*") + set(MPI_${LANG}_VERSION_MAJOR "${CMAKE_MATCH_1}") + set(MPI_${LANG}_VERSION_MINOR "${CMAKE_MATCH_2}") + set(MPI_${LANG}_VERSION "${MPI_${LANG}_VERSION_MAJOR}.${MPI_${LANG}_VERSION_MINOR}") + endif() + endif() + endif() + + unset(MPI_BIN_FOLDER) + + # At this point, we have dealt with determining the MPI version and parameters for each Fortran method available. + # The one remaining issue is to determine which MPI library is installed. + # Determining the version and vendor of the MPI library is only possible via MPI_Get_library_version() at runtime, + # and therefore we cannot do this while cross-compiling (a user may still define MPI__LIBRARY_VERSION_STRING + # themselves and we'll attempt splitting it, which is equivalent to provide the try_run output). + # It's also worth noting that the installed version string can depend on the language, or on the system the binary + # runs on if MPI is not statically linked. 
+ if(MPI_DETERMINE_LIBRARY_VERSION AND NOT MPI_${LANG}_LIBRARY_VERSION_STRING) + _MPI_try_staged_settings(${LANG} libver_mpi ${MPI_${LANG}_HIGHEST_METHOD} TRUE FALSE) + if(MPI_RESULT_${LANG}_libver_mpi_${MPI_${LANG}_HIGHEST_METHOD} AND + "${MPI_RUN_RESULT_${LANG}_libver_mpi_${MPI_${LANG}_HIGHEST_METHOD}}" EQUAL "0") + string(STRIP "${MPI_RUN_OUTPUT_${LANG}_libver_mpi_${MPI_${LANG}_HIGHEST_METHOD}}" + MPI_${LANG}_LIBRARY_VERSION_STRING) + else() + set(MPI_${LANG}_LIBRARY_VERSION_STRING "NOTFOUND") + endif() + endif() + endif() + + set(MPI_${LANG}_FIND_QUIETLY ${MPI_FIND_QUIETLY}) + set(MPI_${LANG}_FIND_VERSION ${MPI_FIND_VERSION}) + set(MPI_${LANG}_FIND_VERSION_EXACT ${MPI_FIND_VERSION_EXACT}) + + unset(MPI_${LANG}_REQUIRED_VARS) + if (NOT "${MPI_${LANG}_COMPILER}" STREQUAL "${CMAKE_${LANG}_COMPILER}") + foreach(mpilibname IN LISTS MPI_${LANG}_LIB_NAMES) + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${mpilibname}_LIBRARY") + endforeach() + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_LIB_NAMES") + if("${LANG}" STREQUAL "Fortran") + # For Fortran we only need one of the module or header directories to have *some* support for MPI. + if(NOT MPI_${LANG}_MODULE_DIR) + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_F77_HEADER_DIR") + endif() + if(NOT MPI_${LANG}_F77_HEADER_DIR) + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_MODULE_DIR") + endif() + else() + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_HEADER_DIR") + endif() + if(MPI_${LANG}_ADDITIONAL_INCLUDE_VARS) + foreach(mpiincvar IN LISTS MPI_${LANG}_ADDITIONAL_INCLUDE_VARS) + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${mpiincvar}_INCLUDE_DIR") + endforeach() + endif() + # Append the works variable now. If the settings did not work, this will show up properly. + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_WORKS") + else() + # If the compiler worked implicitly, use its path as output. + # Should the compiler variable be set, we also require it to work. + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_COMPILER") + if(MPI_${LANG}_COMPILER) + list(APPEND MPI_${LANG}_REQUIRED_VARS "MPI_${LANG}_WORKS") + endif() + endif() + find_package_handle_standard_args(MPI_${LANG} REQUIRED_VARS ${MPI_${LANG}_REQUIRED_VARS} + VERSION_VAR MPI_${LANG}_VERSION) + + if(DEFINED MPI_${LANG}_VERSION) + if(NOT _MPI_MIN_VERSION OR _MPI_MIN_VERSION VERSION_GREATER MPI_${LANG}_VERSION) + set(_MPI_MIN_VERSION MPI_${LANG}_VERSION) + endif() + endif() + endif() +endforeach() + +unset(_MPI_REQ_VARS) +foreach(LANG IN ITEMS C CXX Fortran) + if((NOT MPI_FIND_COMPONENTS AND CMAKE_${LANG}_COMPILER_LOADED) OR LANG IN_LIST MPI_FIND_COMPONENTS) + list(APPEND _MPI_REQ_VARS "MPI_${LANG}_FOUND") + endif() +endforeach() + +if(MPICXX IN_LIST MPI_FIND_COMPONENTS) + list(APPEND _MPI_REQ_VARS "MPI_MPICXX_FOUND") +endif() + +find_package_handle_standard_args(MPI + REQUIRED_VARS ${_MPI_REQ_VARS} + VERSION_VAR ${_MPI_MIN_VERSION} + HANDLE_COMPONENTS) + +#============================================================================= +# More backward compatibility stuff + +# For compatibility reasons, we also define MPIEXEC +set(MPIEXEC "${MPIEXEC_EXECUTABLE}") + +# Copy over MPI__INCLUDE_PATH from the assembled INCLUDE_DIRS. 
+foreach(LANG IN ITEMS C CXX Fortran) + if(MPI_${LANG}_FOUND) + set(MPI_${LANG}_INCLUDE_PATH "${MPI_${LANG}_INCLUDE_DIRS}") + unset(MPI_${LANG}_COMPILE_FLAGS) + if(MPI_${LANG}_COMPILE_OPTIONS) + list(JOIN MPI_${LANG}_COMPILE_FLAGS " " MPI_${LANG}_COMPILE_OPTIONS) + endif() + if(MPI_${LANG}_COMPILE_DEFINITIONS) + foreach(_MPI_DEF IN LISTS MPI_${LANG}_COMPILE_DEFINITIONS) + string(APPEND MPI_${LANG}_COMPILE_FLAGS " -D${_MPI_DEF}") + endforeach() + endif() + endif() +endforeach() + +# Bare MPI sans ${LANG} vars are set to CXX then C, depending on what was found. +# This mimics the behavior of the old language-oblivious FindMPI. +set(_MPI_OLD_VARS COMPILER INCLUDE_PATH COMPILE_FLAGS LINK_FLAGS LIBRARIES) +if (MPI_CXX_FOUND) + foreach (var ${_MPI_OLD_VARS}) + set(MPI_${var} ${MPI_CXX_${var}}) + endforeach() +elseif (MPI_C_FOUND) + foreach (var ${_MPI_OLD_VARS}) + set(MPI_${var} ${MPI_C_${var}}) + endforeach() +endif() + +# Chop MPI_LIBRARIES into the old-style MPI_LIBRARY and MPI_EXTRA_LIBRARY, and set them in cache. +if (MPI_LIBRARIES) + list(GET MPI_LIBRARIES 0 MPI_LIBRARY_WORK) + set(MPI_LIBRARY "${MPI_LIBRARY_WORK}") + unset(MPI_LIBRARY_WORK) +else() + set(MPI_LIBRARY "MPI_LIBRARY-NOTFOUND") +endif() + +list(LENGTH MPI_LIBRARIES MPI_NUMLIBS) +if (MPI_NUMLIBS GREATER 1) + set(MPI_EXTRA_LIBRARY_WORK "${MPI_LIBRARIES}") + list(REMOVE_AT MPI_EXTRA_LIBRARY_WORK 0) + set(MPI_EXTRA_LIBRARY "${MPI_EXTRA_LIBRARY_WORK}") + unset(MPI_EXTRA_LIBRARY_WORK) +else() + set(MPI_EXTRA_LIBRARY "MPI_EXTRA_LIBRARY-NOTFOUND") +endif() +set(MPI_IGNORE_LEGACY_VARIABLES TRUE) +#============================================================================= + +# unset these vars to cleanup namespace +unset(_MPI_OLD_VARS) +unset(_MPI_PREFIX_PATH) +unset(_MPI_BASE_DIR) +foreach (lang C CXX Fortran) + unset(_MPI_${LANG}_COMPILER_NAMES) +endforeach() + +cmake_policy(POP) diff --git a/CMake/VTKmMPI.cmake b/CMake/VTKmMPI.cmake new file mode 100644 index 000000000..f44789441 --- /dev/null +++ b/CMake/VTKmMPI.cmake @@ -0,0 +1,21 @@ +##============================================================================ +## Copyright (c) Kitware, Inc. +## All rights reserved. +## See LICENSE.txt for details. +## +## This software is distributed WITHOUT ANY WARRANTY; without even +## the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR +## PURPOSE. See the above copyright notice for more information. +##============================================================================ + +if(VTKm_ENABLE_MPI AND NOT TARGET MPI::MPI_CXX) + if(CMAKE_VERSION VERSION_LESS 3.10) + find_package(MPI REQUIRED MODULE) + else() + #clunky but we need to make sure we use the upstream module if it exists + set(orig_CMAKE_MODULE_PATH ${CMAKE_MODULE_PATH}) + set(CMAKE_MODULE_PATH "") + find_package(MPI MODULE) + set(CMAKE_MODULE_PATH ${orig_CMAKE_MODULE_PATH}) + endif() +endif() diff --git a/CMake/VTKmWrappers.cmake b/CMake/VTKmWrappers.cmake index 51e7259aa..205682138 100644 --- a/CMake/VTKmWrappers.cmake +++ b/CMake/VTKmWrappers.cmake @@ -12,6 +12,7 @@ include(CMakeParseArguments) include(VTKmDeviceAdapters) include(VTKmCPUVectorization) +include(VTKmMPI) #----------------------------------------------------------------------------- # Utility to build a kit name from the current directory. 
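With `FindMPI.cmake` shipped alongside `VTKmMPI.cmake`, MPI is consumed entirely through the
imported `MPI::MPI_<LANG>` targets and the `MPIEXEC_*` cache variables the module defines.
As a rough illustration of that intended usage, here is a minimal, hypothetical consumer
project (not part of this patch; the `mpi_hello` target and `hello.cxx` file are placeholders):

```cmake
cmake_minimum_required(VERSION 3.8)
project(MpiHello CXX)

# FindMPI provides the MPI::MPI_CXX imported target used below.
find_package(MPI REQUIRED COMPONENTS CXX)

add_executable(mpi_hello hello.cxx)
target_link_libraries(mpi_hello PRIVATE MPI::MPI_CXX)

# The MPIEXEC_* cache variables from the same module drive the test launch line.
enable_testing()
add_test(NAME mpi_hello_2_procs
  COMMAND ${MPIEXEC_EXECUTABLE} ${MPIEXEC_NUMPROC_FLAG} 2
          ${MPIEXEC_PREFLAGS} $<TARGET_FILE:mpi_hello> ${MPIEXEC_POSTFLAGS})
```

Linking against the imported target pulls in the include directories, compile definitions,
and link flags gathered by the module, so the consumer does not need to reference any
`MPI_*` result variables directly.
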
diff --git a/CMakeLists.txt b/CMakeLists.txt index 6e57a7f18..f098a84f0 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -182,14 +182,6 @@ check_type_size("long long" VTKm_SIZE_LONG_LONG BUILTIN_TYPES_ONLY) #----------------------------------------------------------------------------- # Add subdirectories -if(VTKm_ENABLE_MPI) - # This `if` is temporary and will be removed once `diy` supports building - # without MPI. - if (NOT MPI_C_FOUND) - find_package(MPI ${VTKm_FIND_PACKAGE_QUIETLY}) - endif() -endif() - add_subdirectory(vtkm) #----------------------------------------------------------------------------- @@ -241,6 +233,7 @@ if(NOT VTKm_INSTALL_ONLY_LIBRARIES) install( FILES ${VTKm_SOURCE_DIR}/CMake/FindTBB.cmake + ${VTKm_SOURCE_DIR}/CMake/FindMPI.cmake ${VTKm_SOURCE_DIR}/CMake/FindOpenGL.cmake ${VTKm_SOURCE_DIR}/CMake/FindOpenMP.cmake DESTINATION ${VTKm_INSTALL_CMAKE_MODULE_DIR} @@ -253,6 +246,7 @@ if(NOT VTKm_INSTALL_ONLY_LIBRARIES) ${VTKm_SOURCE_DIR}/CMake/VTKmDetectCUDAVersion.cu ${VTKm_SOURCE_DIR}/CMake/VTKmDeviceAdapters.cmake ${VTKm_SOURCE_DIR}/CMake/VTKmExportHeaderTemplate.h.in + ${VTKm_SOURCE_DIR}/CMake/VTKmMPI.cmake ${VTKm_SOURCE_DIR}/CMake/VTKmRenderingContexts.cmake ${VTKm_SOURCE_DIR}/CMake/VTKmWrappers.cmake DESTINATION ${VTKm_INSTALL_CMAKE_MODULE_DIR} diff --git a/vtkm/thirdparty/diy/CMakeLists.txt b/vtkm/thirdparty/diy/CMakeLists.txt index 30133cf26..af6303fe2 100644 --- a/vtkm/thirdparty/diy/CMakeLists.txt +++ b/vtkm/thirdparty/diy/CMakeLists.txt @@ -24,23 +24,7 @@ target_include_directories(vtkm_diy INTERFACE $) if(VTKm_ENABLE_MPI) - set(arg) - foreach(apath IN LISTS MPI_C_INCLUDE_PATH MPI_CXX_INCLUDE_PATH) - list(APPEND arg $) - endforeach() - list(REMOVE_DUPLICATES arg) - target_include_directories(vtkm_diy INTERFACE ${arg}) - target_link_libraries(vtkm_diy INTERFACE - $ - $) - if(MPI_C_COMPILE_DEFINITIONS) - target_compile_definitions(vtkm_diy INTERFACE - $<$:${MPI_C_COMPILE_DEFINITIONS}>) - endif() - if(MPI_CXX_COMPILE_DEFNITIONS) - target_compile_definitions(vtkm_diy INTERFACE - $<$:${MPI_CXX_COMPILE_DEFNITIONS>) - endif() + target_link_libraries(vtkm_diy INTERFACE MPI::MPI_CXX) endif() install(TARGETS vtkm_diy From 74d713c77405320783a0781a3ba0bd003a5d61ea Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 24 Jun 2019 12:59:53 -0400 Subject: [PATCH 2/8] Install compilation tests are enabled when examples are disabled Previously VTK-m only activated the install/compilation tests when examples had been enabled. This decreased the amount of coverage on dashboards. 
--- CMakeLists.txt | 4 +--- examples/CMakeLists.txt | 42 ++++++++++++++++++++--------------------- 2 files changed, 22 insertions(+), 24 deletions(-) diff --git a/CMakeLists.txt b/CMakeLists.txt index f098a84f0..93ea215f4 100644 --- a/CMakeLists.txt +++ b/CMakeLists.txt @@ -309,6 +309,4 @@ endif() #----------------------------------------------------------------------------- # Build examples -if(VTKm_ENABLE_EXAMPLES) - add_subdirectory(examples) -endif(VTKm_ENABLE_EXAMPLES) +add_subdirectory(examples) diff --git a/examples/CMakeLists.txt b/examples/CMakeLists.txt index 7721d5523..501cb6ea8 100644 --- a/examples/CMakeLists.txt +++ b/examples/CMakeLists.txt @@ -10,28 +10,28 @@ #add the directory that contains the VTK-m config file to the cmake #path so that our examples can find VTK-m -set(CMAKE_PREFIX_PATH ${VTKm_BINARY_DIR}/${VTKm_INSTALL_CONFIG_DIR}) - -add_subdirectory(clipping) -add_subdirectory(contour_tree) -add_subdirectory(contour_tree_augmented) -add_subdirectory(cosmotools) -add_subdirectory(demo) -add_subdirectory(game_of_life) -add_subdirectory(hello_world) -add_subdirectory(histogram) -add_subdirectory(isosurface) -add_subdirectory(lagrangian) -add_subdirectory(multi_backend) -add_subdirectory(oscillator) -add_subdirectory(particle_advection) -add_subdirectory(redistribute_points) -add_subdirectory(rendering) -add_subdirectory(streamline) -add_subdirectory(temporal_advection) -add_subdirectory(tetrahedra) -# add_subdirectory(unified_memory) +if(VTKm_ENABLE_EXAMPLES) + set(CMAKE_PREFIX_PATH ${VTKm_BINARY_DIR}/${VTKm_INSTALL_CONFIG_DIR}) + add_subdirectory(clipping) + add_subdirectory(contour_tree) + add_subdirectory(contour_tree_augmented) + add_subdirectory(cosmotools) + add_subdirectory(demo) + add_subdirectory(game_of_life) + add_subdirectory(hello_world) + add_subdirectory(histogram) + add_subdirectory(isosurface) + add_subdirectory(lagrangian) + add_subdirectory(multi_backend) + add_subdirectory(oscillator) + add_subdirectory(particle_advection) + add_subdirectory(redistribute_points) + add_subdirectory(rendering) + add_subdirectory(streamline) + add_subdirectory(temporal_advection) + add_subdirectory(tetrahedra) +endif() if (VTKm_ENABLE_TESTING) # These need to be fast to build as they will From 8f1589c96fab8a15ae03bd94e7d5976aa6124b96 Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 24 Jun 2019 13:18:56 -0400 Subject: [PATCH 3/8] Correct license on FindMPI.cmake --- CMake/FindMPI.cmake | 11 +++++++++-- LICENSE.txt | 1 - 2 files changed, 9 insertions(+), 3 deletions(-) diff --git a/CMake/FindMPI.cmake b/CMake/FindMPI.cmake index 3c7fe377c..66f2d3e85 100644 --- a/CMake/FindMPI.cmake +++ b/CMake/FindMPI.cmake @@ -1,5 +1,12 @@ -# Distributed under the OSI-approved BSD 3-Clause License. See accompanying -# file Copyright.txt or https://cmake.org/licensing for details. +##============================================================================ +## Copyright (c) Kitware, Inc. +## All rights reserved. +## See LICENSE.txt for details. +## +## This software is distributed WITHOUT ANY WARRANTY; without even +## the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR +## PURPOSE. See the above copyright notice for more information. 
+##============================================================================ #[=======================================================================[.rst: FindMPI diff --git a/LICENSE.txt b/LICENSE.txt index 6bce0b3e7..7a88e22b0 100644 --- a/LICENSE.txt +++ b/LICENSE.txt @@ -49,7 +49,6 @@ contents of these for details on the specifics of their respective licenses. - - - - - - - - - - - - - - - - - - - - - - - - do not remove this line CMake/FindTBB.cmake -CMake/FindGLEW.cmake Utilities vtkm/cont/tbb/internal/parallel_sort.h vtkm/cont/tbb/internal/parallel_radix_sort_tbb.h From 118583dea56d1a16fb9baad721f86296d40f036d Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 24 Jun 2019 13:36:29 -0400 Subject: [PATCH 4/8] Test compilations against installed VTK-m work with CUDA enabled --- CMake/testing/VTKmTestInstall.cmake | 16 ++++++++-------- 1 file changed, 8 insertions(+), 8 deletions(-) diff --git a/CMake/testing/VTKmTestInstall.cmake b/CMake/testing/VTKmTestInstall.cmake index b73df8b54..d1299c851 100644 --- a/CMake/testing/VTKmTestInstall.cmake +++ b/CMake/testing/VTKmTestInstall.cmake @@ -62,13 +62,12 @@ file(GENERATE OUTPUT "${${file_loc_var}}" CONTENT " -set(CMAKE_BUILD_TYPE ${CMAKE_BUILD_TYPE} CACHE STRING \"\") -set(CMAKE_PREFIX_PATH ${install_prefix} CACHE STRING \"\") -set(CMAKE_CXX_COMPILER ${CMAKE_CXX_COMPILER} CACHE FILEPATH \"\") -set(CMAKE_CXX_FLAGS ${CMAKE_CXX_FLAGS} CACHE STRING \"\") -set(CMAKE_CUDA_COMPILER ${CMAKE_CUDA_COMPILER} CACHE FILEPATH \"\") -set(CMAKE_CUDA_FLAGS ${CMAKE_CUDA_FLAGS} CACHE STRING \"\") -set(CMAKE_CUDA_HOST_COMPILER ${CMAKE_CUDA_HOST_COMPILER} CACHE FILEPATH \"\") +set(CMAKE_PREFIX_PATH \"${install_prefix}/\" CACHE STRING \"\") +set(CMAKE_CXX_COMPILER \"${CMAKE_CXX_COMPILER}\" CACHE FILEPATH \"\") +set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS}\" CACHE STRING \"\") +set(CMAKE_CUDA_COMPILER \"${CMAKE_CUDA_COMPILER}\" CACHE FILEPATH \"\") +set(CMAKE_CUDA_FLAGS \"${CMAKE_CUDA_FLAGS}\" CACHE STRING \"\") +set(CMAKE_CUDA_HOST_COMPILER \"${CMAKE_CUDA_HOST_COMPILER}\" CACHE FILEPATH \"\") " ) @@ -93,10 +92,11 @@ function(vtkm_test_against_install dir) add_test(NAME ${build_name} COMMAND ${CMAKE_CTEST_COMMAND} + -C $ --build-and-test ${src_dir} ${build_dir} --build-generator ${CMAKE_GENERATOR} --build-makeprogram ${CMAKE_MAKE_PROGRAM} - --build-options -C "${build_config}" + --build-options "-C" "${build_config}" ) set_tests_properties(${build_name} PROPERTIES LABELS ${test_label} ) From 86df1d27beaf54596c6d8cd6d571d628426e2300 Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 24 Jun 2019 16:58:31 -0400 Subject: [PATCH 5/8] Update VTKmMPI to handle CMake 3.13+ --- CMake/FindMPI.cmake | 6 +++++- CMake/VTKmMPI.cmake | 5 ++++- 2 files changed, 9 insertions(+), 2 deletions(-) diff --git a/CMake/FindMPI.cmake b/CMake/FindMPI.cmake index 66f2d3e85..734510e53 100644 --- a/CMake/FindMPI.cmake +++ b/CMake/FindMPI.cmake @@ -1152,7 +1152,11 @@ macro(_MPI_create_imported_target LANG) set_property(TARGET MPI::MPI_${LANG} PROPERTY INTERFACE_LINK_LIBRARIES "") if(MPI_${LANG}_LINK_FLAGS) separate_arguments(_MPI_${LANG}_LINK_FLAGS NATIVE_COMMAND "${MPI_${LANG}_LINK_FLAGS}") - set_property(TARGET MPI::MPI_${LANG} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${_MPI_${LANG}_LINK_FLAGS}") + if(CMAKE_VERSION VERSION_LESS 3.13) + set_property(TARGET MPI::MPI_${LANG} APPEND PROPERTY INTERFACE_LINK_LIBRARIES "${_MPI_${LANG}_LINK_FLAGS}") + else() + set_property(TARGET MPI::MPI_${LANG} APPEND PROPERTY INTERFACE_LINK_OPTIONS "${_MPI_${LANG}_LINK_FLAGS}") + endif() 
unset(_MPI_${LANG}_LINK_FLAGS) endif() # If the compiler links MPI implicitly, no libraries will be found as they're contained within diff --git a/CMake/VTKmMPI.cmake b/CMake/VTKmMPI.cmake index f44789441..778c86c2d 100644 --- a/CMake/VTKmMPI.cmake +++ b/CMake/VTKmMPI.cmake @@ -9,7 +9,10 @@ ##============================================================================ if(VTKm_ENABLE_MPI AND NOT TARGET MPI::MPI_CXX) - if(CMAKE_VERSION VERSION_LESS 3.10) + if(CMAKE_VERSION VERSION_LESS 3.15) + #While CMake 3.10 introduced the new MPI module. + #Fixes related to MPI+CUDA that VTK-m needs are + #only found in CMake 3.15+. find_package(MPI REQUIRED MODULE) else() #clunky but we need to make sure we use the upstream module if it exists From bbb3912268fdffdef2bd5f81a3f3edc35295748e Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Mon, 24 Jun 2019 16:58:54 -0400 Subject: [PATCH 6/8] VTKmTestInstall properly propagates compiler flags --- CMake/testing/VTKmTestInstall.cmake | 41 ++++++++++++++++++++++++----- 1 file changed, 35 insertions(+), 6 deletions(-) diff --git a/CMake/testing/VTKmTestInstall.cmake b/CMake/testing/VTKmTestInstall.cmake index d1299c851..a96baabe7 100644 --- a/CMake/testing/VTKmTestInstall.cmake +++ b/CMake/testing/VTKmTestInstall.cmake @@ -62,11 +62,12 @@ file(GENERATE OUTPUT "${${file_loc_var}}" CONTENT " -set(CMAKE_PREFIX_PATH \"${install_prefix}/\" CACHE STRING \"\") +set(CMAKE_MAKE_PROGRAM \"${CMAKE_MAKE_PROGRAM}\" CACHE FILEPATH \"\") +set(CMAKE_PREFIX_PATH \"${CMAKE_PREFIX_PATH};${install_prefix}/\" CACHE STRING \"\") set(CMAKE_CXX_COMPILER \"${CMAKE_CXX_COMPILER}\" CACHE FILEPATH \"\") -set(CMAKE_CXX_FLAGS \"${CMAKE_CXX_FLAGS}\" CACHE STRING \"\") +set(CMAKE_CXX_FLAGS \"$CACHE{CMAKE_CXX_FLAGS}\" CACHE STRING \"\") set(CMAKE_CUDA_COMPILER \"${CMAKE_CUDA_COMPILER}\" CACHE FILEPATH \"\") -set(CMAKE_CUDA_FLAGS \"${CMAKE_CUDA_FLAGS}\" CACHE STRING \"\") +set(CMAKE_CUDA_FLAGS \"$CACHE{CMAKE_CUDA_FLAGS}\" CACHE STRING \"\") set(CMAKE_CUDA_HOST_COMPILER \"${CMAKE_CUDA_HOST_COMPILER}\" CACHE FILEPATH \"\") " ) @@ -80,8 +81,34 @@ function(vtkm_test_against_install dir) set(src_dir "${CMAKE_CURRENT_SOURCE_DIR}/${name}/") set(build_dir "${VTKm_BINARY_DIR}/CMakeFiles/_tmp_build/test_${name}/") - set(build_config "${build_dir}/build_options.cmake") - vtkm_generate_install_build_options(build_config) + set(args ) + if(CMAKE_VERSION VERSION_LESS 3.13) + #Before 3.13 the config file passing to cmake via ctest --build-options + #was broken + set(args + -DCMAKE_MAKE_PROGRAM:FILEPATH=${CMAKE_MAKE_PROGRAM} + -DCMAKE_PREFIX_PATH:STRING=${install_prefix} + -DCMAKE_CXX_COMPILER:FILEPATH=${CMAKE_CXX_COMPILER} + -DCMAKE_CUDA_COMPILER:FILEPATH=${CMAKE_CUDA_COMPILER} + -DCMAKE_CUDA_HOST_COMPILER:FILEPATH=${CMAKE_CUDA_HOST_COMPILER} + -DCMAKE_CXX_FLAGS:STRING=$CACHE{CMAKE_CXX_FLAGS} + -DCMAKE_CUDA_FLAGS:STRING=$CACHE{CMAKE_CUDA_FLAGS} + ) + else() + set(build_config "${build_dir}build_options.cmake") + vtkm_generate_install_build_options(build_config) + set(args -C ${build_config}) + endif() + + if(WIN32 AND TARGET vtkm::tbb) + #on windows we need to specify these as FindTBB won't + #find the installed version just with the prefix path + list(APPEND args + -DTBB_LIBRARY_DEBUG:FILEPATH=${TBB_LIBRARY_DEBUG} + -DTBB_LIBRARY_RELEASE:FILEPATH=${TBB_LIBRARY_RELEASE} + -DTBB_INCLUDE_DIR:PATH=${TBB_INCLUDE_DIR} + ) + endif() #determine if the test is expected to compile or fail to build. 
We use #this information to built the test name to make it clear to the user @@ -96,7 +123,9 @@ function(vtkm_test_against_install dir) --build-and-test ${src_dir} ${build_dir} --build-generator ${CMAKE_GENERATOR} --build-makeprogram ${CMAKE_MAKE_PROGRAM} - --build-options "-C" "${build_config}" + --build-options + ${args} + --no-warn-unused-cli ) set_tests_properties(${build_name} PROPERTIES LABELS ${test_label} ) From 774d7a566eb3063d257e0c7d924636aaa72916cb Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Wed, 26 Jun 2019 12:13:47 -0400 Subject: [PATCH 7/8] Add release notes for v1.4.0 --- docs/changelog/1.4/release-notes.md | 1307 +++++++++++++++++ ...Base-StealArray-returns-delete-function.md | 24 - .../changelog/Variant_AsVirtual_force_cast.md | 6 - .../add-cuda-kernel-details-to-logging.md | 12 - docs/changelog/add-vtkm_filter-target.md | 4 - docs/changelog/array-virtual-not-special.md | 11 - docs/changelog/arrayhandlevirtual.md | 45 - ...zip-handles-writing-to-implicit-handles.md | 9 - .../asynchronize-device-independent-timer.md | 65 - docs/changelog/bitfields.md | 51 - ...ounding-interval-hierarchy-in-vtkm-cont.md | 10 - .../case-insensitive-device-from-string.md | 14 - docs/changelog/cast-variant-to-storage.md | 58 - docs/changelog/cmake-38-required.md | 10 - docs/changelog/connected-components.md | 9 - ...ocator-disable-managed-memory-from-code.md | 6 - .../cuda-separable-compilation-enabled.md | 4 - docs/changelog/field-tags-no-template.md | 132 -- docs/changelog/improve-cuda-scheduling.md | 45 - docs/changelog/initialize.md | 20 - .../invoker-supports-scatter-types.md | 21 - docs/changelog/lodepng.md | 5 - docs/changelog/mask-worklets.md | 104 -- docs/changelog/merge-benchmark-executables.md | 6 - .../merge-rendering-testing-executables.md | 3 - .../merge-worklet-testing-executables.md | 8 - docs/changelog/optionparser-to-third-party.md | 13 - .../parse-some-options-in-initialize.md | 91 -- docs/changelog/point-merge.md | 26 - .../portal-value-reference-operators.md | 12 - .../redesign-runtime-device-tracking.md | 90 -- ...on-support-differing-input-output-types.md | 53 - ...ename-per-thread-runtime-tracker-method.md | 9 - .../specialize-worklet-for-device.md | 147 -- .../update-CellLocatorTwoLevelUniformGrid.md | 31 - docs/changelog/update-optional-parser.md | 3 - docs/changelog/variantarrayhandle.md | 43 - docs/changelog/verify-cmake-install.md | 9 - .../vtkm-handles-busy-cuda-devices-better.md | 17 - docs/changelog/vtkm-mangle-diy.md | 11 - 40 files changed, 1307 insertions(+), 1237 deletions(-) create mode 100644 docs/changelog/1.4/release-notes.md delete mode 100644 docs/changelog/StorageBase-StealArray-returns-delete-function.md delete mode 100644 docs/changelog/Variant_AsVirtual_force_cast.md delete mode 100644 docs/changelog/add-cuda-kernel-details-to-logging.md delete mode 100644 docs/changelog/add-vtkm_filter-target.md delete mode 100644 docs/changelog/array-virtual-not-special.md delete mode 100644 docs/changelog/arrayhandlevirtual.md delete mode 100644 docs/changelog/arrayhandlezip-handles-writing-to-implicit-handles.md delete mode 100644 docs/changelog/asynchronize-device-independent-timer.md delete mode 100644 docs/changelog/bitfields.md delete mode 100644 docs/changelog/bounding-interval-hierarchy-in-vtkm-cont.md delete mode 100644 docs/changelog/case-insensitive-device-from-string.md delete mode 100644 docs/changelog/cast-variant-to-storage.md delete mode 100644 docs/changelog/cmake-38-required.md delete mode 100644 docs/changelog/connected-components.md 
delete mode 100644 docs/changelog/cuda-allocator-disable-managed-memory-from-code.md delete mode 100644 docs/changelog/cuda-separable-compilation-enabled.md delete mode 100644 docs/changelog/field-tags-no-template.md delete mode 100644 docs/changelog/improve-cuda-scheduling.md delete mode 100644 docs/changelog/initialize.md delete mode 100644 docs/changelog/invoker-supports-scatter-types.md delete mode 100644 docs/changelog/lodepng.md delete mode 100644 docs/changelog/mask-worklets.md delete mode 100644 docs/changelog/merge-benchmark-executables.md delete mode 100644 docs/changelog/merge-rendering-testing-executables.md delete mode 100644 docs/changelog/merge-worklet-testing-executables.md delete mode 100644 docs/changelog/optionparser-to-third-party.md delete mode 100644 docs/changelog/parse-some-options-in-initialize.md delete mode 100644 docs/changelog/point-merge.md delete mode 100644 docs/changelog/portal-value-reference-operators.md delete mode 100644 docs/changelog/redesign-runtime-device-tracking.md delete mode 100644 docs/changelog/reduction-support-differing-input-output-types.md delete mode 100644 docs/changelog/rename-per-thread-runtime-tracker-method.md delete mode 100644 docs/changelog/specialize-worklet-for-device.md delete mode 100644 docs/changelog/update-CellLocatorTwoLevelUniformGrid.md delete mode 100644 docs/changelog/update-optional-parser.md delete mode 100644 docs/changelog/variantarrayhandle.md delete mode 100644 docs/changelog/verify-cmake-install.md delete mode 100644 docs/changelog/vtkm-handles-busy-cuda-devices-better.md delete mode 100644 docs/changelog/vtkm-mangle-diy.md diff --git a/docs/changelog/1.4/release-notes.md b/docs/changelog/1.4/release-notes.md new file mode 100644 index 000000000..9b270790b --- /dev/null +++ b/docs/changelog/1.4/release-notes.md @@ -0,0 +1,1307 @@ +VTK-m 1.4 Release Notes +======================= + +# Table of Contents +1. [Core](#Core) + - Remove templates from `ControlSignature` field tags + - Worklets can now be specialized for a specific device adapter + - Worklets now support an execution mask + - Redesign VTK-m Runtime Device Tracking + - `vtkm::cont::Initialize` added to make setting up VTK-m runtime state easier +2. [ArrayHandle](#ArrayHandle) + - Add `vtkm::cont::ArrayHandleVirtual` + - `vtkm::cont::ArrayHandleZip` provides a consistent API even with non-writable handles + - `vtkm::cont::VariantArrayHandle` replaces `vtkm::cont::DynamicArrayHandle` + - `vtkm::cont::VariantArrayHandle` CastAndCall supports casting to concrete types + - `vtkm::cont::VariantArrayHandle::AsVirtual()` performs casting + - `StorageBasic::StealArray()` now provides delete function to new owner +3. [Control Environment](#Control-Environment) + - `vtkm::cont::CellLocatorGeneral` has been added + - `vtkm::cont::CellLocatorTwoLevelUniformGrid` has been renamed to `vtkm::cont::CellLocatorUniformBins` + - `vtkm::cont::Timer` now supports asynchronous and device independent timers + - `vtkm::cont::DeviceAdapterId` construction from strings are now case-insensitive + - `vtkm::cont::Initialize` will only parse known arguments +4. 
[Execution Environment](#Execution-Environment)
+ - VTK-m logs details about each CUDA kernel launch
+ - VTK-m CUDA allocations can have managed memory (cudaMallocManaged) enabled/disabled from C++
+ - VTK-m CUDA kernel scheduling improved, including better defaults and user customization support
+ - VTK-m Reduction algorithm now supports differing input and output types
+ - Added specialized operators for ArrayPortalValueReference
+5. [Worklets and Filters](#Worklets-and-Filters)
+ - `vtkm::worklet::Invoker` now supports worklets which require a Scatter object
+ - `BitFields` are now a supported field input/output type for VTK-m worklets
+ - Added a Point Merging worklet
+ - `vtkm::filter::CleanGrid` now can do point merging
+ - Added connected component worklets and filters
+6. [Build](#Build)
+ - CMake 3.8+ now required to build VTK-m
+ - VTK-m now can verify that it installs itself correctly
+ - VTK-m now requires `CUDA` separable compilation to build
+ - VTK-m provides a `vtkm_filter` CMake target
+ - `vtkm::cont::CellLocatorBoundingIntervalHierarchy` is compiled into `vtkm_cont`
+7. [Other](#Other)
+ - LodePNG added as a thirdparty package
+ - Optionparser added as a thirdparty package
+ - Thirdparty diy now can coexist with external diy
+ - Merge benchmark executables into a device dependent shared library
+ - Merge rendering testing executables to a shared library
+ - Merge worklet testing executables into a device dependent shared library
+ - VTK-m runtime device detection properly handles busy CUDA devices
+
+# Core
+
+## Remove templates from `ControlSignature` field tags
+
+Previously, several of the `ControlSignature` tags had a template to
+specify a type list. This was to specify potential valid value types for an
+input array. The importance of this typelist was to limit the number of
+code paths created when resolving a `vtkm::cont::VariantArrayHandle`
+(formerly a `DynamicArrayHandle`). This (potentially) reduced the compile
+time, the size of libraries/executables, and errors from unexpected types.
+
+Much has changed since this feature was originally implemented. Since then,
+the filter infrastructure has been created, and it is through this that
+most dynamic worklet invocations happen. However, since the filter
+infrastructure does its own type resolution (and has its own policies), the
+type arguments in `ControlSignature` are now of little value.
+
+### Script to update code
+
+This update requires changes to just about all code implementing a VTK-m
+worklet. To facilitate the update of this code to these new changes (not to
+mention all the code in VTK-m), a script is provided to automatically remove
+these template parameters from VTK-m code.
+
+This script is at
+[Utilities/Scripts/update-control-signature-tags.sh](../../Utilities/Scripts/update-control-signature-tags.sh).
+It needs to be run in a Unix-compatible shell. It takes a single argument,
+which is the top-level directory in which to modify files. The script processes all C++
+source files recursively from that directory.
+
+### Selecting data types for auxiliary filter fields
+
+The main rationale for making these changes is that the types of the inputs
+to worklets are almost always already determined by the calling filter.
+However, although it is straightforward to specify the type of the "main"
+(active) scalars in a filter, it is less clear what to do for additional
+fields if a filter needs a second or third field. 
+ +Typically, in the case of a second or third field, it is up to the +`DoExecute` method in the filter implementation to apply a policy to that +field. When applying a policy, you give it a policy object (nominally +passed by the user) and a traits of the filter. Generally, the accepted +list of types for a field should be part of the filter's traits. For +example, consider the `WarpVector` filter. This filter only works on +`Vec`s of size 3, so its traits class looks like this. + +``` cpp +template <> +class FilterTraits +{ +public: + // WarpVector can only applies to Float and Double Vec3 arrays + using InputFieldTypeList = vtkm::TypeListTagFieldVec3; +}; +``` + +However, the `WarpVector` filter also requires two fields instead of one. +The first (active) field is handled by its superclass (`FilterField`), but +the second (auxiliary) field must be managed in the `DoExecute`. Generally, +this can be done by simply applying the policy with the filter traits. + +### The corner cases + +Most of the calls to worklets happen within filter implementations, which +have their own way of narrowing down potential types (as previously +described). The majority of the remainder either use static types or work +with a variety of types. + +However, there is a minority of corner cases that require a reduction of +types. Since the type argument of the worklet `ControlSignature` arguments +are no longer available, the narrowing of types must be done before the +call to `Invoke`. + +This narrowing of arguments is not particularly difficult. Such type-unsure +arguments usually come from a `VariantArrayHandle` (or something that uses +one). You can select the types from a `VariantArrayHandle` simply by using +the `ResetTypes` method. For example, say you know that a variant array is +supposed to be a scalar. + +``` cpp +dispatcher.Invoke(variantArray.ResetTypes(vtkm::TypeListTagFieldScalar()), + staticArray); +``` + +Even more common is to have a `vtkm::cont::Field` object. A `Field` object +internally holds a `VariantArrayHandle`, which is accessible via the +`GetData` method. + +``` cpp +dispatcher.Invoke(field.GetData().ResetTypes(vtkm::TypeListTagFieldScalar()), + staticArray); +``` + +### Change in executable size + +The whole intention of these template parameters in the first place was to +reduce the number of code paths compiled. The hypothesis of this change was +that in the current structure the code paths were not being reduced much +if at all. If that is true, the size of executables and libraries should +not change. + +Here is a recording of the library and executable sizes before this change +(using `ds -h`). + +``` +3.0M libvtkm_cont-1.2.1.dylib +6.2M libvtkm_rendering-1.2.1.dylib +312K Rendering_SERIAL +312K Rendering_TBB + 22M Worklets_SERIAL + 23M Worklets_TBB + 22M UnitTests_vtkm_filter_testing +5.7M UnitTests_vtkm_cont_serial_testing +6.0M UnitTests_vtkm_cont_tbb_testing +7.1M UnitTests_vtkm_cont_testing +``` + +After the changes, the executable sizes are as follows. + +``` +3.0M libvtkm_cont-1.2.1.dylib +6.0M libvtkm_rendering-1.2.1.dylib +312K Rendering_SERIAL +312K Rendering_TBB + 21M Worklets_SERIAL + 21M Worklets_TBB + 22M UnitTests_vtkm_filter_testing +5.6M UnitTests_vtkm_cont_serial_testing +6.0M UnitTests_vtkm_cont_tbb_testing +7.1M UnitTests_vtkm_cont_testing +``` + +As we can see, the built sizes have not changed significantly. (If +anything, the build is a little smaller.) 
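+
+For reference, the change measured above is mechanical: the type-list template
+argument is simply dropped from each field tag, while the tag names themselves
+stay the same. A minimal before/after sketch (an illustrative worklet written
+for this note, not taken from the VTK-m sources) looks like this:
+
+```cpp
+// Before: ControlSignature field tags carried a type-list template argument.
+struct DoubleValuesOld : vtkm::worklet::WorkletMapField
+{
+  using ControlSignature = void(FieldIn<Scalar> in, FieldOut<Scalar> out);
+  using ExecutionSignature = _2(_1);
+
+  template <typename T>
+  VTKM_EXEC T operator()(T value) const { return value + value; }
+};
+
+// After: the same worklet with the template arguments removed. Narrowing the
+// accepted value types now happens in the calling filter, or via ResetTypes as
+// shown above.
+struct DoubleValuesNew : vtkm::worklet::WorkletMapField
+{
+  using ControlSignature = void(FieldIn in, FieldOut out);
+  using ExecutionSignature = _2(_1);
+
+  template <typename T>
+  VTKM_EXEC T operator()(T value) const { return value + value; }
+};
+```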
+ + +## Worklets can now be specialized for a specific device adapter + +This change adds an execution signature tag named `Device` that passes +a `DeviceAdapterTag` to the worklet's parenthesis operator. This allows the +worklet to specialize its operation. This features is available in all +worklets. + +The following example shows a worklet that specializes itself for the CUDA +device. + +```cpp +struct DeviceSpecificWorklet : vtkm::worklet::WorkletMapField +{ + using ControlSignature = void(FieldIn, FieldOut); + using ExecutionSignature = _2(_1, Device); + + // Specialization for the Cuda device. + template + T operator()(T x, vtkm::cont::DeviceAdapterTagCuda) const + { + // Special cuda implementation + } + + // General implementation + template + T operator()(T x, Device) const + { + // General implementation + } +}; +``` + +### Effect on compile time and binary size + +This change necessitated adding a template parameter for the device that +followed at least from the schedule all the way down. This has the +potential for duplicating several of the support methods (like +`DoWorkletInvokeFunctor`) that would otherwise have the same type. This is +especially true between the devices that run on the CPU as they should all +be sharing the same portals from `ArrayHandle`s. So the question is whether +it causes compile to take longer or cause a significant increase in +binaries. + +To informally test, I first ran a clean debug compile on my Windows machine +with the serial and tbb devices. The build itself took **3 minutes, 50 +seconds**. Here is a list of the binary sizes in the bin directory: + +``` +kmorel2 0> du -sh *.exe *.dll +200K BenchmarkArrayTransfer_SERIAL.exe +204K BenchmarkArrayTransfer_TBB.exe +424K BenchmarkAtomicArray_SERIAL.exe +424K BenchmarkAtomicArray_TBB.exe +440K BenchmarkCopySpeeds_SERIAL.exe +580K BenchmarkCopySpeeds_TBB.exe +4.1M BenchmarkDeviceAdapter_SERIAL.exe +5.3M BenchmarkDeviceAdapter_TBB.exe +7.9M BenchmarkFieldAlgorithms_SERIAL.exe +7.9M BenchmarkFieldAlgorithms_TBB.exe +22M BenchmarkFilters_SERIAL.exe +22M BenchmarkFilters_TBB.exe +276K BenchmarkRayTracing_SERIAL.exe +276K BenchmarkRayTracing_TBB.exe +4.4M BenchmarkTopologyAlgorithms_SERIAL.exe +4.4M BenchmarkTopologyAlgorithms_TBB.exe +712K Rendering_SERIAL.exe +712K Rendering_TBB.exe +708K UnitTests_vtkm_cont_arg_testing.exe +1.7M UnitTests_vtkm_cont_internal_testing.exe +13M UnitTests_vtkm_cont_serial_testing.exe +14M UnitTests_vtkm_cont_tbb_testing.exe +18M UnitTests_vtkm_cont_testing.exe +13M UnitTests_vtkm_cont_testing_mpi.exe +736K UnitTests_vtkm_exec_arg_testing.exe +136K UnitTests_vtkm_exec_internal_testing.exe +196K UnitTests_vtkm_exec_serial_internal_testing.exe +196K UnitTests_vtkm_exec_tbb_internal_testing.exe +2.0M UnitTests_vtkm_exec_testing.exe +83M UnitTests_vtkm_filter_testing.exe +476K UnitTests_vtkm_internal_testing.exe +148K UnitTests_vtkm_interop_internal_testing.exe +1.3M UnitTests_vtkm_interop_testing.exe +2.9M UnitTests_vtkm_io_reader_testing.exe +548K UnitTests_vtkm_io_writer_testing.exe +792K UnitTests_vtkm_rendering_testing.exe +3.7M UnitTests_vtkm_testing.exe +320K UnitTests_vtkm_worklet_internal_testing.exe +65M UnitTests_vtkm_worklet_testing.exe +11M vtkm_cont-1.3.dll +2.1M vtkm_interop-1.3.dll +21M vtkm_rendering-1.3.dll +3.9M vtkm_worklet-1.3.dll +``` + +After making the singular change to the `Invocation` object to add the +`DeviceAdapterTag` as a template parameter (which should cause any extra +compile instances) the compile took **4 minuts and 5 seconds**. 
Here is the +new list of binaries. + +``` +kmorel2 0> du -sh *.exe *.dll +200K BenchmarkArrayTransfer_SERIAL.exe +204K BenchmarkArrayTransfer_TBB.exe +424K BenchmarkAtomicArray_SERIAL.exe +424K BenchmarkAtomicArray_TBB.exe +440K BenchmarkCopySpeeds_SERIAL.exe +580K BenchmarkCopySpeeds_TBB.exe +4.1M BenchmarkDeviceAdapter_SERIAL.exe +5.3M BenchmarkDeviceAdapter_TBB.exe +7.9M BenchmarkFieldAlgorithms_SERIAL.exe +7.9M BenchmarkFieldAlgorithms_TBB.exe +22M BenchmarkFilters_SERIAL.exe +22M BenchmarkFilters_TBB.exe +276K BenchmarkRayTracing_SERIAL.exe +276K BenchmarkRayTracing_TBB.exe +4.4M BenchmarkTopologyAlgorithms_SERIAL.exe +4.4M BenchmarkTopologyAlgorithms_TBB.exe +712K Rendering_SERIAL.exe +712K Rendering_TBB.exe +708K UnitTests_vtkm_cont_arg_testing.exe +1.7M UnitTests_vtkm_cont_internal_testing.exe +13M UnitTests_vtkm_cont_serial_testing.exe +14M UnitTests_vtkm_cont_tbb_testing.exe +19M UnitTests_vtkm_cont_testing.exe +13M UnitTests_vtkm_cont_testing_mpi.exe +736K UnitTests_vtkm_exec_arg_testing.exe +136K UnitTests_vtkm_exec_internal_testing.exe +196K UnitTests_vtkm_exec_serial_internal_testing.exe +196K UnitTests_vtkm_exec_tbb_internal_testing.exe +2.0M UnitTests_vtkm_exec_testing.exe +86M UnitTests_vtkm_filter_testing.exe +476K UnitTests_vtkm_internal_testing.exe +148K UnitTests_vtkm_interop_internal_testing.exe +1.3M UnitTests_vtkm_interop_testing.exe +2.9M UnitTests_vtkm_io_reader_testing.exe +548K UnitTests_vtkm_io_writer_testing.exe +792K UnitTests_vtkm_rendering_testing.exe +3.7M UnitTests_vtkm_testing.exe +320K UnitTests_vtkm_worklet_internal_testing.exe +68M UnitTests_vtkm_worklet_testing.exe +11M vtkm_cont-1.3.dll +2.1M vtkm_interop-1.3.dll +21M vtkm_rendering-1.3.dll +3.9M vtkm_worklet-1.3.dll +``` + +So far the increase is quite negligible. + +## Worklets now support an execution mask + +There have recently been use cases where it would be helpful to mask out +some of the invocations of a worklet. The idea is that when invoking a +worklet with a mask array on the input domain, you might implement your +worklet more-or-less like the following. + +```cpp +VTKM_EXEC void operator()(bool mask, /* other parameters */) +{ + if (mask) + { + // Do interesting stuff + } +} +``` + +This works, but what if your mask has mostly false values? In that case, +you are spending tons of time loading data to and from memory where fields +are stored for no reason. + +You could potentially get around this problem by adding a scatter to the +worklet. However, that will compress the output arrays to only values that +are active in the mask. That is problematic if you want the masked output +in the appropriate place in the original arrays. You will have to do some +complex (and annoying and possibly expensive) permutations of the output +arrays. + +Thus, we would like a new feature similar to scatter that instead masks out +invocations so that the worklet is simply not run on those outputs. + +### New Interface + +The new "Mask" feature that is similar (and orthogonal) to the existing +"Scatter" feature. Worklet objects now define a `MaskType` that provides on +object that manages the selections of which invocations are skipped. The +following Mask objects are defined. + + * `MaskNone` - This removes any mask of the output. All outputs are + generated. This is the default if no `MaskType` is explicitly defined. + * `MaskSelect` - The user to provides an array that specifies whether + each output is created with a 1 to mean that the output should be + created an 0 the mean that it should not. 
 * `MaskIndices` - The user provides an array with a list of indices for
   all outputs that should be created.

It will be straightforward to implement other versions of masks. (For
example, you could make a mask class that selects every Nth entry.) Those
could be made on an as-needed basis.

### Implementation

The implementation follows the same basic idea of how scatters are
implemented.

#### Mask Classes

The mask class is required to implement the following items.

 * `ThreadToOutputType` - A type for an array that maps a thread index (an
   index in the array) to an output index. A reasonable type for this
   could be `vtkm::cont::ArrayHandle<vtkm::Id>`.
 * `GetThreadToOutputMap` - Given the range for the output (e.g. the
   number of items in the output domain), returns an array of type
   `ThreadToOutputType` that is the actual map.
 * `GetThreadRange` - Given a range for the output (e.g. the number of
   items in the output domain), returns the range for the threads (e.g.
   the number of times the worklet will be invoked).

#### Dispatching

The `vtkm::worklet::internal::DispatcherBase` manages a mask class in
the same way it manages the scatter class. It gets the `MaskType` from
the worklet it is templated on. It requires a `MaskType` object during
its construction.

Previously the dispatcher (and downstream) had to manage the range and
indices of inputs and threads. They now also have to manage a separate
output range/index, as all three may now be different.

The `vtkm::Invocation` is changed to hold the ThreadToOutputMap array from
the mask. It likewise has a templated `ChangeThreadToOutputMap` method
added (similar to those already existing for the arrays from a scatter).
This method is used in `DispatcherBase::InvokeTransportParameters` to add
the mask's array to the invocation before calling `InvokeSchedule`.

#### Thread Indices

With the addition of masks, the `ThreadIndices` classes are changed to
manage the actual output index. Previously, the output index was always the
same as the thread index. However, now these two can be different. The
`GetThreadIndices` methods of the worklet base classes have an argument
added that is the portal to the ThreadToOutputMap.

The worklet `GetThreadIndices` is called from the `Task` classes. These
classes are changed to pass in this additional argument. Since the `Task`
classes get an `Invocation` object from the dispatcher, which contains the
`ThreadToOutputMap`, this change is trivial.

### Interaction Between Mask and Scatter

Although it seems weird, it should work fine to mix scatters and masks. The
scatter will first be applied to the input to generate a (potential) list
of output elements. The mask will then be applied to these output elements.


## Redesign VTK-m Runtime Device Tracking

The device tracking infrastructure in VTK-m has been redesigned to
remove multiple redundant code paths and to simplify reasoning
about what an instance of RuntimeDeviceTracker will modify.


`vtkm::cont::RuntimeDeviceTracker` tracks runtime information on
a per-user thread basis.
This is done to allow multiple calling
threads to use different VTK-m backends, as seen in this example:

```cpp
  vtkm::cont::DeviceAdapterTagCuda cuda;
  vtkm::cont::DeviceAdapterTagOpenMP openmp;
  { // thread 1
    auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
    tracker.ForceDevice(cuda);
    vtkm::worklet::Invoker invoke;
    invoke(LightTask{}, input, output);
    vtkm::cont::Algorithm::Sort(output);
    invoke(HeavyTask{}, output);
  }

  { // thread 2
    auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
    tracker.ForceDevice(openmp);
    vtkm::worklet::Invoker invoke;
    invoke(LightTask{}, input, output);
    vtkm::cont::Algorithm::Sort(output);
    invoke(HeavyTask{}, output);
  }
```

Note: `GetGlobalRuntimeDeviceTracker` has been refactored to be `GetRuntimeDeviceTracker`
as it always returned a unique instance for each control-side thread. This design allows
different threads to have different runtime device settings. By removing the term `Global`
from the name it becomes clearer what scope this class has.

While this addresses the ability for threads to specify which device they
should run on, it doesn't make it easy to toggle the status of a device
programmatically. For example, the following block forces execution to only
occur on `cuda` and doesn't restore the previously active devices
afterwards:

```cpp
  {
    vtkm::cont::DeviceAdapterTagCuda cuda;
    auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
    tracker.ForceDevice(cuda);
    vtkm::worklet::Invoker invoke;
    invoke(LightTask{}, input, output);
  }
  //openmp/tbb/... still inactive
```

To resolve those issues we have `vtkm::cont::ScopedRuntimeDeviceTracker`, which
has the same interface as `vtkm::cont::RuntimeDeviceTracker` but additionally
resets any per-user thread modifications when it goes out of scope. So by
switching the previous example over to use `ScopedRuntimeDeviceTracker` we
correctly restore the thread's `RuntimeDeviceTracker` state when `tracker`
goes out of scope.

```cpp
  {
    vtkm::cont::DeviceAdapterTagCuda cuda;
    vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda);
    vtkm::worklet::Invoker invoke;
    invoke(LightTask{}, input, output);
  }
  //openmp/tbb/... are now again active
```

The `vtkm::cont::ScopedRuntimeDeviceTracker` is not limited to forcing
execution to occur on a single device. When constructed it can either force
execution to a device, disable a device, or enable a device. These options
also work with `DeviceAdapterTagAny`.


```cpp
  {
    //enable all devices
    vtkm::cont::DeviceAdapterTagAny any;
    vtkm::cont::ScopedRuntimeDeviceTracker tracker(any,
        vtkm::cont::RuntimeDeviceTrackerMode::Enable);
    ...
  }

  {
    //disable only cuda
    vtkm::cont::DeviceAdapterTagCuda cuda;
    vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda,
        vtkm::cont::RuntimeDeviceTrackerMode::Disable);

    ...
  }
```


## `vtkm::cont::Initialize` added to make setting up VTK-m runtime state easier

A new initialization function, `vtkm::cont::Initialize`, has been added.
Initialization is not required, but will configure the logging utilities (when
enabled) and allows forcing a device via a `-d` or `--device` command line
option.


Usage:

```cpp
#include <vtkm/cont/Initialize.h>

int main(int argc, char *argv[])
{
  auto config = vtkm::cont::Initialize(argc, argv);

  ...
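  // Illustrative addition, not part of the original example: the
  // InitializeResult returned above can be queried afterwards, e.g. for the
  // device that was requested on the command line.
  //   std::cout << "Device: " << config.Device.GetName() << std::endl;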
}
```


# ArrayHandle

## Add `vtkm::cont::ArrayHandleVirtual`

Added a new class named `vtkm::cont::ArrayHandleVirtual` that allows you to type erase an
ArrayHandle storage type by using virtual calls. This simplification makes
storing `Fields` and `Coordinates` significantly easier as VTK-m doesn't
need to deduce both the storage and value type when executing worklets.

To construct a `vtkm::cont::ArrayHandleVirtual` one can do the following:

```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);


// construct from an array handle
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v(pressure);

// or assign from an array handle
v = constant;

```

To help maintain performance `vtkm::cont::ArrayHandleVirtual` provides a collection of helper
functions/methods to query and cast back to the concrete storage and value type:

```cpp
vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = constant;

const bool isConstant = vtkm::cont::IsType< decltype(constant) >(v);
if(isConstant)
  vtkm::cont::ArrayHandleConstant<vtkm::Float32> t = vtkm::cont::Cast< decltype(constant) >(v);

```

Lastly, a common operation for code using `ArrayHandleVirtual` is to construct a new instance
of an existing virtual handle with the same storage type. This can be done by using the
`NewInstance` method, as seen below:

```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> v = pressure;

vtkm::cont::ArrayHandleVirtual<vtkm::Float32> newArray = v.NewInstance();
bool isConstant = vtkm::cont::IsType< vtkm::cont::ArrayHandle<vtkm::Float32> >(newArray); //will be true
```


## `vtkm::cont::ArrayHandleZip` provides a consistent API even with non-writable handles

Previously `vtkm::cont::ArrayHandleZip` could not wrap an implicit handle and provide a consistent experience.
The primary issue was that if you tried to use the PortalType returned by `GetPortalControl()` you
would get a compile failure. This would occur as the PortalType returned would try to call `Set`
on an ImplicitPortal which doesn't have a set method.

Now with this change, the `ZipPortal` uses SFINAE to determine if `Set` and `Get` should call the
underlying zipped portals.


## `vtkm::cont::VariantArrayHandle` replaces `vtkm::cont::DynamicArrayHandle`

`vtkm::cont::VariantArrayHandle` replaces `vtkm::cont::DynamicArrayHandle` as the
primary method for holding onto a type erased `vtkm::cont::ArrayHandle`. The major
difference between the two implementations is how they handle the Storage component of
an array handle.

The `vtkm::cont::DynamicArrayHandle` approach was to find the fully deduced type of the
`ArrayHandle`, meaning it would check all value and storage types it knew about until it
found a match. This cross product of values and storages would cause significant
compilation times when a `DynamicArrayHandle` had multiple storage types.

The `vtkm::cont::VariantArrayHandle` approach is to only deduce the value type of the
`ArrayHandle` and return a `vtkm::cont::ArrayHandleVirtual`, which uses polymorphism to
hide the actual storage type. This approach allows for better compile times, and for
calling code to always expect an `ArrayHandleVirtual` instead of the fully deduced type.
This conversion to `ArrayHandleVirtual` is usually done internally within VTK-m when a
worklet or filter is invoked.

In certain cases users of `VariantArrayHandle` want to be able to access the concrete
`ArrayHandle` and not have it wrapped in an `ArrayHandleVirtual`.
For those occurrences
`VariantArrayHandle` provides a collection of helper functions/methods to query and
cast back to the concrete storage and value type:

```cpp
vtkm::cont::ArrayHandleConstant<vtkm::Float32> constant(42.0f);
vtkm::cont::VariantArrayHandle v(constant);

const bool isConstant = vtkm::cont::IsType< decltype(constant) >(v);
if(isConstant)
  vtkm::cont::ArrayHandleConstant<vtkm::Float32> t = vtkm::cont::Cast< decltype(constant) >(v);

```

Lastly, a common operation for code using `VariantArrayHandle` is to construct a new instance
of an existing handle with the same storage type. This can be done by using the `NewInstance`
method, as seen below:

```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> pressure;
vtkm::cont::VariantArrayHandle v(pressure);

vtkm::cont::VariantArrayHandle newArray = v.NewInstance();
const bool isConstant = vtkm::cont::IsType< decltype(pressure) >(newArray); //will be true
```


## `vtkm::cont::VariantArrayHandle` CastAndCall supports casting to concrete types

Previously, the `VariantArrayHandle::CastAndCall` (and indirect calls through
`vtkm::cont::CastAndCall`) attempted to cast to only
`vtkm::cont::ArrayHandleVirtual` with different value types. That worked, but
it meant that whatever was called had to operate through virtual functions.

Under most circumstances, it is worthwhile to also check for some common
storage types that, when encountered, can be accessed much faster. This
change provides the casting to concrete storage types and now uses
`vtkm::cont::ArrayHandleVirtual` as a fallback when no concrete storage
type is found.

By default, `CastAndCall` checks all the storage types in
`VTKM_DEFAULT_STORAGE_LIST_TAG`, which typically contains only the basic
storage. The `VariantArrayHandle::CastAndCall` method also allows you to
override this behavior by specifying a different type list in the first
argument. If the first argument is a list type, `CastAndCall` assumes that
all the types in the list are storage tags. If you pass in
`vtkm::ListTagEmpty`, then `CastAndCall` will always cast to an
`ArrayHandleVirtual` (the previous behavior). Alternately, you can pass in
storage tags that might be likely under the current usage.

As an example, consider the following simple code.

``` cpp
vtkm::cont::VariantArrayHandle array;

// stuff happens

array.CastAndCall(myFunctor);
```

Previously, `myFunctor` would be called with
`vtkm::cont::ArrayHandleVirtual` arrays of different value types `T`. After this
change, `myFunctor` will be called with that and with basic
`vtkm::cont::ArrayHandle` arrays of the same value types.

If you want to only call `myFunctor` with
`vtkm::cont::ArrayHandleVirtual`, then replace the previous line with

``` cpp
array.CastAndCall(vtkm::ListTagEmpty(), myFunctor);
```

Let's say that additionally using `vtkm::cont::ArrayHandleIndex` was also
common. If you want to also specialize for that array, you can do so with
the following line.

``` cpp
array.CastAndCall(vtkm::ListTagBase<vtkm::cont::ArrayHandleIndex::StorageTag>(),
                  myFunctor);
```

Note that `myFunctor` will be called with a `vtkm::cont::ArrayHandle` that
uses the index array's storage tag, not with `vtkm::cont::ArrayHandleIndex`
itself.


## `vtkm::cont::VariantArrayHandle::AsVirtual()` performs casting

The `AsVirtual` method of `vtkm::cont::VariantArrayHandle` now works for
any arithmetic type, not just the actual type of the underlying array. This
works by inserting an `ArrayHandleCast` between the underlying concrete array
and the new `ArrayHandleVirtual` when needed.
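As a small illustration of the new behavior (the `vtkm::Float32` array and the
`vtkm::Float64` request below are just an assumed example):

```cpp
vtkm::cont::ArrayHandle<vtkm::Float32> concrete;
vtkm::cont::VariantArrayHandle variant(concrete);

// Requesting the type actually stored returns the array directly.
vtkm::cont::ArrayHandleVirtual<vtkm::Float32> sameType = variant.AsVirtual<vtkm::Float32>();

// Requesting another arithmetic type now also works: an ArrayHandleCast is
// inserted between the concrete array and the returned virtual handle.
vtkm::cont::ArrayHandleVirtual<vtkm::Float64> castType = variant.AsVirtual<vtkm::Float64>();
```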


## `StorageBasic::StealArray()` now provides delete function to new owner

Memory that is stolen from VTK-m has to be freed correctly. This is required
as the memory could have been allocated with `new`, `malloc` or even `cudaMallocManaged`.

Previously it was very easy to transfer ownership of memory out of VTK-m and
either fail to capture the free function, or ask for it after the transfer
operation, which would return a nullptr. Now stealing an array also
provides the free function, reducing one source of memory leaks.

To properly steal memory from VTK-m you do the following:

```cpp
  vtkm::cont::ArrayHandle<T> arrayHandle;

  ...

  auto stolen = arrayHandle.StealArray();
  T* ptr = stolen.first;
  auto free_function = stolen.second;

  ...

  free_function(ptr);
```


# Control Environment

## `vtkm::cont::CellLocatorGeneral` has been added

`vtkm::cont::CellLocatorUniformBins` can work with all kinds of datasets, but there are cell
locators that are more efficient for specific data sets. Therefore, a new cell
locator - `vtkm::cont::CellLocatorGeneral` - has been implemented that can be configured to use
specialized cell locators based on its input data. A "configurator" function object
can be specified using the `SetConfigurator()` function. The configurator should
have the following signature:

```cpp
void (std::unique_ptr<vtkm::cont::CellLocator>&,
      const vtkm::cont::DynamicCellSet&,
      const vtkm::cont::CoordinateSystem&);
```

The configurator is invoked whenever the `Update` method is called and the input
has changed. The current cell locator is passed in a `std::unique_ptr`. Based on
the types of the input cellset and coordinates, and possibly some heuristics on
their values, the current cell locator's parameters can be updated, or a different
cell locator can be instantiated and transferred to the `unique_ptr`. The default
configurator configures a `vtkm::cont::CellLocatorUniformGrid` for uniform grid datasets,
a `vtkm::cont::CellLocatorRectilinearGrid` for rectilinear datasets, and
`vtkm::cont::CellLocatorUniformBins` for all other dataset types.

The class `CellLocatorHelper`, which implemented similar functionality to
`CellLocatorGeneral`, has been removed.

## `vtkm::cont::CellLocatorTwoLevelUniformGrid` has been renamed to `vtkm::cont::CellLocatorUniformBins`

`CellLocatorTwoLevelUniformGrid` has been renamed to `CellLocatorUniformBins`
for brevity. It has been modified to be a subclass of `vtkm::cont::CellLocator`
and can be used wherever a `CellLocator` is accepted.

## `vtkm::cont::Timer` now supports asynchronous and device independent timers

`vtkm::cont::Timer` can now track execution time on a single device or across all
enabled devices, as seen below:

```cpp
vtkm::cont::Timer tbb_timer{vtkm::cont::DeviceAdapterTagTBB()};
vtkm::cont::Timer all_timer;

all_timer.Start();
tbb_timer.Start();
// Run blocking algorithm on tbb
tbb_timer.Stop();
// Run async algorithms on cuda
all_timer.Stop();

// Do more work

// Now get the time for the tbb work and for all (tbb + cuda) work
auto tbb_time = tbb_timer.GetElapsedTime();
auto all_time = all_timer.GetElapsedTime();
```

When `Timer` is constructed without an explicit `vtkm::cont::DeviceAdapterId` it
will track all device adapters and return the maximum elapsed time over all devices
when `GetElapsedTime` is called.
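A small sketch of this device-independent mode (assuming CUDA is among the
enabled devices; `RunAlgorithms()` is a stand-in for whatever work is being
timed):

```cpp
vtkm::cont::Timer timer; // no device given: tracks all enabled devices
timer.Start();
RunAlgorithms(); // placeholder for the work being timed
timer.Stop();

// Maximum elapsed time across all enabled devices.
auto overall = timer.GetElapsedTime();

// The time for one particular device can still be queried by passing its tag.
auto cudaTime = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagCuda());
```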


## `vtkm::cont::DeviceAdapterId` construction from strings is now case-insensitive

You can now construct a `vtkm::cont::DeviceAdapterId` from a string regardless of
its case. The following will all construct the same `vtkm::cont::DeviceAdapterId`.

```cpp
vtkm::cont::DeviceAdapterId id1 = vtkm::cont::make_DeviceAdapterId("cuda");
vtkm::cont::DeviceAdapterId id2 = vtkm::cont::make_DeviceAdapterId("CUDA");
vtkm::cont::DeviceAdapterId id3 = vtkm::cont::make_DeviceAdapterId("Cuda");

auto& tracker = vtkm::cont::GetRuntimeDeviceTracker();
vtkm::cont::DeviceAdapterId id4 = tracker.GetDeviceAdapterId("cuda");
vtkm::cont::DeviceAdapterId id5 = tracker.GetDeviceAdapterId("CUDA");
vtkm::cont::DeviceAdapterId id6 = tracker.GetDeviceAdapterId("Cuda");
```

## `vtkm::cont::Initialize` will only parse known arguments

When a library requires reading some command line arguments through a
function like Initialize, it is typical that it will parse through
the arguments it supports and then remove those arguments from `argc` and
`argv` so that the remaining arguments can be parsed by the calling
program. Recent changes to the `vtkm::cont::Initialize` function support
that.

### Use Case

Say you are creating a simple benchmark where you want to provide a command
line option `--size` that allows you to adjust the size of the data that
you are working on. However, you also want to support flags like `--device`
and `-v` that are handled by `vtkm::cont::Initialize`. Rather than having
to re-implement all of `Initialize`'s parsing, you can now first call
`Initialize` to handle its arguments and then parse the remaining arguments.

The following is a simple (and rather incomplete) example:

```cpp
int main(int argc, char** argv)
{
  vtkm::cont::InitializeResult initResult = vtkm::cont::Initialize(argc, argv);

  if ((argc > 1) && (strcmp(argv[1], "--size") == 0))
  {
    if (argc < 3)
    {
      std::cerr << "--size option requires a numeric argument" << std::endl;
      std::cerr << "USAGE: " << argv[0] << " [options]" << std::endl;
      std::cerr << "Options are:" << std::endl;
      std::cerr << "  --size \tSpecify the size of the data." << std::endl;
      std::cerr << initResult.Usage << std::endl;
      exit(1);
    }

    g_size = atoi(argv[2]);
  }

  std::cout << "Using device: " << initResult.Device.GetName() << std::endl;
```

### Additional Initialize Options

Because `vtkm::cont::Initialize` no longer has the assumption that it is responsible
for parsing _all_ arguments, some options have been added to
`vtkm::cont::InitializeOptions` to manage these different use cases. The
following options are now supported.

 * `None` A placeholder for having all options off, which is the default.
   (Same as before this change.)
 * `RequireDevice` Issue an error if the device argument is not specified.
   (Same as before this change.)
 * `DefaultAnyDevice` If no device is specified, treat it as if the user
   gave `--device=Any`. This means that `DeviceAdapterTagUndefined` will never
   be returned in the result.
 * `AddHelp` Add a help argument. If `-h` or `--help` is provided, prints
   a usage statement. Of course, the usage statement will only print out
   arguments processed by VTK-m.
 * `ErrorOnBadOption` If an unknown option is encountered, the program
   terminates with an error and a usage statement is printed. If this
   option is not provided, any unknown options are returned in `argv`. If
   this option is used, it is a good idea to use `AddHelp` as well.
 * `ErrorOnBadArgument` If an extra argument is encountered, the program
   terminates with an error and a usage statement is printed. If this
   option is not provided, any unknown arguments are returned in `argv`.
 * `Strict` If supplied, `Initialize` treats its own arguments as the only
   ones supported by the application and provides an error if they are not
   followed exactly. This is a convenience option that is a combination of
   `ErrorOnBadOption`, `ErrorOnBadArgument`, and `AddHelp`.

### InitializeResult Changes

The changes in `Initialize` have also necessitated changing some of
the fields in the `InitializeResult` structure. The following fields are
now provided in the `InitializeResult` struct.

 * `Device` Returns the device selected in the command line arguments as a
   `DeviceAdapterId`. If no device was selected,
   `DeviceAdapterTagUndefined` is returned. (Same as before this change.)
 * `Usage` Returns a string containing the usage for the options
   recognized by `Initialize`. This can be used to build larger usage
   statements containing options for both `Initialize` and the calling
   program. See the example above.

Note that the `Arguments` field has been removed from `InitializeResult`.
This is because the unparsed arguments are now returned in the modified
`argc` and `argv`, which provides a more complete result than the
`Arguments` field did.

# Execution Environment

## VTK-m logs details about each CUDA kernel launch

The VTK-m logging infrastructure has been extended with a new log level
`KernelLaunches`, which exists between `MemTransfer` and `Cast`.

This log level reports the number of blocks, threads per block, and the
PTX version of each CUDA kernel launched.

This logging level was primarily introduced to help developers that are
tracking down issues that occur when VTK-m components have been built with
different `sm_XX` flags and to help people looking to do kernel performance
tuning.


## VTK-m CUDA allocations can have managed memory (cudaMallocManaged) enabled/disabled from C++

Previously it was impossible for calling code to explicitly disable CUDA managed
memory. This can be desirable for projects that know they don't need managed
memory and are extremely performance critical.

```cpp
const bool usingManagedMemory = vtkm::cont::cuda::internal::CudaAllocator::UsingManagedMemory();
if (usingManagedMemory)
{ //disable managed memory
  vtkm::cont::cuda::internal::CudaAllocator::ForceManagedMemoryOff();
}
```


## VTK-m CUDA kernel scheduling improved, including better defaults and user customization support

VTK-m now offers a more GPU-aware set of defaults for kernel scheduling.
When VTK-m first launches a kernel we do system introspection to determine
which GPUs are on the machine and then match this information to a preset
table of values. The implementation is designed in a way that allows VTK-m
to offer both specific presets for a given GPU (e.g. V100) and presets for
an entire generation of cards (e.g. Pascal).

Currently VTK-m offers preset tables for the following GPUs:
- Tesla V100
- Tesla P100

If the hardware doesn't match a specific GPU card we then try to find the
nearest known hardware generation and use those defaults. Currently we offer
defaults for:
- Older than Pascal hardware
- Pascal hardware
- Volta+ hardware

Some users have workloads that don't align with the defaults provided by
VTK-m.
When that is the case, it is possible to override the defaults
by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`,
as shown below:

```cpp
  ScheduleParameters CustomScheduleValues(char const* name,
                                          int major,
                                          int minor,
                                          int multiProcessorCount,
                                          int maxThreadsPerMultiProcessor,
                                          int maxThreadsPerBlock)
  {

    ScheduleParameters params {
        64 * multiProcessorCount,  //1d blocks
        64,                        //1d threads per block
        64 * multiProcessorCount,  //2d blocks
        { 8, 8, 1 },               //2d threads per block
        64 * multiProcessorCount,  //3d blocks
        { 4, 4, 4 } };             //3d threads per block
    return params;
  }
  vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues);
```


## VTK-m Reduction algorithm now supports differing input and output types

It is common to want to perform a reduction where the input and output types
differ. A basic example would be when the input is `vtkm::UInt8`
but the output is `vtkm::UInt64`. This has been supported since v1.2, as long
as the input type is implicitly convertible to the output type.

What we now support is when the input type is not implicitly convertible to the output type,
such as when the output type is `vtkm::Pair<vtkm::UInt64, vtkm::UInt64>`. For this to work
we require that the custom binary operator also implements an `operator()` which handles
the unary transformation of input to output.

An example of a custom reduction operator for differing input and output types is:

```cpp
struct CustomMinAndMax
{
  using OutputType = vtkm::Pair<vtkm::Float64, vtkm::Float64>;

  VTKM_EXEC_CONT
  OutputType operator()(vtkm::Float64 a) const
  {
    return OutputType(a, a);
  }

  VTKM_EXEC_CONT
  OutputType operator()(vtkm::Float64 a, vtkm::Float64 b) const
  {
    return OutputType(vtkm::Min(a, b), vtkm::Max(a, b));
  }

  VTKM_EXEC_CONT
  OutputType operator()(const OutputType& a, const OutputType& b) const
  {
    return OutputType(vtkm::Min(a.first, b.first), vtkm::Max(a.second, b.second));
  }

  VTKM_EXEC_CONT
  OutputType operator()(vtkm::Float64 a, const OutputType& b) const
  {
    return OutputType(vtkm::Min(a, b.first), vtkm::Max(a, b.second));
  }

  VTKM_EXEC_CONT
  OutputType operator()(const OutputType& a, vtkm::Float64 b) const
  {
    return OutputType(vtkm::Min(a.first, b), vtkm::Max(a.second, b));
  }
};
```

## Added specialized operators for ArrayPortalValueReference

The `ArrayPortalValueReference` is supposed to behave just like the value it
encapsulates and does so by automatically converting to the base type when
necessary. However, when it is possible to convert that to something else,
it is possible to get errors about ambiguous overloads. To avoid these, add
specialized versions of the operators to specify which ones should be used.

Also consolidated the CUDA version of an `ArrayPortalValueReference` to the
standard one. The two implementations were equivalent and we would like
changes to apply to both.


# Worklets and Filters

## `vtkm::worklet::Invoker` now supports worklets which require a Scatter object

This change allows the `Invoker` class to support launching worklets that require
a custom scatter operation. This is done by providing the scatter as the second
argument when launching a worklet with the `()` operator.

The following example shows a scatter being provided with a worklet launch.

```cpp
struct CheckTopology : vtkm::worklet::WorkletMapPointToCell
{
  using ControlSignature = void(CellSetIn cellset, FieldOutCell);
  using ExecutionSignature = _2(FromIndices);
  using ScatterType = vtkm::worklet::ScatterPermutation<>;
  ...
};


vtkm::worklet::Invoker invoke;
invoke( CheckTopology{}, vtkm::worklet::ScatterPermutation{}, cellset, result );
```


## `BitFields` are now a supported field input/output type for VTK-m worklets

`BitFields` are:
 - Stored in memory using a contiguous buffer of bits.
 - Accessible via portals, a la ArrayHandle.
 - Portals operate on individual bits or words.
 - Operations may be atomic for safe use from concurrent kernels.

The new `BitFieldToUnorderedSet` device algorithm produces an
ArrayHandle containing the indices of all set bits, in no particular
order.

The new AtomicInterface classes provide an abstraction into bitwise
atomic operations across control and execution environments and are
used to implement the BitPortals.

BitFields may be used as boolean-typed ArrayHandles using the
ArrayHandleBitField adapter. `vtkm::cont::ArrayHandleBitField` uses atomic operations to read
and write bits in the BitField, and is safe to use in concurrent code.

For example, a simple worklet that merges two arrays based on a boolean
condition:

```cpp
class ConditionalMergeWorklet : public vtkm::worklet::WorkletMapField
{
public:
  using ControlSignature = void(FieldIn cond,
                                FieldIn trueVals,
                                FieldIn falseVals,
                                FieldOut result);
  using ExecutionSignature = _4(_1, _2, _3);

  template <typename T>
  VTKM_EXEC T operator()(bool cond, const T& trueVal, const T& falseVal) const
  {
    return cond ? trueVal : falseVal;
  }
};

BitField bits = ...;
auto condArray = vtkm::cont::make_ArrayHandleBitField(bits);
auto trueArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(20, 2, NUM_BITS);
auto falseArray = vtkm::cont::make_ArrayHandleCounting<vtkm::Id>(13, 2, NUM_BITS);
vtkm::cont::ArrayHandle<vtkm::Id> output;

vtkm::worklet::Invoker invoke( vtkm::cont::DeviceAdapterTagTBB{} );
invoke(ConditionalMergeWorklet{}, condArray, trueArray, falseArray, output);

```


## Added a Point Merging worklet

We have added `vtkm::worklet::PointMerge`, which uses a virtual grid approach to
identify nearby points. The worklet works by creating a very fine but
sparsely represented locator grid. It then groups points by grid bins and
finds those within a specified radius.


## `vtkm::filter::CleanGrid` now can do point merging

The `CleanGrid` filter has been extended to use `vtkm::worklet::PointMerge` to
allow for point merging. The following flags have been added to `CleanGrid` to
modify the behavior of point merging.

 - `Set`/`GetMergePoints` - a flag to turn on/off the merging of
   duplicated coincident points. This extra operation will find points
   spatially located near each other and merge them together.
 - `Set`/`GetTolerance` - Defines the tolerance used when determining
   whether two points are considered coincident. If the
   `ToleranceIsAbsolute` flag is false (the default), then this tolerance
   is scaled by the diagonal of the points. This parameter is only used
   when merge points is on.
 - `Set`/`GetToleranceIsAbsolute` - When ToleranceIsAbsolute is false (the
   default) then the tolerance is scaled by the diagonal of the bounds of
   the dataset. If true, then the tolerance is taken as the actual
   distance to use. This parameter is only used when merge points is on.
 - `Set`/`GetFastMerge` - When FastMerge is true (the default), some
   corners are cut when computing coincident points. The point merge will
   go faster but the tolerance will not be strictly followed.


## Added connected component worklets and filters

We have added the `vtkm::filter::ImageConnectivity` and `vtkm::filter::CellSetConnectivity` filters,
along with the corresponding worklets, to identify connected components in DataSets. The
`ImageConnectivity` filter identifies connected components in `vtkm::cont::CellSetStructured`
based on the same field value of neighboring cells. The `CellSetConnectivity` filter identifies
connected components based on cell connectivity.

Currently a Moore neighborhood (i.e. 8 neighboring pixels for 2D and 27 neighboring pixels
for 3D) is used for `ImageConnectivity`. For `CellSetConnectivity`, the neighborhood is defined
as cells sharing a common edge.


# Build

## CMake 3.8+ now required to build VTK-m

Historically VTK-m has offered the ability to build a small
subset of device adapters with CMake 3.3. As our primary
consumers have moved to CMake 3.8, and HPC machines continue
to provide newer CMake versions, we have decided to simplify
our CMake build system by requiring CMake 3.8 everywhere.


## VTK-m now can verify that it installs itself correctly

It was a fairly common occurrence for VTK-m to have a broken install
tree, as it had no easy way to verify that all headers would be installed.

Now VTK-m offers a testing infrastructure that creates a temporary install
and compiles tests that build against the installed VTK-m version. Currently
we have tests that verify each header listed in VTK-m is installed, that users
can compile a custom `vtkm::filter` that uses diy, and that users can call
`vtkm::rendering`.

## VTK-m now requires `CUDA` separable compilation to build

With the introduction of `vtkm::cont::ArrayHandleVirtual` and the related infrastructure, VTK-m now
requires that all CUDA code be compiled using separable compilation (`-rdc`).


## VTK-m provides a `vtkm_filter` CMake target

VTK-m now provides a `vtkm_filter` target that contains pre-built components
of filters for consuming projects.


## `vtkm::cont::CellLocatorBoundingIntervalHierarchy` is compiled into `vtkm_cont`

All of the methods in CellLocatorBoundingIntervalHierarchy were listed in
header files. This is sometimes problematic with virtual methods. Since
everything implemented in it can just be embedded in a library, the code has
been moved into the vtkm_cont library.

These changes caused some warnings in clang to show up based on virtual
methods in other cell locators. Hence, the rest of the cell locators
have also had some of their code moved to vtkm_cont.


# Other

## LodePNG added as a thirdparty package

The lodepng library was brought in as a third-party library.
This has allowed the VTK-m rendering library to have robust
PNG decode functionality.


## Optionparser added as a thirdparty package

Previously we just took the optionparser.h file and stuck it right in
our source code. That was problematic for a variety of reasons.

 - It incorrectly assigned our license to external code.
 - It made lots of unnecessary changes to the original source (like reformatting).
 - It made it near impossible to track patches we make and updates to the original software.

Now we use the third-party system to track changes to optionparser.h
in the https://gitlab.kitware.com/third-party/optionparser repository.
+ + +## Thirdparty diy now can coexist with external diy + +Previously VTK-m would leak macros that would cause an external diy +to be incorrectly mangled breaking consumers of VTK-m that used diy. + +Going forward to use `diy` from VTK-m all calls must use the `vtkmdiy` +namespace instead of the `diy` namespace. This allows for VTK-m to +properly forward calls to either the external or internal version correctly. + + +## Merge benchmark executables into a device dependent shared library + +VTK-m has been updated to replace old per device benchmark executables with a single +multi-device executable. Selection of the device adapter is done at runtime through +the `--device=` argument. + + +## Merge rendering testing executables to a shared library + +VTK-m has been updated to replace old per device rendering testing executables with a single +multi-device executable. Selection of the device adapter is done at runtime through +the `--device=` argument. + + +## Merge worklet testing executables into a device dependent shared library + +VTK-m has been updated to replace old per device working testing executables with a single +multi-device executable. Selection of the device adapter is done at runtime through +the `--device=` argument. + +## VTK-m runtime device detection properly handles busy CUDA devices + +When an application that uses VTK-m is first launched it will +do a check to see if CUDA is supported at runtime. If for +some reason that CUDA card is not allowing kernel execution +VTK-m would report the hardware doesn't have CUDA support. + +This was problematic as was over aggressive in disabling CUDA +support for hardware that could support kernel execution in +the future. With the fact that every VTK-m worklet is executed +through a TryExecute it is no longer necessary to be so +aggressive in disabling CUDA support. + +Now the behavior is that VTK-m considers a machine to have +CUDA runtime support if it has 1+ GPU's of Kepler or +higher hardware (SM_30+). diff --git a/docs/changelog/StorageBase-StealArray-returns-delete-function.md b/docs/changelog/StorageBase-StealArray-returns-delete-function.md deleted file mode 100644 index fa08f5f99..000000000 --- a/docs/changelog/StorageBase-StealArray-returns-delete-function.md +++ /dev/null @@ -1,24 +0,0 @@ -## `StorageBasic` StealArray() now provides delete function to new owner - -Memory that is stolen from VTK-m has to be freed correctly. This is required -as the memory could have been allocated with `new`, `malloc` or even `cudaMallocManaged`. - -Previously it was very easy to transfer ownership of memory out of VTK-m and -either fail to capture the free function, or ask for it after the transfer -operation which would return a nullptr. Now stealing an array also -provides the free function reducing one source of memory leaks. - -To properly steal memory from VTK-m you do the following: -```cpp - vtkm::cont::ArrayHandle arrayHandle; - - ... - - auto* stolen = arrayHandle.StealArray(); - T* ptr = stolen.first; - auto free_function = stolen.second; - - ... - - free_function(ptr); -``` diff --git a/docs/changelog/Variant_AsVirtual_force_cast.md b/docs/changelog/Variant_AsVirtual_force_cast.md deleted file mode 100644 index 5b1bc1ba1..000000000 --- a/docs/changelog/Variant_AsVirtual_force_cast.md +++ /dev/null @@ -1,6 +0,0 @@ -# VariantArrayHandle::AsVirtual() performs casting - -The AsVirtual method of VariantArrayHandle now works for any arithmetic type, -not just the actual type of the underlying array. 
This works by inserting an -ArrayHandleCast between the underlying concrete array and the new -ArrayHandleVirtual when needed. diff --git a/docs/changelog/add-cuda-kernel-details-to-logging.md b/docs/changelog/add-cuda-kernel-details-to-logging.md deleted file mode 100644 index 164362535..000000000 --- a/docs/changelog/add-cuda-kernel-details-to-logging.md +++ /dev/null @@ -1,12 +0,0 @@ -# VTK-m logs details about each CUDA kernel launch - -The VTK-m logging infrastructure has been extended with a new log level -`KernelLaunches` which exists between `MemTransfer` and `Cast`. - -This log level reports the number of blocks, threads per block, and the -PTX version of each CUDA kernel launched. - -This logging level was primarily introduced to help developers that are -tracking down issues that occur when VTK-m components have been built with -different `sm_XX` flags and help people looking to do kernel performance -tuning. diff --git a/docs/changelog/add-vtkm_filter-target.md b/docs/changelog/add-vtkm_filter-target.md deleted file mode 100644 index 717c7baf3..000000000 --- a/docs/changelog/add-vtkm_filter-target.md +++ /dev/null @@ -1,4 +0,0 @@ -# VTK-m provides a vtkm_filter target - -VTK-m now provides a `vtkm_filter` that contains pre-built components -of filters for consuming projects. diff --git a/docs/changelog/array-virtual-not-special.md b/docs/changelog/array-virtual-not-special.md deleted file mode 100644 index 0002af433..000000000 --- a/docs/changelog/array-virtual-not-special.md +++ /dev/null @@ -1,11 +0,0 @@ -# Make ArrayHandleVirtual conform with other ArrayHandle structure - -Previously, ArrayHandleVirtual was defined as a specialization of -ArrayHandle with the virtual storage tag. This was because the storage -object was polymorphic and needed to be handled special. These changes -moved the existing storage definition to an internal class, and then -managed the pointer to that implementation class in a Storage object that -can be managed like any other storage object. - -Also moved the implementation of StorageAny into the implementation of the -internal storage object. diff --git a/docs/changelog/arrayhandlevirtual.md b/docs/changelog/arrayhandlevirtual.md deleted file mode 100644 index e5172d648..000000000 --- a/docs/changelog/arrayhandlevirtual.md +++ /dev/null @@ -1,45 +0,0 @@ -# Add vtkm::cont::ArrayHandleVirtual - - -Added a new class named `ArrayHandleVirtual` that allows you to type erase an -ArrayHandle storage type by using virtual calls. This simplification makes -storing `Fields` and `Coordinates` significantly easier as VTK-m doesn't -need to deduce both the storage and value type when executing worklets. 
- -To construct an `ArrayHandleVirtual` one can do one of the following: - -```cpp -vtkm::cont::ArrayHandle pressure; -vtkm::cont::ArrayHandleConstant constant(42.0f); - - -// constrcut from an array handle -vtkm::cont::ArrayHandleVirtual v(pressure); - -// or assign from an array handle -v = constant; - -``` - -To help maintain performance `ArrayHandleVirtual` provides a collection of helper -functions/methods to query and cast back to the concrete storage and value type: -```cpp -vtkm::cont::ArrayHandleConstant constant(42.0f); -vtkm::cont::ArrayHandleVirtual v = constant; - -bool isConstant = vtkm::cont::IsType< decltype(constant) >(v); -if(isConstant) - vtkm::cont::ArrayHandleConstant t = vtkm::cont::Cast< decltype(constant) >(v); - -``` - -Lastly, a common operation of calling code using `ArrayHandleVirtual` is a desire to construct a new instance -of an existing virtual handle with the same storage type. This can be done by using the `NewInstance` method -as seen below -```cpp -vtkm::cont::ArrayHandle pressure; -vtkm::cont::ArrayHandleVirtual v = pressure; - -vtkm::cont::ArrayHandleVirtual newArray = v->NewInstance(); -bool isConstant = vtkm::cont::IsType< vtkm::cont::ArrayHandle >(newArray); //will be true -``` diff --git a/docs/changelog/arrayhandlezip-handles-writing-to-implicit-handles.md b/docs/changelog/arrayhandlezip-handles-writing-to-implicit-handles.md deleted file mode 100644 index 06c88c4b1..000000000 --- a/docs/changelog/arrayhandlezip-handles-writing-to-implicit-handles.md +++ /dev/null @@ -1,9 +0,0 @@ -# vtkm::cont::ArrayHandleZip provides a consistent API even with non-writable handles - -Previously ArrayHandleZip could not wrap an implicit handle and provide a consistent experience. -The primary issue was that if you tried to use the PortalType returned by GetPortalControl() you -would get a compile failure. This would occur as the PortalType returned would try to call `Set` -on an ImplicitPortal which doesn't have a set method. - -Now with this change, the `ZipPortal` use SFINAE to determine if `Set` and `Get` should call the -underlying zipped portals. diff --git a/docs/changelog/asynchronize-device-independent-timer.md b/docs/changelog/asynchronize-device-independent-timer.md deleted file mode 100644 index 065540a40..000000000 --- a/docs/changelog/asynchronize-device-independent-timer.md +++ /dev/null @@ -1,65 +0,0 @@ -# Introduce asynchronous and device independent timer - -The timer class now is asynchronous and device independent. it's using an -similiar API as vtkOpenGLRenderTimer with Start(), Stop(), Reset(), Ready(), -and GetElapsedTime() function. For convenience and backward compability, Each -Start() function call will call Reset() internally. GetElapsedTime() function -can be used multiple times to time sequential operations and Stop() function -can be helpful when you want to get the elapsed time latter. - -Bascially it can be used in two modes: - -* Create a Timer without any device info. - * It would enable the timer for all enabled devices on the machine. Users can get a -specific elapsed time by passing a device id into the GetElapsedTime function. -If no device is provided, it would pick the maximum of all timer results - the -logic behind this decision is that if cuda is disabled, openmp, serial and tbb -roughly give the same results; if cuda is enabled it's safe to return the -maximum elapsed time since users are more interested in the device execution -time rather than the kernal launch time. 
The Ready function can be handy here -to query the status of the timer. - -``` Construct a generic timer -// Assume CUDA is enabled on the machine -vtkm::cont::Timer timer; -timer.Start(); -// Run the algorithm - -auto timeHost = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagSerial()); -// To avoid the expensive device synchronization, we query is ready here. -if (timer.IsReady()) -{ - auto timeDevice = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagCuda()); -} -// Force the synchronization. Ideally device execution time would be returned -which takes longer time than ther kernal call -auto timeGeneral = timer.GetElapsedTime(); -``` - -* Create a Timer with a specific device. - * It works as the old timer that times for a specific device id. -``` Construct a device specific timer -// Assume TBB is enabled on the machine -vtkm::cont::Timer timer{vtkm::cont::DeviceAdaptertagTBB()}; -timer.Start(); // t0 -// Run the algorithm - -// Timer would just return 0 and warn the user in the logger that an invalid -// device is used to query elapsed time -auto timeInvalid = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagSerial()); -if timer.IsReady() -{ - // Either will work and mark t1, return t1-t0 - auto time1TBB = timer.GetElapsedTime(vtkm::cont::DeviceAdapterTagTBB()); - auto time1General = timer.GetElapsedTime(); -} - -// Do something -auto time2 = timer.GetElapsedTime(); // t2 will be marked and t2-t0 will be returned - -// Do something -timer.Stop() // t3 marked - -// Do something then summarize latter -auto timeFinal = timer.GetElapsedTime(); // t3-t0 -``` diff --git a/docs/changelog/bitfields.md b/docs/changelog/bitfields.md deleted file mode 100644 index 13fb2c30c..000000000 --- a/docs/changelog/bitfields.md +++ /dev/null @@ -1,51 +0,0 @@ -# Add support for BitFields. - -BitFields are: -- Stored in memory using a contiguous buffer of bits. -- Accessible via portals, a la ArrayHandle. -- Portals operate on individual bits or words. -- Operations may be atomic for safe use from concurrent kernels. - -The new BitFieldToUnorderedSet device algorithm produces an -ArrayHandle containing the indices of all set bits, in no particular -order. - -The new AtomicInterface classes provide an abstraction into bitwise -atomic operations across control and execution environments and are -used to implement the BitPortals. - -BitFields may be used as boolean-typed ArrayHandles using the -ArrayHandleBitField adapter. ArrayHandleBitField uses atomic operations to read -and write bits in the BitField, and is safe to use in concurrent code. - -For example, a simple worklet that merges two arrays based on a boolean -condition is tested in TestingBitField: - -``` -class ConditionalMergeWorklet : public vtkm::worklet::WorkletMapField -{ -public: -using ControlSignature = void(FieldIn cond, - FieldIn trueVals, - FieldIn falseVals, - FieldOut result); -using ExecutionSignature = _4(_1, _2, _3); - -template -VTKM_EXEC T operator()(bool cond, const T& trueVal, const T& falseVal) const -{ - return cond ? 
trueVal : falseVal; -} - -}; - -BitField bits = ...; -auto condArray = vtkm::cont::make_ArrayHandleBitField(bits); -auto trueArray = vtkm::cont::make_ArrayHandleCounting(20, 2, NUM_BITS); -auto falseArray = vtkm::cont::make_ArrayHandleCounting(13, 2, NUM_BITS); -vtkm::cont::ArrayHandle output; - -vtkm::worklet::DispatcherMapField dispatcher; -dispatcher.Invoke(condArray, trueArray, falseArray, output); - -``` diff --git a/docs/changelog/bounding-interval-hierarchy-in-vtkm-cont.md b/docs/changelog/bounding-interval-hierarchy-in-vtkm-cont.md deleted file mode 100644 index 610426ba5..000000000 --- a/docs/changelog/bounding-interval-hierarchy-in-vtkm-cont.md +++ /dev/null @@ -1,10 +0,0 @@ -# Put CellLocatorBoundingIntervalHierarchy in vtkm_cont library - -All of the methods in CellLocatorBoundingIntervalHierarchy were listed in -header files. This is sometimes problematic with virtual methods. Since -everything implemented in it can just be embedded in a library, move the -code into the vtkm_cont library. - -These changes caused some warnings in clang to show up based on virtual -methods in other cell locators. Hence, the rest of the cell locators -have also had some of their code moved to vtkm_cont. diff --git a/docs/changelog/case-insensitive-device-from-string.md b/docs/changelog/case-insensitive-device-from-string.md deleted file mode 100644 index e7bd36f95..000000000 --- a/docs/changelog/case-insensitive-device-from-string.md +++ /dev/null @@ -1,14 +0,0 @@ -# VTK-m `vtkm::cont::DeviceAdapterId` construction from string are now case-insensitive - -You can now construct a `vtkm::cont::DeviceAdapterId` from a string no matter -the case of it. The following all will construct the same `vtkm::cont::DeviceAdapterId`. - -```cpp -vtkm::cont::DeviceAdapterId id1 = vtkm::cont::make_DeviceAdapterId("cuda"); -vtkm::cont::DeviceAdapterId id2 = vtkm::cont::make_DeviceAdapterId("CUDA"); -vtkm::cont::DeviceAdapterId id3 = vtkm::cont::make_DeviceAdapterId("Cuda"); - -auto& tracker = vtkm::cont::GetGlobalRuntimeDeviceTracker(); -vtkm::cont::DeviceAdapterId id4 = tracker.GetDeviceAdapterId("cuda"); -vtkm::cont::DeviceAdapterId id5 = tracker.GetDeviceAdapterId("CUDA"); -vtkm::cont::DeviceAdapterId id6 = tracker.GetDeviceAdapterId("Cuda"); diff --git a/docs/changelog/cast-variant-to-storage.md b/docs/changelog/cast-variant-to-storage.md deleted file mode 100644 index fe9d37c3c..000000000 --- a/docs/changelog/cast-variant-to-storage.md +++ /dev/null @@ -1,58 +0,0 @@ -# Allow VariantArrayHandle CastAndCall to cast to concrete types - -Previously, the `VariantArrayHandle::CastAndCall` (and indirect calls through -`vtkm::cont::CastAndCall`) attempted to cast to only -`vtkm::cont::ArrayHandleVirtual` with different value types. That worked, but -it meant that whatever was called had to operate through virtual functions. - -Under most circumstances, it is worthwhile to also check for some common -storage types that, when encountered, can be accessed much faster. This -change provides the casting to concrete storage types and now uses -`vtkm::cont::ArrayHandleVirtual` as a fallback when no concrete storage -type is found. - -By default, `CastAndCall` checks all the storage types in -`VTKM_DEFAULT_STORAGE_LIST_TAG`, which typically contains only the basic -storage. The `ArrayHandleVirtual::CastAndCall` method also allows you to -override this behavior by specifying a different type list in the first -argument. 
If the first argument is a list type, `CastAndCall` assumes that -all the types in the list are storage tags. If you pass in -`vtkm::ListTagEmpty`, then `CastAndCall` will always cast to an -`ArrayHandleVirtual` (the previous behavior). Alternately, you can pass in -storage tags that might be likely under the current usage. - -As an example, consider the following simple code. - -``` cpp -vtkm::cont::VariantArrayHandle array; - -// stuff happens - -array.CastAndCall(myFunctor); -``` - -Previously, `myFunctor` would be called with -`vtkm::cont::ArrayHandleVirtual` with different type `T`s. After this -change, `myFunctor` will be called with that and with -`vtkm::cont::ArrayHandle` of the same type `T`s. - -If you want to only call `myFunctor` with -`vtkm::cont::ArrayHandleVirtual`, then replace the previous line with - -``` cpp -array.CastAndCall(vtkm::ListTagEmpty(), myFunctor); -``` - -Let's say that additionally using `vtkm::cont::ArrayHandleIndex` was also -common. If you want to also specialize for that array, you can do so with -the following line. - -``` cpp -array.CastAndCall(vtkm::ListTagBase, - myFunctor); -``` - -Note that `myFunctor` will be called with -`vtkm::cont::ArrayHandle`, not -`vtkm::cont::ArrayHandleIndex`. diff --git a/docs/changelog/cmake-38-required.md b/docs/changelog/cmake-38-required.md deleted file mode 100644 index f58c9a5da..000000000 --- a/docs/changelog/cmake-38-required.md +++ /dev/null @@ -1,10 +0,0 @@ -# CMake 3.8 Required to build VTK-m - -While VTK-m has always required a fairly recent version -of CMake when building for Visual Studio, or if OpenMP or -CUDA are enabled, it has supported building with the TBB -device with CMake 3.3. - -Given the fact that our primary consumer (VTK) has moved -to require CMake 3.8, it doesn't make sense to require -CMake 3.3 and we have moved to a minimum of 3.8. diff --git a/docs/changelog/connected-components.md b/docs/changelog/connected-components.md deleted file mode 100644 index b9497738c..000000000 --- a/docs/changelog/connected-components.md +++ /dev/null @@ -1,9 +0,0 @@ -# Add connected component worklets and filters - -We have added the `ImageConnectivity` and `CellSetConnectivity` worklets and -the corresponding filters to identify connected components in DataSet. The ImageConnectivity -identify connected components in CellSetStructured, based on same field value of neighboring -cells and the CellSetConnective identify connected components based on cell connectivity. -Currently Moore neighborhood (i.e. 8 neighboring pixels for 2D and 27 neighboring pixels -for 3D) is used for ImageConnectivity. For CellSetConnectivity, neighborhood is defined -as cells sharing a common edge. diff --git a/docs/changelog/cuda-allocator-disable-managed-memory-from-code.md b/docs/changelog/cuda-allocator-disable-managed-memory-from-code.md deleted file mode 100644 index 3feb29c3a..000000000 --- a/docs/changelog/cuda-allocator-disable-managed-memory-from-code.md +++ /dev/null @@ -1,6 +0,0 @@ -# CudaAllocator Managed Memory can be disabled from C++ - -Previously it was impossible for calling code to explicitly -disable managed memory. This can be desirable for projects -that know they don't need managed memory and are super -performance critical. 
diff --git a/docs/changelog/cuda-separable-compilation-enabled.md b/docs/changelog/cuda-separable-compilation-enabled.md deleted file mode 100644 index 6b231e88f..000000000 --- a/docs/changelog/cuda-separable-compilation-enabled.md +++ /dev/null @@ -1,4 +0,0 @@ -# VTK-m now requires CUDA separable compilation to build - -With the introduction of `vtkm::cont::ArrayHandleVirtual` and the related infrastructure, vtk-m now -requires that all CUDA code be compiled using separable compilation ( -rdc ). diff --git a/docs/changelog/field-tags-no-template.md b/docs/changelog/field-tags-no-template.md deleted file mode 100644 index f6c65b9d9..000000000 --- a/docs/changelog/field-tags-no-template.md +++ /dev/null @@ -1,132 +0,0 @@ -# Remove templates from ControlSignature field tags - -Previously, several of the `ControlSignature` tags had a template to -specify a type list. This was to specify potential valid value types for an -input array. The importance of this typelist was to limit the number of -code paths created when resolving a `vtkm::cont::VariantArrayHandle` -(formerly a `DynamicArrayHandle`). This (potentially) reduced the compile -time, the size of libraries/executables, and errors from unexpected types. - -Much has changed since this feature was originally implemented. Since then, -the filter infrastructure has been created, and it is through this that -most dynamic worklet invocations happen. However, since the filter -infrastrcture does its own type resolution (and has its own policies) the -type arguments in `ControlSignature` are now of little value. - -## Script to update code - -This update requires changes to just about all code implementing a VTK-m -worklet. To facilitate the update of this code to these new changes (not to -mention all the code in VTK-m) a script is provided to automatically remove -these template parameters from VTK-m code. - -This script is at -[Utilities/Scripts/update-control-signature-tags.sh](../../Utilities/Scripts/update-control-signature-tags.sh). -It needs to be run in a Unix-compatible shell. It takes a single argument, -which is a top level directory to modify files. The script processes all C++ -source files recursively from that directory. - -## Selecting data types for auxiliary filter fields - -The main rational for making these changes is that the types of the inputs -to worklets is almost always already determined by the calling filter. -However, although it is straightforward to specify the type of the "main" -(active) scalars in a filter, it is less clear what to do for additional -fields if a filter needs a second or third field. - -Typically, in the case of a second or third field, it is up to the -`DoExecute` method in the filter implementation to apply a policy to that -field. When applying a policy, you give it a policy object (nominally -passed by the user) and a traits of the filter. Generally, the accepted -list of types for a field should be part of the filter's traits. For -example, consider the `WarpVector` filter. This filter only works on -`Vec`s of size 3, so its traits class looks like this. - -``` cpp -template <> -class FilterTraits -{ -public: - // WarpVector can only applies to Float and Double Vec3 arrays - using InputFieldTypeList = vtkm::TypeListTagFieldVec3; -}; -``` - -However, the `WarpVector` filter also requires two fields instead of one. -The first (active) field is handled by its superclass (`FilterField`), but -the second (auxiliary) field must be managed in the `DoExecute`. 
Generally, -this can be done by simply applying the policy with the filter traits. - -## The corner cases - -Most of the calls to worklets happen within filter implementations, which -have their own way of narrowing down potential types (as previously -described). The majority of the remainder either use static types or work -with a variety of types. - -However, there is a minority of corner cases that require a reduction of -types. Since the type argument of the worklet `ControlSignature` arguments -are no longer available, the narrowing of types must be done before the -call to `Invoke`. - -This narrowing of arguments is not particularly difficult. Such type-unsure -arguments usually come from a `VariantArrayHandle` (or something that uses -one). You can select the types from a `VariantArrayHandle` simply by using -the `ResetTypes` method. For example, say you know that a variant array is -supposed to be a scalar. - -``` cpp -dispatcher.Invoke(variantArray.ResetTypes(vtkm::TypeListTagFieldScalar()), - staticArray); -``` - -Even more common is to have a `vtkm::cont::Field` object. A `Field` object -internally holds a `VariantArrayHandle`, which is accessible via the -`GetData` method. - -``` cpp -dispatcher.Invoke(field.GetData().ResetTypes(vtkm::TypeListTagFieldScalar()), - staticArray); -``` - -## Change in executable size - -The whole intention of these template parameters in the first place was to -reduce the number of code paths compiled. The hypothesis of this change was -that in the current structure the code paths were not being reduced much -if at all. If that is true, the size of executables and libraries should -not change. - -Here is a recording of the library and executable sizes before this change -(using `ds -h`). - -``` -3.0M libvtkm_cont-1.2.1.dylib -6.2M libvtkm_rendering-1.2.1.dylib -312K Rendering_SERIAL -312K Rendering_TBB - 22M Worklets_SERIAL - 23M Worklets_TBB - 22M UnitTests_vtkm_filter_testing -5.7M UnitTests_vtkm_cont_serial_testing -6.0M UnitTests_vtkm_cont_tbb_testing -7.1M UnitTests_vtkm_cont_testing -``` - -After the changes, the executable sizes are as follows. - -``` -3.0M libvtkm_cont-1.2.1.dylib -6.0M libvtkm_rendering-1.2.1.dylib -312K Rendering_SERIAL -312K Rendering_TBB - 21M Worklets_SERIAL - 21M Worklets_TBB - 22M UnitTests_vtkm_filter_testing -5.6M UnitTests_vtkm_cont_serial_testing -6.0M UnitTests_vtkm_cont_tbb_testing -7.1M UnitTests_vtkm_cont_testing -``` - -As we can see, the built sizes have not changed significantly. (If -anything, the build is a little smaller.) diff --git a/docs/changelog/improve-cuda-scheduling.md b/docs/changelog/improve-cuda-scheduling.md deleted file mode 100644 index c8a193e37..000000000 --- a/docs/changelog/improve-cuda-scheduling.md +++ /dev/null @@ -1,45 +0,0 @@ -# VTK-m CUDA kernel scheduling including improved defaults, and user customization - -VTK-m now offers a more GPU aware set of defaults for kernel scheduling. -When VTK-m first launches a kernel we do system introspection and determine -what GPU's are on the machine and than match this information to a preset -table of values. The implementation is designed in a way that allows for -VTK-m to offer both specific presets for a given GPU ( V100 ) or for -an entire generation of cards ( Pascal ). - -Currently VTK-m offers preset tables for the following GPU's: -- Tesla V100 -- Tesla P100 - -If the hardware doesn't match a specific GPU card we than try to find the -nearest know hardware generation and use those defaults. 
Currently we offer -defaults for -- Older than Pascal Hardware -- Pascal Hardware -- Volta+ Hardware - -Some users have workloads that don't align with the defaults provided by -VTK-m. When that is the case, it is possible to override the defaults -by binding a custom function to `vtkm::cont::cuda::InitScheduleParameters`, -as shown below: - -```cpp - ScheduleParameters CustomScheduleValues(char const* name, - int major, - int minor, - int multiProcessorCount, - int maxThreadsPerMultiProcessor, - int maxThreadsPerBlock) - { - - ScheduleParameters params { - 64 * multiProcessorCount, //1d blocks - 64, //1d threads per block - 64 * multiProcessorCount, //2d blocks - { 8, 8, 1 }, //2d threads per block - 64 * multiProcessorCount, //3d blocks - { 4, 4, 4 } }; //3d threads per block - return params; - } - vtkm::cont::cuda::InitScheduleParameters(&CustomScheduleValues); -``` diff --git a/docs/changelog/initialize.md b/docs/changelog/initialize.md deleted file mode 100644 index f527e7e10..000000000 --- a/docs/changelog/initialize.md +++ /dev/null @@ -1,20 +0,0 @@ -# vtkm::cont::Initialize - -A new initialization function, vtkm::cont::Initialize, has been added. -Initialization is not required, but it will configure the logging utilities (when -enabled) and allows forcing a device via a `-d` or `--device` command line -option. - - -Usage: - -``` -#include <vtkm/cont/Initialize.h> - -int main(int argc, char *argv[]) -{ - auto config = vtkm::cont::Initialize(argc, argv); - - ... -} -``` diff --git a/docs/changelog/invoker-supports-scatter-types.md b/docs/changelog/invoker-supports-scatter-types.md deleted file mode 100644 index 78817d6e6..000000000 --- a/docs/changelog/invoker-supports-scatter-types.md +++ /dev/null @@ -1,21 +0,0 @@ -# `vtkm::worklet::Invoker` now able to launch worklets that have a non-default scatter type - -This change allows the `Invoker` class to support launching worklets that require -a custom scatter operation. This is done by providing the scatter as the second -argument when launching a worklet with the `()` operator. - -The following example shows a scatter being provided with a worklet launch. - -```cpp -struct CheckTopology : vtkm::worklet::WorkletMapPointToCell -{ - using ControlSignature = void(CellSetIn cellset, FieldOutCell); - using ExecutionSignature = _2(FromIndices); - using ScatterType = vtkm::worklet::ScatterPermutation<>; - ... -}; - - -vtkm::worklet::Invoker invoke; -invoke( CheckTopology{}, vtkm::worklet::ScatterPermutation{}, cellset, result ); -``` diff --git a/docs/changelog/lodepng.md b/docs/changelog/lodepng.md deleted file mode 100644 index f6207865f..000000000 --- a/docs/changelog/lodepng.md +++ /dev/null @@ -1,5 +0,0 @@ -# LodePNG added as a thirdparty - -The lodepng library was brought in as a third-party library. -This has allowed the VTK-m rendering library to have robust -png decoding functionality. diff --git a/docs/changelog/mask-worklets.md b/docs/changelog/mask-worklets.md deleted file mode 100644 index c4af31787..000000000 --- a/docs/changelog/mask-worklets.md +++ /dev/null @@ -1,104 +0,0 @@ -# Allow masking of worklet invocations - -There have recently been use cases where it would be helpful to mask out -some of the invocations of a worklet. The idea is that when invoking a -worklet with a mask array on the input domain, you might implement your -worklet more-or-less like the following. - -```cpp -VTKM_EXEC void operator()(bool mask, /* other parameters */) -{ - if (mask) - { - // Do interesting stuff - } -} -``` - -This works, but what if your mask has mostly false values?
In that case, -you are spending tons of time loading data to and from memory where fields -are stored for no reason. - -You could potentially get around this problem by adding a scatter to the -worklet. However, that will compress the output arrays to only values that -are active in the mask. That is problematic if you want the masked output -in the appropriate place in the original arrays. You will have to do some -complex (and annoying and possibly expensive) permutations of the output -arrays. - -Thus, we would like a new feature similar to scatter that instead masks out -invocations so that the worklet is simply not run on those outputs. - -## New Interface - -The new "Mask" feature is similar (and orthogonal) to the existing -"Scatter" feature. Worklet objects now define a `MaskType` that provides an -object that manages the selection of which invocations are skipped. The -following Mask objects are defined. - - * `MaskNone` - This removes any mask of the output. All outputs are - generated. This is the default if no `MaskType` is explicitly defined. - * `MaskSelect` - The user provides an array that specifies whether - each output is created, with a 1 to mean that the output should be - created and 0 to mean that it should not. - * `MaskIndices` - The user provides an array with a list of indices for - all outputs that should be created. - -It will be straightforward to implement other versions of masks. (For -example, you could make a mask class that selects every Nth entry.) Those -could be made on an as-needed basis. - -## Implementation - -The implementation follows the same basic idea of how scatters are -implemented. - -### Mask Classes - -The mask class is required to implement the following items. - - * `ThreadToOutputType` - A type for an array that maps a thread index (an - index in the array) to an output index. A reasonable type for this - could be `vtkm::cont::ArrayHandle<vtkm::Id>`. - * `GetThreadToOutputMap` - Given the range for the output (e.g. the - number of items in the output domain), returns an array of type - `ThreadToOutputType` that is the actual map. - * `GetThreadRange` - Given a range for the output (e.g. the number of - items in the output domain), returns the range for the threads (e.g. - the number of times the worklet will be invoked). - -### Dispatching - -The `vtkm::worklet::internal::DispatcherBase` manages a mask class in -the same way it manages the scatter class. It gets the `MaskType` from -the worklet it is templated on. It requires a `MaskType` object during -its construction. - -Previously the dispatcher (and downstream) had to manage the range and -indices of inputs and threads. They now also have to manage a separate -output range/index, as now all three may be different. - -The `vtkm::Invocation` is changed to hold the ThreadToOutputMap array from -the mask. It likewise has a templated `ChangeThreadToOutputMap` method -added (similar to those already existing for the arrays from a scatter). -This method is used in `DispatcherBase::InvokeTransportParameters` to add -the mask's array to the invocation before calling `InvokeSchedule`. - -### Thread Indices - -With the addition of masks, the `ThreadIndices` classes are changed to -manage the actual output index. Previously, the output index was always the -same as the thread index. However, now these two can be different. The -`GetThreadIndices` methods of the worklet base classes have an argument -added that is the portal to the ThreadToOutputMap.
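Before continuing with the implementation details, here is a small sketch of what the user-facing interface described under "New Interface" above might look like when put together. The worklet, its body, and the array name are hypothetical, and the way the mask object is handed to the dispatcher is assumed from the statement above that the dispatcher requires a `MaskType` object during its construction.

```cpp
struct SquareSelected : vtkm::worklet::WorkletMapField
{
  using ControlSignature = void(FieldIn, FieldOut);
  using ExecutionSignature = _2(_1);
  // Opt in to masking; MaskNone is the default when MaskType is not defined.
  using MaskType = vtkm::worklet::MaskSelect;

  template <typename T>
  VTKM_EXEC T operator()(const T& x) const
  {
    return x * x;
  }
};

// selectArray (hypothetical name) holds a 1 for every output that should be
// computed and a 0 for every output that should be skipped.
vtkm::worklet::MaskSelect mask(selectArray);

// Assumed: the mask object is given to the dispatcher at construction,
// mirroring how scatter objects are provided.
vtkm::worklet::DispatcherMapField<SquareSelected> dispatcher(mask);
dispatcher.Invoke(input, output);
```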
- -The worklet `GetThreadIndices` is called from the `Task` classes. These -classes are changed to pass in this additional argument. Since the `Task` -classes get an `Invocation` object from the dispatcher, which contains the -`ThreadToOutputMap`, this change is trivial. - -## Interaction Between Mask and Scatter - -Although it seems weird, it should work fine to mix scatters and masks. The -scatter will first be applied to the input to generate a (potential) list -of output elements. The mask will then be applied to these output elements. diff --git a/docs/changelog/merge-benchmark-executables.md b/docs/changelog/merge-benchmark-executables.md deleted file mode 100644 index 15695ab77..000000000 --- a/docs/changelog/merge-benchmark-executables.md +++ /dev/null @@ -1,6 +0,0 @@ -# Merge benchmark executables into a device dependent shared library - -VTK-m has been updated to replace old per device benchmark executables with a device -dependent shared library so that it's able to accept a device adapter at runtime through -the "--device=" argument. - diff --git a/docs/changelog/merge-rendering-testing-executables.md b/docs/changelog/merge-rendering-testing-executables.md deleted file mode 100644 index d1eeb6673..000000000 --- a/docs/changelog/merge-rendering-testing-executables.md +++ /dev/null @@ -1,3 +0,0 @@ -Merge rendering testing executables to a shared library - -This commit allows rendering testing executables to select the device at runtime. diff --git a/docs/changelog/merge-worklet-testing-executables.md b/docs/changelog/merge-worklet-testing-executables.md deleted file mode 100644 index 084cd43dd..000000000 --- a/docs/changelog/merge-worklet-testing-executables.md +++ /dev/null @@ -1,8 +0,0 @@ -# Merge worklet testing executables into a device dependent shared library - -VTK-m has been updated to replace old per device worklet testing executables with a device -dependent shared library so that it's able to accept a device adapter at runtime through -the "--device=" argument. - - - diff --git a/docs/changelog/optionparser-to-third-party.md b/docs/changelog/optionparser-to-third-party.md deleted file mode 100644 index 34190ef38..000000000 --- a/docs/changelog/optionparser-to-third-party.md +++ /dev/null @@ -1,13 +0,0 @@ -# Wrap third party optionparser.h in vtkm/cont/internal/OptionParser.h - -Previously we just took the optionparser.h file and stuck it right in -our source code. That was problematic for a variety of reasons. - -1. It incorrectly assigned our license to external code. -2. It made lots of unnecessary changes to the original source (like - reformatting). -3. It made it near impossible to track patches we make and updates to - the original software. - -Instead, use the third-party system to track changes to optionparser.h -in a different repository and then pull that into ours. diff --git a/docs/changelog/parse-some-options-in-initialize.md b/docs/changelog/parse-some-options-in-initialize.md deleted file mode 100644 index 12de9d6d4..000000000 --- a/docs/changelog/parse-some-options-in-initialize.md +++ /dev/null @@ -1,91 +0,0 @@ -# Allow Initialize to parse only some arguments - -When a library requires reading some command line arguments through a -function like Initialize, it is typical that it will parse through -arguments it supports and then remove those arguments from `argc` and -`argv` so that the remaining arguments can be parsed by the calling -program. Recent changes to the `vtkm::cont::Initialize` function support -that. 
- -## Use Case - -Say you are creating a simple benchmark where you want to provide a command -line option `--size` that allows you to adjust the size of the data that -you are working on. However, you also want to support flags like `--device` -and `-v` that are performed by `vtkm::cont::Initialize`. Rather than have -to re-implement all of `Initialize`'s parsing, you can now first call -`Initialize` to handle its arguments and then parse the remaining objects. - -The following is a simple (and rather incomplete) example: - -```cpp -int main(int argc, char** argv) -{ - vtkm::cont::InitializeResult initResult = vtkm::cont::Initialize(argc, argv); - - if ((argc > 1) && (strcmp(argv[1], "--size") == 0)) - { - if (argc < 3) - { - std::cerr << "--size option requires a numeric argument" << std::endl; - std::cerr << "USAGE: " << argv[0] << " [options]" << std::endl; - std::cerr << "Options are:" << std::endl; - std::cerr << " --size \tSpecify the size of the data." << std::endl; - std::cerr << initResult.Usage << std::endl; - exit(1); - } - - g_size = atoi(argv[2]); - } - - std::cout << "Using device: " << initResult.Device.GetName() << std::endl; -``` - -## Additional Initialize Options - -Because `Initialize` no longer has the assumption that it is responsible -for parsing _all_ arguments, some options have been added to -`vtkm::cont::InitializeOptions` to manage these different use cases. The -following options are now supported. - - * `None` A placeholder for having all options off, which is the default. - (Same as before this change.) - * `RequireDevice` Issue an error if the device argument is not specified. - (Same as before this change.) - * `DefaultAnyDevice` If no device is specified, treat it as if the user - gave --device=Any. This means that DeviceAdapterTagUndefined will never - be return in the result. - * `AddHelp` Add a help argument. If `-h` or `--help` is provided, prints - a usage statement. Of course, the usage statement will only print out - arguments processed by VTK-m. - * `ErrorOnBadOption` If an unknown option is encountered, the program - terminates with an error and a usage statement is printed. If this - option is not provided, any unknown options are returned in `argv`. If - this option is used, it is a good idea to use `AddHelp` as well. - * `ErrorOnBadArgument` If an extra argument is encountered, the program - terminates with an error and a usage statement is printed. If this - option is not provided, any unknown arguments are returned in `argv`. - * `Strict` If supplied, Initialize treats its own arguments as the only - ones supported by the application and provides an error if not followed - exactly. This is a convenience option that is a combination of - `ErrorOnBadOption`, `ErrorOnBadArgument`, and `AddHelp`. - -## InitializeResult Changes - -The changes in `Initialize` have also necessitated the changing of some of -the fields in the `InitializeResult` structure. The following fields are -now provided in the `InitializeResult` struct. - - * `Device` Returns the device selected in the command line arguments as a - `DeviceAdapterId`. If no device was selected, - `DeviceAdapterTagUndefined` is returned. (Same as before this change.) - * `Usage` Returns a string containing the usage for the options - recognized by `Initialize`. This can be used to build larger usage - statements containing options for both `Initialize` and the calling - program. See the example above. - -Note that the `Arguments` field has been removed from `InitializeResult`. 
-This is because the unparsed arguments are now returned in the modified -`argc` and `argv`, which provides a more complete result than the -`Arguments` field did. - diff --git a/docs/changelog/point-merge.md b/docs/changelog/point-merge.md deleted file mode 100644 index 8179bd4f5..000000000 --- a/docs/changelog/point-merge.md +++ /dev/null @@ -1,26 +0,0 @@ -# Add point merge capabilities to CleanGrid filter - -We have added a `PointMerge` worklet that uses a virtual grid approach to -identify nearby points. The worklet works by creating a very fine but -sparsely represented locator grid. It then groups points by grid bins and -finds those within a specified radius. - -This functionality has been integrated into the `CleanGrid` filter. The -following flags have been added to `CleanGrid` to modify the behavior of -point merging. - - * `Set`/`GetMergePoints` - a flag to turn on/off the merging of - duplicated coincident points. This extra operation will find points - spatially located near each other and merge them together. - * `Set`/`GetTolerance` - Defines the tolerance used when determining - whether two points are considered coincident. If the - `ToleranceIsAbsolute` flag is false (the default), then this tolerance - is scaled by the diagonal of the points. This parameter is only used - when merge points is on. - * `Set`/`GetToleranceIsAbsolute` - When ToleranceIsAbsolute is false (the - default) then the tolerance is scaled by the diagonal of the bounds of - the dataset. If true, then the tolerance is taken as the actual - distance to use. This parameter is only used when merge points is on. - * `Set`/`GetFastMerge` - When FastMerge is true (the default), some - corners are cut when computing coincident points. The point merge will - go faster but the tolerance will not be strictly followed. diff --git a/docs/changelog/portal-value-reference-operators.md b/docs/changelog/portal-value-reference-operators.md deleted file mode 100644 index f9b3ca13b..000000000 --- a/docs/changelog/portal-value-reference-operators.md +++ /dev/null @@ -1,12 +0,0 @@ -# Added specialized operators for ArrayPortalValueReference - -The ArrayPortalValueReference is supposed to behave just like the value it -encapsulates and does so by automatically converting to the base type when -necessary. However, when it is possible to convert that to something else, -it is possible to get errors about ambiguous overloads. To avoid these, add -specialized versions of the operators to specify which ones should be used. - -Also consolidated the CUDA version of an ArrayPortalValueReference to the -standard one. The two implementations were equivalent and we would like -changes to apply to both. - diff --git a/docs/changelog/redesign-runtime-device-tracking.md b/docs/changelog/redesign-runtime-device-tracking.md deleted file mode 100644 index cfb2bbee4..000000000 --- a/docs/changelog/redesign-runtime-device-tracking.md +++ /dev/null @@ -1,90 +0,0 @@ -# Redesign Runtime Device Tracking - -The device tracking infrastructure in VTK-m has been redesigned to -remove multiple redundant codes paths and to simplify reasoning -about around what an instance of RuntimeDeviceTracker will modify. - -`vtkm::cont::RuntimeDeviceTracker` tracks runtime information on -a per-user thread basis. 
This is done to allow multiple calling -threads to use different vtk-m backends such as seen in this -example: - -```cpp - vtkm::cont::DeviceAdapterTagCuda cuda; - vtkm::cont::DeviceAdapterTagOpenMP openmp; - { // thread 1 - auto& tracker = vtkm::cont::GetRuntimeDeviceTracker(); - tracker->ForceDevice(cuda); - vtkm::worklet::Invoker invoke; - invoke(LightTask{}, input, output); - vtkm::cont::Algorithm::Sort(output); - invoke(HeavyTask{}, output); - } - - { // thread 2 - auto& tracker = vtkm::cont::GetRuntimeDeviceTracker(); - tracker->ForceDevice(openmp); - vtkm::worklet::Invoker invoke; - invoke(LightTask{}, input, output); - vtkm::cont::Algorithm::Sort(output); - invoke(HeavyTask{}, output); - } -``` - -While this address the ability for threads to specify what -device they should run on. It doesn't make it easy to toggle -the status of a device in a programmatic way, for example -the following block forces execution to only occur on -`cuda` and doesn't restore previous active devices after - -```cpp - { - vtkm::cont::DeviceAdapterTagCuda cuda; - auto& tracker = vtkm::cont::GetRuntimeDeviceTracker(); - tracker->ForceDevice(cuda); - vtkm::worklet::Invoker invoke; - invoke(LightTask{}, input, output); - } - //openmp/tbb/... still inactive -``` - -To resolve those issues we have `vtkm::cont::ScopedRuntimeDeviceTracker` which -has the same interface as `vtkm::cont::RuntimeDeviceTracker` but additionally -resets any per-user thread modifications when it goes out of scope. So by -switching over the previous example to use `ScopedRuntimeDeviceTracker` we -correctly restore the threads `RuntimeDeviceTracker` state when `tracker` -goes out of scope. -```cpp - { - vtkm::cont::DeviceAdapterTagCuda cuda; - vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda); - vtkm::worklet::Invoker invoke; - invoke(LightTask{}, input, output); - } - //openmp/tbb/... are now again active -``` - -The `vtkm::cont::ScopedRuntimeDeviceTracker` is not limited to forcing -execution to occur on a single device. When constructed it can either force -execution to a device, disable a device or enable a device. These options -also work with the `DeviceAdapterTagAny`. - - -```cpp - { - //enable all devices - vtkm::cont::DeviceAdapterTagAny any; - vtkm::cont::ScopedRuntimeDeviceTracker tracker(any, - vtkm::cont::RuntimeDeviceTrackerMode::Enable); - ... - } - - { - //disable only cuda - vtkm::cont::DeviceAdapterTagCuda cuda; - vtkm::cont::ScopedRuntimeDeviceTracker tracker(cuda, - vtkm::cont::RuntimeDeviceTrackerMode::Disable); - - ... - } -``` diff --git a/docs/changelog/reduction-support-differing-input-output-types.md b/docs/changelog/reduction-support-differing-input-output-types.md deleted file mode 100644 index 04cff5fae..000000000 --- a/docs/changelog/reduction-support-differing-input-output-types.md +++ /dev/null @@ -1,53 +0,0 @@ -# DeviceAdapter Reduction supports differing input and output types - -It is common to want to perform a reduction where the input and output types -are of differing types. A basic example would be when the input is `vtkm::UInt8` -but the output is `vtkm::UInt64`. This has been supported since v1.2, as the input -type can be implicitly convertible to the output type. - -What we now support is when the input type is not implicitly convertible to the output type, -such as when the output type is `vtkm::Pair< vtkm::UInt64, vtkm::UInt64>`. For this to work -we require that the custom binary operator implements also an `operator()` which handles -the unary transformation of input to output. 
- -An example of a custom reduction operator for differing input and output types is: - -```cxx - - struct CustomMinAndMax - { - using OutputType = vtkm::Pair; - - VTKM_EXEC_CONT - OutputType operator()(vtkm::Float64 a) const - { - return OutputType(a, a); - } - - VTKM_EXEC_CONT - OutputType operator()(vtkm::Float64 a, vtkm::Float64 b) const - { - return OutputType(vtkm::Min(a, b), vtkm::Max(a, b)); - } - - VTKM_EXEC_CONT - OutputType operator()(const OutputType& a, const OutputType& b) const - { - return OutputType(vtkm::Min(a.first, b.first), vtkm::Max(a.second, b.second)); - } - - VTKM_EXEC_CONT - OutputType operator()(vtkm::Float64 a, const OutputType& b) const - { - return OutputType(vtkm::Min(a, b.first), vtkm::Max(a, b.second)); - } - - VTKM_EXEC_CONT - OutputType operator()(const OutputType& a, vtkm::Float64 b) const - { - return OutputType(vtkm::Min(a.first, b), vtkm::Max(a.second, b)); - } - }; - - -``` diff --git a/docs/changelog/rename-per-thread-runtime-tracker-method.md b/docs/changelog/rename-per-thread-runtime-tracker-method.md deleted file mode 100644 index c13f35b9f..000000000 --- a/docs/changelog/rename-per-thread-runtime-tracker-method.md +++ /dev/null @@ -1,9 +0,0 @@ -# Renamed RuntimeDeviceTrackers to use the term Global - -The `GetGlobalRuntimeDeviceTracker` never actually returned a process wide -runtime device tracker but always a unique one for each control side thread. -This was the design as it would allow for different threads to have different -runtime device settings. - -By removing the term Global from the name it becomes more clear what scope this -class has. diff --git a/docs/changelog/specialize-worklet-for-device.md b/docs/changelog/specialize-worklet-for-device.md deleted file mode 100644 index dd74adbca..000000000 --- a/docs/changelog/specialize-worklet-for-device.md +++ /dev/null @@ -1,147 +0,0 @@ -# Add ability to specialize a worklet for a device - -This change adds an execution signature tag named `Device` that passes -a `DeviceAdapterTag` to the worklet's parenthesis operator. This allows the -worklet to specialize its operation. This features is available in all -worklets. - -The following example shows a worklet that specializes itself for the CUDA -device. - -```cpp -struct DeviceSpecificWorklet : vtkm::worklet::WorkletMapField -{ - using ControlSignature = void(FieldIn, FieldOut); - using ExecutionSignature = _2(_1, Device); - - // Specialization for the Cuda device. - template - T operator()(T x, vtkm::cont::DeviceAdapterTagCuda) const - { - // Special cuda implementation - } - - // General implementation - template - T operator()(T x, Device) const - { - // General implementation - } -}; -``` - -## Effect on compile time and binary size - -This change necessitated adding a template parameter for the device that -followed at least from the schedule all the way down. This has the -potential for duplicating several of the support methods (like -`DoWorkletInvokeFunctor`) that would otherwise have the same type. This is -especially true between the devices that run on the CPU as they should all -be sharing the same portals from `ArrayHandle`s. So the question is whether -it causes compile to take longer or cause a significant increase in -binaries. - -To informally test, I first ran a clean debug compile on my Windows machine -with the serial and tbb devices. The build itself took **3 minutes, 50 -seconds**. 
Here is a list of the binary sizes in the bin directory: - -``` -kmorel2 0> du -sh *.exe *.dll -200K BenchmarkArrayTransfer_SERIAL.exe -204K BenchmarkArrayTransfer_TBB.exe -424K BenchmarkAtomicArray_SERIAL.exe -424K BenchmarkAtomicArray_TBB.exe -440K BenchmarkCopySpeeds_SERIAL.exe -580K BenchmarkCopySpeeds_TBB.exe -4.1M BenchmarkDeviceAdapter_SERIAL.exe -5.3M BenchmarkDeviceAdapter_TBB.exe -7.9M BenchmarkFieldAlgorithms_SERIAL.exe -7.9M BenchmarkFieldAlgorithms_TBB.exe -22M BenchmarkFilters_SERIAL.exe -22M BenchmarkFilters_TBB.exe -276K BenchmarkRayTracing_SERIAL.exe -276K BenchmarkRayTracing_TBB.exe -4.4M BenchmarkTopologyAlgorithms_SERIAL.exe -4.4M BenchmarkTopologyAlgorithms_TBB.exe -712K Rendering_SERIAL.exe -712K Rendering_TBB.exe -708K UnitTests_vtkm_cont_arg_testing.exe -1.7M UnitTests_vtkm_cont_internal_testing.exe -13M UnitTests_vtkm_cont_serial_testing.exe -14M UnitTests_vtkm_cont_tbb_testing.exe -18M UnitTests_vtkm_cont_testing.exe -13M UnitTests_vtkm_cont_testing_mpi.exe -736K UnitTests_vtkm_exec_arg_testing.exe -136K UnitTests_vtkm_exec_internal_testing.exe -196K UnitTests_vtkm_exec_serial_internal_testing.exe -196K UnitTests_vtkm_exec_tbb_internal_testing.exe -2.0M UnitTests_vtkm_exec_testing.exe -83M UnitTests_vtkm_filter_testing.exe -476K UnitTests_vtkm_internal_testing.exe -148K UnitTests_vtkm_interop_internal_testing.exe -1.3M UnitTests_vtkm_interop_testing.exe -2.9M UnitTests_vtkm_io_reader_testing.exe -548K UnitTests_vtkm_io_writer_testing.exe -792K UnitTests_vtkm_rendering_testing.exe -3.7M UnitTests_vtkm_testing.exe -320K UnitTests_vtkm_worklet_internal_testing.exe -65M UnitTests_vtkm_worklet_testing.exe -11M vtkm_cont-1.3.dll -2.1M vtkm_interop-1.3.dll -21M vtkm_rendering-1.3.dll -3.9M vtkm_worklet-1.3.dll -``` - -After making the singular change to the `Invocation` object to add the -`DeviceAdapterTag` as a template parameter (which should cause any extra -compile instances) the compile took **4 minuts and 5 seconds**. Here is the -new list of binaries. 
- -``` -kmorel2 0> du -sh *.exe *.dll -200K BenchmarkArrayTransfer_SERIAL.exe -204K BenchmarkArrayTransfer_TBB.exe -424K BenchmarkAtomicArray_SERIAL.exe -424K BenchmarkAtomicArray_TBB.exe -440K BenchmarkCopySpeeds_SERIAL.exe -580K BenchmarkCopySpeeds_TBB.exe -4.1M BenchmarkDeviceAdapter_SERIAL.exe -5.3M BenchmarkDeviceAdapter_TBB.exe -7.9M BenchmarkFieldAlgorithms_SERIAL.exe -7.9M BenchmarkFieldAlgorithms_TBB.exe -22M BenchmarkFilters_SERIAL.exe -22M BenchmarkFilters_TBB.exe -276K BenchmarkRayTracing_SERIAL.exe -276K BenchmarkRayTracing_TBB.exe -4.4M BenchmarkTopologyAlgorithms_SERIAL.exe -4.4M BenchmarkTopologyAlgorithms_TBB.exe -712K Rendering_SERIAL.exe -712K Rendering_TBB.exe -708K UnitTests_vtkm_cont_arg_testing.exe -1.7M UnitTests_vtkm_cont_internal_testing.exe -13M UnitTests_vtkm_cont_serial_testing.exe -14M UnitTests_vtkm_cont_tbb_testing.exe -19M UnitTests_vtkm_cont_testing.exe -13M UnitTests_vtkm_cont_testing_mpi.exe -736K UnitTests_vtkm_exec_arg_testing.exe -136K UnitTests_vtkm_exec_internal_testing.exe -196K UnitTests_vtkm_exec_serial_internal_testing.exe -196K UnitTests_vtkm_exec_tbb_internal_testing.exe -2.0M UnitTests_vtkm_exec_testing.exe -86M UnitTests_vtkm_filter_testing.exe -476K UnitTests_vtkm_internal_testing.exe -148K UnitTests_vtkm_interop_internal_testing.exe -1.3M UnitTests_vtkm_interop_testing.exe -2.9M UnitTests_vtkm_io_reader_testing.exe -548K UnitTests_vtkm_io_writer_testing.exe -792K UnitTests_vtkm_rendering_testing.exe -3.7M UnitTests_vtkm_testing.exe -320K UnitTests_vtkm_worklet_internal_testing.exe -68M UnitTests_vtkm_worklet_testing.exe -11M vtkm_cont-1.3.dll -2.1M vtkm_interop-1.3.dll -21M vtkm_rendering-1.3.dll -3.9M vtkm_worklet-1.3.dll -``` - -So far the increase is quite negligible. diff --git a/docs/changelog/update-CellLocatorTwoLevelUniformGrid.md b/docs/changelog/update-CellLocatorTwoLevelUniformGrid.md deleted file mode 100644 index d9523417a..000000000 --- a/docs/changelog/update-CellLocatorTwoLevelUniformGrid.md +++ /dev/null @@ -1,31 +0,0 @@ -# update-CellLocatorTwoLevelUniformGrid - -`CellLocatorTwoLevelUniformGrid` has been renamed to `CellLocatorUniformBins` -for brevity. It has been modified to be a subclass of `vtkm::cont::CellLocator` -and can be used wherever a `CellLocator` is accepted. - -`CellLocatorUniformBins` can work with all kinds of datasets, but there are cell -locators that are more efficient for specific data sets. Therefore, a new cell -locator - `CellLocatorGeneral` has been implemented that can be configured to use -specialized cell locators based on its input data. A "configurator" function object -can be specified using the `SetConfigurator` function. The configurator should -have the following signature: - -```c++ -void (std::unique_ptr&, - const vtkm::cont::DynamicCellSet&, - const vtkm::cont::CoordinateSystem&); -``` - -The configurator is invoked whenever the `Update` method is called and the input -has changed. The current cell locator is passed in a `std::unique_ptr`. Based on -the types of the input cellset and coordinates, and possibly some heuristics on -their values, the current cell locator's parameters can be updated, or a different -cell-locator can be instantiated and transferred to the `unique_ptr`. The default -configurator configures a `CellLocatorUniformGrid` for uniform grid datasets, -a `CellLocatorRecitlinearGrid` for rectilinear datasets, and `CellLocatorUniformBins` -for all other dataset types. 
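To make the configurator contract concrete, a minimal sketch of a custom configurator is shown below. Only the function signature and the `SetConfigurator` call come from the text above; the particular type check and locator choices are illustrative assumptions, not the actual default behavior.

```cpp
void MyConfigurator(std::unique_ptr<vtkm::cont::CellLocator>& locator,
                    const vtkm::cont::DynamicCellSet& cellSet,
                    const vtkm::cont::CoordinateSystem& coords)
{
  (void)coords; // a real configurator could also inspect the coordinates

  // Illustrative heuristic: use the uniform-grid locator for 3D structured
  // data and fall back to the uniform-bins locator for everything else.
  if (cellSet.IsType<vtkm::cont::CellSetStructured<3>>())
  {
    locator.reset(new vtkm::cont::CellLocatorUniformGrid);
  }
  else
  {
    locator.reset(new vtkm::cont::CellLocatorUniformBins);
  }
}

// Hook the configurator into a CellLocatorGeneral instance.
vtkm::cont::CellLocatorGeneral general;
general.SetConfigurator(MyConfigurator);
```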
- -The class `CellLocatorHelper` that implemented similar functionality to -`CellLocatorGeneral` has been removed. - diff --git a/docs/changelog/update-optional-parser.md b/docs/changelog/update-optional-parser.md deleted file mode 100644 index b2bbad6c4..000000000 --- a/docs/changelog/update-optional-parser.md +++ /dev/null @@ -1,3 +0,0 @@ -# Optional Parser is bumped from 1.3 to 1.7. - -VTK-m internal version of Optional Parser has been moved to 1.7 diff --git a/docs/changelog/variantarrayhandle.md b/docs/changelog/variantarrayhandle.md deleted file mode 100644 index d6466027f..000000000 --- a/docs/changelog/variantarrayhandle.md +++ /dev/null @@ -1,43 +0,0 @@ -# vtkm::cont::VariantArrayHandle replaces vtkm::cont::DynamicArrayHandle - -`ArrayHandleVariant` replaces `DynamicArrayHandle` as the primary method -for holding onto a type erased `vtkm::cont::ArrayHandle`. The major difference -between the two implementations is how they handle the Storage component of -an array handle. - -`DynamicArrayHandle` approach was to find the fully deduced type of the `ArrayHandle` -meaning it would check all value and storage types it knew about until it found a match. -This cross product of values and storages would cause significant compilation times when -a `DynamicArrayHandle` had multiple storage types. - -`VariantArrayHandle` approach is to only deduce the value type of the `ArrayHandle` and -return a `vtkm::cont::ArrayHandleVirtual` which uses polymorpishm to hide the actual -storage type. This approach allows for better compile times, and for calling code -to always expect an `ArrayHandleVirtual` instead of the fully deduced type. This conversion -to `ArrayHandleVirtual` is usually done internally within VTK-m when a worklet or filter -is invoked. - -In certain cases users of `VariantArrayHandle` want to be able to access the concrete -`ArrayHandle` and not have it wrapped in a `ArrayHandleVirtual`. For those occurrences -`VariantArrayHandle` provides a collection of helper functions/methods to query and -cast back to the concrete storage and value type: -```cpp -vtkm::cont::ArrayHandleConstant constant(42.0f); -vtkm::cont::ArrayHandleVariant v(constant); - -bool isConstant = vtkm::cont::IsType< decltype(constant) >(v); -if(isConstant) - vtkm::cont::ArrayHandleConstant t = vtkm::cont::Cast< decltype(constant) >(v); - -``` - -Lastly, a common operation of calling code using `VariantArrayHandle` is a desire to construct a new instance -of an existing virtual handle with the same storage type. This can be done by using the `NewInstance` method -as seen below -```cpp -vtkm::cont::ArrayHandle pressure; -vtkm::cont::ArrayHandleVariant v(pressure); - -vtkm::cont::ArrayHandleVariant newArray = v->NewInstance(); -bool isConstant = vtkm::cont::IsType< decltype(pressure) >(newArray); //will be true -``` diff --git a/docs/changelog/verify-cmake-install.md b/docs/changelog/verify-cmake-install.md deleted file mode 100644 index c8b0dad5c..000000000 --- a/docs/changelog/verify-cmake-install.md +++ /dev/null @@ -1,9 +0,0 @@ -# VTK-m now can verify that it installs itself correctly - -It was a fairly common occurrence of VTK-m to have a broken install -tree as it had no easy way to verify that all headers would be installed. - -Now VTK-m offers a testing infrastructure that creates a temporary installed -version and is able to run tests with that VTK-m installed version. 
Currently -the only test is to verify that each header listed in VTK-m is also installed, -but this can expand in the future to include compilation tests. diff --git a/docs/changelog/vtkm-handles-busy-cuda-devices-better.md b/docs/changelog/vtkm-handles-busy-cuda-devices-better.md deleted file mode 100644 index 13a86ffa1..000000000 --- a/docs/changelog/vtkm-handles-busy-cuda-devices-better.md +++ /dev/null @@ -1,17 +0,0 @@ -# VTK-m CUDA detection properly handles busy devices - -When an application that uses VTK-m is first launched, it will -do a check to see if CUDA is supported at runtime. If for -some reason the CUDA card is not allowing kernel execution, -VTK-m would report that the hardware doesn't have CUDA support. - -This was problematic, as it was overly aggressive in disabling CUDA -support for hardware that could support kernel execution in -the future. Given that every VTK-m worklet is executed -through a TryExecute, it is no longer necessary to be so -aggressive in disabling CUDA support. - -Now the behavior is that VTK-m considers a machine to have -CUDA runtime support if it has one or more GPUs of Kepler or -newer hardware (SM_30+). - diff --git a/docs/changelog/vtkm-mangle-diy.md b/docs/changelog/vtkm-mangle-diy.md deleted file mode 100644 index c72172a82..000000000 --- a/docs/changelog/vtkm-mangle-diy.md +++ /dev/null @@ -1,11 +0,0 @@ -# VTK-m thirdparty diy now can coexist with external diy - -Previously VTK-m would leak macros that would cause an -external diy to be incorrectly mangled, breaking consumers -of VTK-m that used diy. - -Going forward, to use `diy` from VTK-m all calls must use the -`vtkmdiy` namespace instead of the `diy` namespace. This -allows VTK-m to properly forward calls to either -the external or internal version. - From 903c2604df0bd31052c05f446a97770b4f09c4c5 Mon Sep 17 00:00:00 2001 From: Robert Maynard Date: Wed, 26 Jun 2019 12:19:53 -0400 Subject: [PATCH 8/8] Release VTK-m 1.4.0 1.4.0 is our fifth official release of VTK-m. The major changes to VTK-m from 1.3.0 can be found in: docs/changelog/1.4/release-notes.md --- version.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/version.txt b/version.txt index f0bb29e76..88c5fb891 100644 --- a/version.txt +++ b/version.txt @@ -1 +1 @@ -1.3.0 +1.4.0