diy: remove in order to reset the import

This commit is contained in:
Sujin Philip 2019-08-22 14:04:03 -04:00
parent 0586525b74
commit c0b497cc83
67 changed files with 0 additions and 15612 deletions

@@ -1,19 +0,0 @@
Copyright Notice
DIY2, Copyright (c) 2015, The Regents of the University of California, through
Lawrence Berkeley National Laboratory (subject to receipt of any required
approvals from the U.S. Dept. of Energy). All rights reserved.
If you have questions about your rights to use or distribute this software,
please contact Berkeley Lab's Technology Transfer Department at TTD@lbl.gov.
NOTICE. This software is owned by the U.S. Department of Energy. As such, the
U.S. Government has been granted for itself and others acting on its behalf a
paid-up, nonexclusive, irrevocable, worldwide license in the Software to
reproduce, prepare derivative works, and perform publicly and display publicly.
Beginning five (5) years after the date permission to assert copyright is
obtained from the U.S. Department of Energy, and subject to any subsequent five
(5) year renewals, the U.S. Government is granted for itself and others acting
on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the
Software to reproduce, prepare derivative works, distribute copies to the
public, perform publicly and display publicly, and to permit others to do so.

@@ -1,41 +0,0 @@
License Agreement
"DIY2, Copyright (c) 2015, The Regents of the University of California, through
Lawrence Berkeley National Laboratory (subject to receipt of any required
approvals from the U.S. Dept. of Energy). All rights reserved."
Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:
(1) Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
(2) Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.
(3) Neither the name of the University of California, Lawrence Berkeley National
Laboratory, U.S. Dept. of Energy nor the names of its contributors may be used
to endorse or promote products derived from this software without specific prior
written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
You are under no obligation whatsoever to provide any bug fixes, patches, or
upgrades to the features, functionality or performance of the source code
("Enhancements") to anyone; however, if you choose to make your Enhancements
available either publicly, or directly to Lawrence Berkeley National Laboratory,
without imposing a separate written license agreement for such Enhancements,
then you hereby grant the following license: a non-exclusive, royalty-free
perpetual license to install, use, modify, prepare derivative works, incorporate
into other computer software, distribute, and sublicense such enhancements or
derivative works thereof, in binary and source code form.

@@ -1,85 +0,0 @@
## DIY is a block-parallel library
DIY is a block-parallel library for implementing scalable algorithms that can execute both
in-core and out-of-core. The same program can be executed with one or more threads per MPI
process, seamlessly combining distributed-memory message passing with shared-memory thread
parallelism. The abstraction enabling these capabilities is block parallelism; blocks
and their message queues are mapped onto processing elements (MPI processes or threads) and are
migrated between memory and storage by the DIY runtime. Complex communication patterns,
including neighbor exchange, merge reduction, swap reduction, and all-to-all exchange, are
possible in- and out-of-core in DIY.
## Licensing
DIY is released as open source software under a BSD-style [license](./LICENSE.txt).
## Dependencies
DIY requires an MPI installation. We recommend [MPICH](http://www.mpich.org/).
## Download, build, install
- You can clone this repository, or
- You can download the [latest tarball](https://github.com/diatomic/diy2/archive/master.tar.gz).
DIY is a header-only library. It does not need to be built; you can simply
include it in your project. The examples can be built using `cmake` from the
top-level directory.
## Documentation
[Doxygen pages](https://diatomic.github.io/diy)
## Example
A simple DIY program, shown below, consists of the following components:
- `struct`s called blocks,
- a diy object called the `master`,
- a set of callback functions performed on each block by `master.foreach()`,
- optionally, one or more message exchanges between the blocks by `master.exchange()`, and
- possibly other collectives and global reductions (not shown below).
The callback functions (`enqueue_local()` and `average()` in the example below) receive the block
pointer and a communication proxy for the message exchange between blocks. It is usual for the
callback functions to enqueue or dequeue messages from the proxy, so that information can be
received and sent during rounds of message exchange.
```cpp
// --- main program --- //

struct Block { float local, average; };             // define your block structure

Master master(world);                               // world = MPI_Comm
...                                                 // populate master with blocks
master.foreach(&enqueue_local);                     // call enqueue_local() for each block
master.exchange();                                  // exchange enqueued data between blocks
master.foreach(&average);                           // call average() for each block

// --- callback functions --- //

// enqueue block data prior to exchanging it
void enqueue_local(Block* b,                        // current block
                   const Proxy& cp)                 // communication proxy provides access to the neighbor blocks
{
    for (size_t i = 0; i < cp.link()->size(); i++)  // for all neighbor blocks
        cp.enqueue(cp.link()->target(i), b->local); // enqueue the data to be sent to this neighbor
                                                    // block in the next exchange
}

// use the received data after exchanging it, in this case compute its average
void average(Block* b,                              // current block
             const Proxy& cp)                       // communication proxy provides access to the neighbor blocks
{
    float x, average = 0;
    for (size_t i = 0; i < cp.link()->size(); i++)  // for all neighbor blocks
    {
        cp.dequeue(cp.link()->target(i).gid, x);    // dequeue the data received from this
                                                    // neighbor block in the last exchange
        average += x;
    }
    b->average = average / cp.link()->size();
}
```

@@ -1,191 +0,0 @@
#ifndef VTKMDIY_ALGORITHMS_HPP
#define VTKMDIY_ALGORITHMS_HPP
#include <vector>
#include "master.hpp"
#include "assigner.hpp"
#include "reduce.hpp"
#include "reduce-operations.hpp"
#include "partners/swap.hpp"
#include "detail/algorithms/sort.hpp"
#include "detail/algorithms/kdtree.hpp"
#include "detail/algorithms/kdtree-sampling.hpp"
#include "log.hpp"
namespace diy
{
/**
* \ingroup Algorithms
* \brief sample sort `values` of each block, store the boundaries between blocks in `samples`
*/
template<class Block, class T, class Cmp>
void sort(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
std::vector<T> Block::* values, //!< all values to sort
std::vector<T> Block::* samples, //!< (output) boundaries of blocks
size_t num_samples, //!< desired number of samples
const Cmp& cmp, //!< comparison function
int k = 2, //!< k-ary reduction will be used
bool samples_only = false) //!< false: results will be all_to_all exchanged; true: only sort but don't exchange results
{
bool immediate = master.immediate();
master.set_immediate(false);
// NB: although sorter will go out of scope, its member functions sample()
// and exchange() will return functors whose copies get saved inside reduce
detail::SampleSort<Block,T,Cmp> sorter(values, samples, cmp, num_samples);
// swap-reduce to all-gather samples
RegularDecomposer<DiscreteBounds> decomposer(1, interval(0,assigner.nblocks()), assigner.nblocks());
RegularSwapPartners partners(decomposer, k);
reduce(master, assigner, partners, sorter.sample(), detail::SkipIntermediate(partners.rounds()));
// all_to_all to exchange the values
if (!samples_only)
all_to_all(master, assigner, sorter.exchange(), k);
master.set_immediate(immediate);
}
/**
* \ingroup Algorithms
* \brief sample sort `values` of each block, store the boundaries between blocks in `samples`
* a shorter version of the sort above, using the default less-than comparator for T,
* with the all_to_all exchange included
*/
template<class Block, class T>
void sort(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
std::vector<T> Block::* values, //!< all values to sort
std::vector<T> Block::* samples, //!< (output) boundaries of blocks
size_t num_samples, //!< desired number of samples
int k = 2) //!< k-ary reduction will be used
{
sort(master, assigner, values, samples, num_samples, std::less<T>(), k);
}
/**
* \ingroup Algorithms
* \brief build a kd-tree and sort a set of points into it (use histograms to determine split values)
*/
template<class Block, class Point>
void kdtree(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
int dim, //!< dimensionality
const ContinuousBounds& domain, //!< global data extents
std::vector<Point> Block::* points, //!< input points to sort into kd-tree
size_t bins, //!< number of histogram bins for splitting a dimension
bool wrap = false) //!< periodic boundaries in all dimensions
{
if (assigner.nblocks() & (assigner.nblocks() - 1))
throw std::runtime_error(fmt::format("KD-tree requires a number of blocks that's a power of 2, got {}", assigner.nblocks()));
typedef diy::RegularContinuousLink RCLink;
for (size_t i = 0; i < master.size(); ++i)
{
RCLink* link = static_cast<RCLink*>(master.link(i));
*link = RCLink(dim, domain, domain);
if (wrap) // set up the links to self
{
diy::BlockID self = { master.gid(i), master.communicator().rank() };
for (int j = 0; j < dim; ++j)
{
diy::Direction dir, wrap_dir;
// left
dir[j] = -1; wrap_dir[j] = -1;
link->add_neighbor(self);
link->add_bounds(domain);
link->add_direction(dir);
link->add_wrap(wrap_dir);
// right
dir[j] = 1; wrap_dir[j] = 1;
link->add_neighbor(self);
link->add_bounds(domain);
link->add_direction(dir);
link->add_wrap(wrap_dir);
}
}
}
detail::KDTreePartition<Block,Point> kdtree_partition(dim, points, bins);
detail::KDTreePartners partners(dim, assigner.nblocks(), wrap, domain);
reduce(master, assigner, partners, kdtree_partition);
// update master.expected to match the links
int expected = 0;
for (size_t i = 0; i < master.size(); ++i)
expected += master.link(i)->size_unique();
master.set_expected(expected);
}
/**
* \ingroup Algorithms
* \brief build a kd-tree and sort a set of points into it (use sampling to determine split values)
*/
template<class Block, class Point>
void kdtree_sampling
(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
int dim, //!< dimensionality
const ContinuousBounds& domain, //!< global data extents
std::vector<Point> Block::* points, //!< input points to sort into kd-tree
size_t samples, //!< number of samples to take in each block
bool wrap = false) //!< periodic boundaries in all dimensions
{
if (assigner.nblocks() & (assigner.nblocks() - 1))
throw std::runtime_error(fmt::format("KD-tree requires a number of blocks that's a power of 2, got {}", assigner.nblocks()));
typedef diy::RegularContinuousLink RCLink;
for (size_t i = 0; i < master.size(); ++i)
{
RCLink* link = static_cast<RCLink*>(master.link(i));
*link = RCLink(dim, domain, domain);
if (wrap) // set up the links to self
{
diy::BlockID self = { master.gid(i), master.communicator().rank() };
for (int j = 0; j < dim; ++j)
{
diy::Direction dir, wrap_dir;
// left
dir[j] = -1; wrap_dir[j] = -1;
link->add_neighbor(self);
link->add_bounds(domain);
link->add_direction(dir);
link->add_wrap(wrap_dir);
// right
dir[j] = 1; wrap_dir[j] = 1;
link->add_neighbor(self);
link->add_bounds(domain);
link->add_direction(dir);
link->add_wrap(wrap_dir);
}
}
}
detail::KDTreeSamplingPartition<Block,Point> kdtree_partition(dim, points, samples);
detail::KDTreePartners partners(dim, assigner.nblocks(), wrap, domain);
reduce(master, assigner, partners, kdtree_partition);
// update master.expected to match the links
int expected = 0;
for (size_t i = 0; i < master.size(); ++i)
expected += master.link(i)->size_unique();
master.set_expected(expected);
}
}
#endif
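
For orientation, here is a minimal sketch of how the `sort` algorithm above might be driven. `Block`, its `values`/`samples` members, the one-block-per-rank setup, the sample count, and the include paths are illustrative assumptions, not part of this header.

```cpp
// Hypothetical driver for diy::sort (names and paths are illustrative).
#include <vector>
#include <diy/master.hpp>
#include <diy/assigner.hpp>
#include <diy/algorithms.hpp>

struct Block
{
    std::vector<float> values;   // local values to sort across all blocks
    std::vector<float> samples;  // filled by diy::sort with block boundaries
};

int main(int argc, char** argv)
{
    diy::mpi::environment  env(argc, argv);   // RAII MPI_Init/MPI_Finalize
    diy::mpi::communicator world;             // MPI_COMM_WORLD by default

    int nblocks = world.size();               // one block per rank, for simplicity
    diy::Master             master(world);
    diy::ContiguousAssigner assigner(world.size(), nblocks);
    // ... add blocks with their local values to master ...

    // sample sort with 64 samples per block and the default less-than
    // comparator; afterwards the values are globally partitioned across
    // blocks and each block's portion is sorted
    diy::sort(master, assigner, &Block::values, &Block::samples, 64);
}
```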

@@ -1,264 +0,0 @@
#ifndef VTKMDIY_ASSIGNER_HPP
#define VTKMDIY_ASSIGNER_HPP
#include <vector>
#include <tuple>
#include "mpi.hpp" // needed for DynamicAssigner
namespace diy
{
// Derived types should define
// int rank(int gid) const
// that converts a global block id to a rank that it's assigned to.
class Assigner
{
public:
/**
* \ingroup Assignment
* \brief Manages how blocks are assigned to processes
*/
Assigner(int size__, //!< total number of processes
int nblocks__ //!< total (global) number of blocks
):
size_(size__), nblocks_(nblocks__) {}
//! returns the total number of process ranks
int size() const { return size_; }
//! returns the total number of global blocks
int nblocks() const { return nblocks_; }
//! sets the total number of global blocks
virtual void set_nblocks(int nblocks__) { nblocks_ = nblocks__; }
//! returns the process rank of the block with global id gid (need not be local)
virtual int rank(int gid) const =0;
//! batch lookup; returns the process ranks of the blocks with global id in the vector gids
inline
virtual std::vector<int>
ranks(const std::vector<int>& gids) const;
private:
int size_; // total number of ranks
int nblocks_; // total number of blocks
};
class StaticAssigner: public Assigner
{
public:
/**
* \ingroup Assignment
* \brief Intermediate type to express assignment that cannot change; adds `local_gids` query method
*/
using Assigner::Assigner;
//! gets the local gids for a given process rank
virtual void local_gids(int rank, std::vector<int>& gids) const =0;
};
class ContiguousAssigner: public StaticAssigner
{
public:
/**
* \ingroup Assignment
* \brief Assigns blocks to processes in contiguous gid (block global id) order
*/
using StaticAssigner::StaticAssigner;
using StaticAssigner::size;
using StaticAssigner::nblocks;
int rank(int gid) const override
{
int div = nblocks() / size();
int mod = nblocks() % size();
int r = gid / (div + 1);
if (r < mod)
{
return r;
} else
{
return mod + (gid - (div + 1)*mod)/div;
}
}
inline
void local_gids(int rank, std::vector<int>& gids) const override;
};
class RoundRobinAssigner: public StaticAssigner
{
public:
/**
* \ingroup Assignment
* \brief Assigns blocks to processes in cyclic or round-robin gid (block global id) order
*/
using StaticAssigner::StaticAssigner;
using StaticAssigner::size;
using StaticAssigner::nblocks;
int rank(int gid) const override { return gid % size(); }
inline
void local_gids(int rank, std::vector<int>& gids) const override;
};
class DynamicAssigner: public Assigner
{
public:
DynamicAssigner(const mpi::communicator& comm, int size__, int nblocks__):
Assigner(size__, nblocks__),
comm_(comm),
div_(nblocks__ / size__ + ((nblocks__ % size__) == 0 ? 0 : 1)), // NB: same size window everywhere means the last rank may allocate extra space
rank_map_(comm_, div_) { rank_map_.lock_all(MPI_MODE_NOCHECK); }
~DynamicAssigner() { rank_map_.unlock_all(); }
inline
virtual void set_nblocks(int nblocks__) override;
inline
virtual int rank(int gid) const override;
inline
virtual std::vector<int>
ranks(const std::vector<int>& gids) const override;
inline std::tuple<bool,int>
get_rank(int& rk, int gid) const;
inline void set_rank(const int& rk, int gid, bool flush = true);
inline void set_ranks(const std::vector<std::tuple<int,int>>& rank_gids);
std::tuple<int,int>
rank_offset(int gid) const { return std::make_tuple(gid / div_, gid % div_); }
private:
mpi::communicator comm_;
int div_;
mutable mpi::window<int> rank_map_;
};
}
std::vector<int>
diy::Assigner::
ranks(const std::vector<int>& gids) const
{
std::vector<int> result(gids.size());
for (size_t i = 0; i < gids.size(); ++i)
result[i] = rank(gids[i]);
return result;
}
void
diy::ContiguousAssigner::
local_gids(int rank_, std::vector<int>& gids) const
{
int div = nblocks() / size();
int mod = nblocks() % size();
int from, to;
if (rank_ < mod)
from = rank_ * (div + 1);
else
from = mod * (div + 1) + (rank_ - mod) * div;
if (rank_ + 1 < mod)
to = (rank_ + 1) * (div + 1);
else
to = mod * (div + 1) + (rank_ + 1 - mod) * div;
for (int gid = from; gid < to; ++gid)
gids.push_back(gid);
}
void
diy::RoundRobinAssigner::
local_gids(int rank_, std::vector<int>& gids) const
{
int cur = rank_;
while (cur < nblocks())
{
gids.push_back(cur);
cur += size();
}
}
void
diy::DynamicAssigner::
set_nblocks(int nblocks__)
{
Assigner::set_nblocks(nblocks__);
div_ = nblocks() / size() + ((nblocks() % size()) == 0 ? 0 : 1);
rank_map_.unlock_all();
rank_map_ = mpi::window<int>(comm_, div_);
rank_map_.lock_all(MPI_MODE_NOCHECK);
}
std::tuple<bool,int>
diy::DynamicAssigner::
get_rank(int& rk, int gid) const
{
// TODO: check if the gid is in cache
int r,offset;
std::tie(r,offset) = rank_offset(gid);
rank_map_.get(rk, r, offset);
return std::make_tuple(false, r); // false indicates that the data wasn't read from cache
}
int
diy::DynamicAssigner::
rank(int gid) const
{
int rk;
auto cached_gidrk = get_rank(rk, gid);
int gidrk = std::get<1>(cached_gidrk);
rank_map_.flush_local(gidrk);
return rk;
}
std::vector<int>
diy::DynamicAssigner::
ranks(const std::vector<int>& gids) const
{
bool all_cached = true;
std::vector<int> result(gids.size(), -1);
for (size_t i = 0; i < gids.size(); ++i)
{
auto cached_gidrk = get_rank(result[i], gids[i]);
bool cached = std::get<0>(cached_gidrk);
all_cached &= cached;
}
if (!all_cached)
rank_map_.flush_local_all();
return result;
}
void
diy::DynamicAssigner::
set_rank(const int& rk, int gid, bool flush)
{
// TODO: update cache
int r,offset;
std::tie(r,offset) = rank_offset(gid);
rank_map_.put(rk, r, offset);
if (flush)
rank_map_.flush(r);
}
void
diy::DynamicAssigner::
set_ranks(const std::vector<std::tuple<int,int>>& rank_gids)
{
for (auto& rg : rank_gids)
set_rank(std::get<0>(rg), std::get<1>(rg), false);
rank_map_.flush_all();
}
#endif
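
A small standalone sketch (not part of the header) contrasting the two static assignment policies above, assuming 4 ranks and 10 blocks; the include path is an assumption.

```cpp
// Illustrative: compare ContiguousAssigner and RoundRobinAssigner.
#include <iostream>
#include <vector>
#include <diy/assigner.hpp>   // include path is an assumption

int main()
{
    int size = 4, nblocks = 10;
    diy::ContiguousAssigner contiguous(size, nblocks);
    diy::RoundRobinAssigner round_robin(size, nblocks);

    // contiguous:  gids 0-2 -> rank 0, 3-5 -> rank 1, 6-7 -> rank 2, 8-9 -> rank 3
    // round robin: gid % size, so gids 0,4,8 -> rank 0, gids 1,5,9 -> rank 1, ...
    for (int gid = 0; gid < nblocks; ++gid)
        std::cout << gid << " -> " << contiguous.rank(gid)
                  << " / " << round_robin.rank(gid) << '\n';

    std::vector<int> gids;
    contiguous.local_gids(0, gids);   // rank 0 owns gids {0, 1, 2}
}
```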

@@ -1,121 +0,0 @@
#ifndef VTKMDIY_COLLECTION_HPP
#define VTKMDIY_COLLECTION_HPP
#include <vector>
#include "serialization.hpp"
#include "storage.hpp"
#include "thread.hpp"
namespace diy
{
class Collection
{
public:
typedef void* Element;
typedef std::vector<Element> Elements;
typedef critical_resource<int, recursive_mutex> CInt;
typedef void* (*Create)();
typedef void (*Destroy)(void*);
typedef detail::Save Save;
typedef detail::Load Load;
public:
Collection(Create create__,
Destroy destroy__,
ExternalStorage* storage__,
Save save__,
Load load__):
create_(create__),
destroy_(destroy__),
storage_(storage__),
save_(save__),
load_(load__),
in_memory_(0) {}
size_t size() const { return elements_.size(); }
const CInt& in_memory() const { return in_memory_; }
inline void clear();
int add(Element e) { elements_.push_back(e); external_.push_back(-1); ++(*in_memory_.access()); return static_cast<int>(elements_.size()) - 1; }
void* release(int i) { void* e = get(i); elements_[static_cast<size_t>(i)] = 0; return e; }
void* find(int i) const { return elements_[static_cast<size_t>(i)]; } // possibly returns 0, if the element is unloaded
void* get(int i) { if (!find(i)) load(i); return find(i); } // loads the element first, and then returns its address
int available() const { int i = 0; for (; i < (int)size(); ++i) if (find(i) != 0) break; return i; }
inline void load(int i);
inline void unload(int i);
Create creator() const { return create_; }
Destroy destroyer() const { return destroy_; }
Load loader() const { return load_; }
Save saver() const { return save_; }
void* create() const { return create_(); }
void destroy(int i) { if (find(i)) { destroy_(find(i)); elements_[static_cast<size_t>(i)] = 0; } else if (external_[static_cast<size_t>(i)] != -1) storage_->destroy(external_[static_cast<size_t>(i)]); }
bool own() const { return destroy_ != 0; }
ExternalStorage* storage() const { return storage_; }
private:
Create create_;
Destroy destroy_;
ExternalStorage* storage_;
Save save_;
Load load_;
Elements elements_;
std::vector<int> external_;
CInt in_memory_;
};
}
void
diy::Collection::
clear()
{
if (own())
for (size_t i = 0; i < size(); ++i)
destroy(static_cast<int>(i));
elements_.clear();
external_.clear();
*in_memory_.access() = 0;
}
void
diy::Collection::
unload(int i)
{
//BinaryBuffer bb;
void* e = find(i);
//save_(e, bb);
//external_[i] = storage_->put(bb);
external_[static_cast<size_t>(i)] = storage_->put(e, save_);
destroy_(e);
elements_[static_cast<size_t>(i)] = 0;
--(*in_memory_.access());
}
void
diy::Collection::
load(int i)
{
//BinaryBuffer bb;
//storage_->get(external_[i], bb);
void* e = create_();
//load_(e, bb);
storage_->get(external_[static_cast<size_t>(i)], e, load_);
elements_[static_cast<size_t>(i)] = e;
external_[static_cast<size_t>(i)] = -1;
++(*in_memory_.access());
}
#endif
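
To make the load/unload cycle above concrete, a hedged sketch follows; the `Save`/`Load` signatures and the `FileStorage` usage are assumptions based on diy's serialization and storage interfaces, which are not shown in this hunk.

```cpp
// Illustrative sketch of Collection's in-memory/out-of-core cycle.
#include <vector>
#include <diy/collection.hpp>      // include paths are assumptions
#include <diy/storage.hpp>
#include <diy/serialization.hpp>

struct Block { std::vector<float> data; };

void* create_block()         { return new Block; }
void  destroy_block(void* b) { delete static_cast<Block*>(b); }
void  save_block(const void* b, diy::BinaryBuffer& bb)   // assumed Save signature
    { diy::save(bb, static_cast<const Block*>(b)->data); }
void  load_block(void* b, diy::BinaryBuffer& bb)         // assumed Load signature
    { diy::load(bb, static_cast<Block*>(b)->data); }

int main()
{
    diy::FileStorage storage("./DIY.XXXXXX");          // backing store for unloaded elements
    diy::Collection  collection(&create_block, &destroy_block,
                                &storage, &save_block, &load_block);

    int i = collection.add(create_block());            // element starts in memory
    collection.unload(i);                              // serialized to storage and destroyed
    Block* b = static_cast<Block*>(collection.get(i)); // recreated and loaded on demand
    (void) b;
}
```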

@@ -1,13 +0,0 @@
#ifndef VTKMDIY_COMMUNICATOR_HPP
#define VTKMDIY_COMMUNICATOR_HPP
#warning "diy::Communicator (in diy/communicator.hpp) is deprecated, use diy::mpi::communicator directly"
#include "mpi.hpp"
namespace diy
{
typedef mpi::communicator Communicator;
}
#endif

@@ -1,26 +0,0 @@
#ifndef VTKMDIY_CONSTANTS_H
#define VTKMDIY_CONSTANTS_H
// Default DIY_MAX_DIM to 4, unless provided by the user
// (used for static min/max size in various Bounds)
#ifndef DIY_MAX_DIM
#define DIY_MAX_DIM 4
#endif
enum
{
DIY_X0 = 0x01, /* minimum-side x (left) neighbor */
DIY_X1 = 0x02, /* maximum-side x (right) neighbor */
DIY_Y0 = 0x04, /* minimum-side y (bottom) neighbor */
DIY_Y1 = 0x08, /* maximum-side y (top) neighbor */
DIY_Z0 = 0x10, /* minimum-side z (back) neighbor */
DIY_Z1 = 0x20, /* maximum-side z (front) neighbor */
DIY_T0 = 0x40, /* minimum-side t (earlier) neighbor */
DIY_T1 = 0x80 /* maximum-side t (later) neighbor */
};
#ifndef DIY_UNUSED
#define DIY_UNUSED(expr) do { (void)(expr); } while (0)
#endif
#endif
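
The `DIY_*` values above are bit masks, so individual neighbor directions can be combined and tested with bitwise operators; a tiny illustration (not from the header):

```cpp
#include <cstdio>
// assumes constants.h above has been included for the DIY_* flags

int main()
{
    int face_nbrs = DIY_X0 | DIY_Y1;                            // left and top neighbors
    std::printf("has left? %d\n", (face_nbrs & DIY_X0) != 0);   // prints 1
    std::printf("has back? %d\n", (face_nbrs & DIY_Z0) != 0);   // prints 0
}
```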

@@ -1,47 +0,0 @@
#ifndef VTKMDIY_CRITICAL_RESOURCE_HPP
#define VTKMDIY_CRITICAL_RESOURCE_HPP
namespace diy
{
template<class T, class Mutex>
class resource_accessor
{
public:
resource_accessor(T& x, Mutex& m):
x_(x), lock_(m) {}
resource_accessor(resource_accessor&&) = default;
resource_accessor(const resource_accessor&) = delete;
T& operator*() { return x_; }
T* operator->() { return &x_; }
const T& operator*() const { return x_; }
const T* operator->() const { return &x_; }
private:
T& x_;
lock_guard<Mutex> lock_;
};
template<class T, class Mutex = fast_mutex>
class critical_resource
{
public:
typedef resource_accessor<T, Mutex> accessor;
typedef resource_accessor<const T, Mutex> const_accessor; // eventually, try shared locking
public:
critical_resource() {}
critical_resource(const T& x):
x_(x) {}
accessor access() { return accessor(x_, m_); }
const_accessor const_access() const { return const_accessor(x_, m_); }
private:
T x_;
mutable Mutex m_;
};
}
#endif
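
A brief sketch (not part of the header) of how `critical_resource` is meant to be used: `access()` locks the mutex and returns an accessor that holds the lock for its lifetime, so the protected object can only be reached under the lock.

```cpp
// Illustrative: critical_resource serializes concurrent access to shared
// state; assumes critical-resource.hpp above and diy's thread.hpp are included.
#include <vector>

diy::critical_resource<std::vector<int>> shared;   // default fast_mutex

void worker(int value)
{
    // the accessor returned by access() holds the lock for the duration
    // of this full expression, making the push_back thread-safe
    shared.access()->push_back(value);
}
```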

@@ -1,716 +0,0 @@
#ifndef VTKMDIY_DECOMPOSITION_HPP
#define VTKMDIY_DECOMPOSITION_HPP
#include <vector>
#include <algorithm>
#include <iostream>
#include <cmath>
#include <sstream>
#include <stdexcept>
#include "link.hpp"
#include "assigner.hpp"
#include "master.hpp"
namespace diy
{
namespace detail
{
template<class Bounds_, class Enable = void>
struct BoundsHelper;
// discrete bounds
template<class Bounds>
struct BoundsHelper<Bounds, typename std::enable_if<std::is_integral<typename Bounds::Coordinate>::value>::type>
{
using Coordinate = typename Bounds::Coordinate;
static Coordinate from(int i, int n, Coordinate min, Coordinate max, bool) { return min + (max - min + 1)/n * i; }
static Coordinate to (int i, int n, Coordinate min, Coordinate max, bool shared_face)
{
if (i == n - 1)
return max;
else
return from(i+1, n, min, max, shared_face) - (shared_face ? 0 : 1);
}
static int lower(Coordinate x, int n, Coordinate min, Coordinate max, bool shared)
{
Coordinate width = (max - min + 1)/n;
Coordinate res = (x - min)/width;
if (res >= n) res = n - 1;
if (shared && x == from(res, n, min, max, shared))
--res;
return res;
}
static int upper(Coordinate x, int n, Coordinate min, Coordinate max, bool shared)
{
Coordinate width = (max - min + 1)/n;
Coordinate res = (x - min)/width + 1;
if (shared && x == from(res, n, min, max, shared))
++res;
return res;
}
};
// continuous bounds
template<class Bounds>
struct BoundsHelper<Bounds, typename std::enable_if<std::is_floating_point<typename Bounds::Coordinate>::value>::type>
{
using Coordinate = typename Bounds::Coordinate;
static Coordinate from(int i, int n, Coordinate min, Coordinate max, bool) { return min + (max - min)/n * i; }
static Coordinate to (int i, int n, Coordinate min, Coordinate max, bool) { return min + (max - min)/n * (i+1); }
static int lower(Coordinate x, int n, Coordinate min, Coordinate max, bool) { Coordinate width = (max - min)/n; Coordinate res = std::floor((x - min)/width); if (min + res*width == x) return (res - 1); else return res; }
static int upper(Coordinate x, int n, Coordinate min, Coordinate max, bool) { Coordinate width = (max - min)/n; Coordinate res = std::ceil ((x - min)/width); if (min + res*width == x) return (res + 1); else return res; }
};
}
//! \ingroup Decomposition
//! Decomposes a regular (discrete or continuous) domain into even blocks;
//! creates Links with Bounds along the way.
template<class Bounds_>
struct RegularDecomposer
{
typedef Bounds_ Bounds;
typedef typename BoundsValue<Bounds>::type Coordinate;
typedef typename RegularLinkSelector<Bounds>::type Link;
using Creator = std::function<void(int, Bounds, Bounds, Bounds, Link)>;
using Updater = std::function<void(int, int, Bounds, Bounds, Bounds, Link)>;
typedef std::vector<bool> BoolVector;
typedef std::vector<Coordinate> CoordinateVector;
typedef std::vector<int> DivisionsVector;
/// @param dim: dimensionality of the decomposition
/// @param domain: bounds of global domain
/// @param nblocks: total number of global blocks
/// @param share_face: indicates dimensions on which to share block faces
/// @param wrap: indicates dimensions on which to wrap the boundary
/// @param ghosts: indicates how many ghosts to use in each dimension
/// @param divisions: indicates how many cuts to make along each dimension
/// (0 means "no constraint," i.e., leave it up to the algorithm)
RegularDecomposer(int dim_,
const Bounds& domain_,
int nblocks_,
BoolVector share_face_ = BoolVector(),
BoolVector wrap_ = BoolVector(),
CoordinateVector ghosts_ = CoordinateVector(),
DivisionsVector divisions_ = DivisionsVector()):
dim(dim_), domain(domain_), nblocks(nblocks_),
share_face(share_face_),
wrap(wrap_), ghosts(ghosts_), divisions(divisions_)
{
if ((int) share_face.size() < dim) share_face.resize(dim);
if ((int) wrap.size() < dim) wrap.resize(dim);
if ((int) ghosts.size() < dim) ghosts.resize(dim);
if ((int) divisions.size() < dim) divisions.resize(dim);
fill_divisions(divisions);
}
// Calls create(int gid, const Bounds& bounds, const Link& link)
void decompose(int rank, const StaticAssigner& assigner, const Creator& create);
void decompose(int rank, const StaticAssigner& assigner, Master& master, const Updater& update);
void decompose(int rank, const StaticAssigner& assigner, Master& master);
// find lowest gid that owns a particular point
template<class Point>
int lowest_gid(const Point& p) const;
void gid_to_coords(int gid, DivisionsVector& coords) const { gid_to_coords(gid, coords, divisions); }
int coords_to_gid(const DivisionsVector& coords) const { return coords_to_gid(coords, divisions); }
void fill_divisions(std::vector<int>& divisions) const;
void fill_bounds(Bounds& bounds, const DivisionsVector& coords, bool add_ghosts = false) const;
void fill_bounds(Bounds& bounds, int gid, bool add_ghosts = false) const;
static bool all(const std::vector<int>& v, int x);
static void gid_to_coords(int gid, DivisionsVector& coords, const DivisionsVector& divisions);
static int coords_to_gid(const DivisionsVector& coords, const DivisionsVector& divisions);
static void factor(std::vector<unsigned>& factors, int n);
// Point to GIDs functions
template<class Point>
void point_to_gids(std::vector<int>& gids, const Point& p) const;
//! returns gid of a block that contains the point; ignores ghosts
template<class Point>
int point_to_gid(const Point& p) const;
template<class Point>
int num_gids(const Point& p) const;
template<class Point>
void top_bottom(int& top, int& bottom, const Point& p, int axis) const;
int dim;
Bounds domain;
int nblocks;
BoolVector share_face;
BoolVector wrap;
CoordinateVector ghosts;
DivisionsVector divisions;
};
/**
* \ingroup Decomposition
* \brief Decomposes the domain into a prescribed pattern of blocks.
*
* @param dim dimension of the domain
* @param rank local rank
* @param assigner decides how processors are assigned to blocks (maps a gid to a rank)
* also communicates the total number of blocks
* @param create the callback functor
* @param wrap indicates dimensions on which to wrap the boundary
* @param ghosts indicates how many ghosts to use in each dimension
* @param divs indicates how many cuts to make along each dimension
* (0 means "no constraint," i.e., leave it up to the algorithm)
*
* `create(...)` is called with each block assigned to the local domain. See [decomposition example](#decomposition-example).
*/
template<class Bounds>
void decompose(int dim,
int rank,
const Bounds& domain,
const StaticAssigner& assigner,
const typename RegularDecomposer<Bounds>::Creator& create,
typename RegularDecomposer<Bounds>::BoolVector share_face = typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::BoolVector wrap = typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::CoordinateVector ghosts = typename RegularDecomposer<Bounds>::CoordinateVector(),
typename RegularDecomposer<Bounds>::DivisionsVector divs = typename RegularDecomposer<Bounds>::DivisionsVector())
{
RegularDecomposer<Bounds>(dim, domain, assigner.nblocks(), share_face, wrap, ghosts, divs).decompose(rank, assigner, create);
}
/**
* \ingroup Decomposition
* \brief Decomposes the domain into a prescribed pattern of blocks.
*
* @param dim dimension of the domain
* @param rank local rank
* @param assigner decides how processors are assigned to blocks (maps a gid to a rank)
* also communicates the total number of blocks
* @param master gets the blocks once this function returns
* @param wrap indicates dimensions on which to wrap the boundary
* @param ghosts indicates how many ghosts to use in each dimension
* @param divs indicates how many cuts to make along each dimension
* (0 means "no constraint," i.e., leave it up to the algorithm)
*
* `master` must have been supplied a create function in order for this function to work.
*/
template<class Bounds>
void decompose(int dim,
int rank,
const Bounds& domain,
const StaticAssigner& assigner,
Master& master,
typename RegularDecomposer<Bounds>::BoolVector share_face = typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::BoolVector wrap = typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::CoordinateVector ghosts = typename RegularDecomposer<Bounds>::CoordinateVector(),
typename RegularDecomposer<Bounds>::DivisionsVector divs = typename RegularDecomposer<Bounds>::DivisionsVector())
{
RegularDecomposer<Bounds>(dim, domain, assigner.nblocks(), share_face, wrap, ghosts, divs).decompose(rank, assigner, master);
}
/**
* \ingroup Decomposition
* \brief A "null" decompositon that simply creates the blocks and adds them to the master
*
* @param rank local rank
* @param assigner decides how processors are assigned to blocks (maps a gid to a rank)
* also communicates the total number of blocks
* @param master gets the blocks once this function returns
*/
inline
void decompose(int rank,
const StaticAssigner& assigner,
Master& master)
{
std::vector<int> local_gids;
assigner.local_gids(rank, local_gids);
for (size_t i = 0; i < local_gids.size(); ++i)
master.add(local_gids[i], master.create(), new diy::Link);
}
/**
* \ingroup Decomposition
* \brief Add a decomposition (modify links) of an existing set of blocks that were
* added to the master previously
*
* @param rank local rank
* @param assigner decides how processors are assigned to blocks (maps a gid to a rank)
* also communicates the total number of blocks
*/
template<class Bounds>
void decompose(int dim,
int rank,
const Bounds& domain,
const StaticAssigner& assigner,
Master& master,
const typename RegularDecomposer<Bounds>::Updater& update,
typename RegularDecomposer<Bounds>::BoolVector share_face =
typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::BoolVector wrap =
typename RegularDecomposer<Bounds>::BoolVector(),
typename RegularDecomposer<Bounds>::CoordinateVector ghosts =
typename RegularDecomposer<Bounds>::CoordinateVector(),
typename RegularDecomposer<Bounds>::DivisionsVector divs =
typename RegularDecomposer<Bounds>::DivisionsVector())
{
RegularDecomposer<Bounds>(dim, domain, assigner.nblocks(), share_face, wrap, ghosts, divs).
decompose(rank, assigner, master, update);
}
//! Decomposition example: \example decomposition/test-decomposition.cpp
//! Direct master insertion example: \example decomposition/test-direct-master.cpp
}
// decomposes domain and adds blocks to the master
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
decompose(int rank, const StaticAssigner& assigner, Master& master)
{
decompose(rank, assigner, [&master](int gid, const Bounds&, const Bounds&, const Bounds&, const Link& link)
{
void* b = master.create();
Link* l = new Link(link);
master.add(gid, b, l);
});
}
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
decompose(int rank, const StaticAssigner& assigner, const Creator& create)
{
std::vector<int> gids;
assigner.local_gids(rank, gids);
for (int i = 0; i < (int)gids.size(); ++i)
{
int gid = gids[i];
DivisionsVector coords;
gid_to_coords(gid, coords);
Bounds core, bounds;
fill_bounds(core, coords);
fill_bounds(bounds, coords, true);
// Fill link with all the neighbors
Link link(dim, core, bounds);
std::vector<int> offsets(dim, -1);
offsets[0] = -2;
while (!all(offsets, 1))
{
// next offset
int j;
for (j = 0; j < dim; ++j)
if (offsets[j] == 1)
offsets[j] = -1;
else
break;
++offsets[j];
if (all(offsets, 0)) continue; // skip ourselves
DivisionsVector nhbr_coords(dim);
Direction dir, wrap_dir;
bool inbounds = true;
for (int k = 0; k < dim; ++k)
{
nhbr_coords[k] = coords[k] + offsets[k];
// wrap
if (nhbr_coords[k] < 0)
{
if (wrap[k])
{
nhbr_coords[k] = divisions[k] - 1;
wrap_dir[k] = -1;
}
else
inbounds = false;
}
if (nhbr_coords[k] >= divisions[k])
{
if (wrap[k])
{
nhbr_coords[k] = 0;
wrap_dir[k] = 1;
}
else
inbounds = false;
}
// NB: this needs to match the addressing scheme in dir_t (in constants.h)
if (offsets[k] == -1 || offsets[k] == 1)
dir[k] = offsets[k];
}
if (!inbounds) continue;
int nhbr_gid = coords_to_gid(nhbr_coords);
BlockID bid; bid.gid = nhbr_gid; bid.proc = assigner.rank(nhbr_gid);
link.add_neighbor(bid);
Bounds nhbr_bounds;
fill_bounds(nhbr_bounds, nhbr_coords);
link.add_bounds(nhbr_bounds);
link.add_direction(dir);
link.add_wrap(wrap_dir);
}
create(gid, core, bounds, domain, link);
}
}
// decomposes domain but does not add blocks to master, assumes they were added already
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
decompose(int rank, const StaticAssigner& assigner, Master& master, const Updater& update)
{
decompose(rank, assigner, [&master,&update](int gid, const Bounds& core, const Bounds& bounds, const Bounds& domain_, const Link& link)
{
int lid = master.lid(gid);
Link* l = new Link(link);
master.replace_link(lid, l);
update(gid, lid, core, bounds, domain_, *l);
});
}
template<class Bounds>
bool
diy::RegularDecomposer<Bounds>::
all(const std::vector<int>& v, int x)
{
for (unsigned i = 0; i < v.size(); ++i)
if (v[i] != x)
return false;
return true;
}
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
gid_to_coords(int gid, DivisionsVector& coords, const DivisionsVector& divisions)
{
int dim = static_cast<int>(divisions.size());
for (int i = 0; i < dim; ++i)
{
coords.push_back(gid % divisions[i]);
gid /= divisions[i];
}
}
template<class Bounds>
int
diy::RegularDecomposer<Bounds>::
coords_to_gid(const DivisionsVector& coords, const DivisionsVector& divisions)
{
int gid = 0;
for (int i = static_cast<int>(coords.size()) - 1; i >= 0; --i)
{
gid *= divisions[i];
gid += coords[i];
}
return gid;
}
//! \ingroup Decomposition
//! Gets the bounds, with or without ghosts, for a block specified by its block coordinates
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
fill_bounds(Bounds& bounds, //!< (output) bounds
const DivisionsVector& coords, //!< coordinates of the block in the decomposition
bool add_ghosts) //!< whether to include ghosts in the output bounds
const
{
for (int i = 0; i < dim; ++i)
{
bounds.min[i] = detail::BoundsHelper<Bounds>::from(coords[i], divisions[i], domain.min[i], domain.max[i], share_face[i]);
bounds.max[i] = detail::BoundsHelper<Bounds>::to (coords[i], divisions[i], domain.min[i], domain.max[i], share_face[i]);
}
for (int i = dim; i < DIY_MAX_DIM; ++i) // set the unused dimension to 0
{
bounds.min[i] = 0;
bounds.max[i] = 0;
}
if (!add_ghosts)
return;
for (int i = 0; i < dim; ++i)
{
if (wrap[i])
{
bounds.min[i] -= ghosts[i];
bounds.max[i] += ghosts[i];
} else
{
bounds.min[i] = std::max(domain.min[i], bounds.min[i] - ghosts[i]);
bounds.max[i] = std::min(domain.max[i], bounds.max[i] + ghosts[i]);
}
}
}
//! \ingroup Decomposition
//! Gets the bounds, with or without ghosts, for a block specified by its gid
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
fill_bounds(Bounds& bounds, //!< (output) bounds
int gid, //!< global id of the block
bool add_ghosts) //!< whether to include ghosts in the output bounds
const
{
DivisionsVector coords;
gid_to_coords(gid, coords);
if (add_ghosts)
fill_bounds(bounds, coords, true);
else
fill_bounds(bounds, coords);
}
namespace diy { namespace detail {
// current state of division in one dimension used in fill_divisions below
template<class Coordinate>
struct Div
{
int dim; // 0, 1, 2, etc. e.g. for x, y, z etc.
int nb; // number of blocks so far in this dimension
Coordinate b_size; // block size so far in this dimension
// sort on descending block size unless tied, in which case
// sort on ascending num blocks in current dim unless tied, in which case
// sort on ascending dimension
bool operator<(Div rhs) const
{
// sort on second value of the pair unless tied, in which case sort on first
if (b_size == rhs.b_size)
{
if (nb == rhs.nb)
return(dim < rhs.dim);
return(nb < rhs.nb);
}
return(b_size > rhs.b_size);
}
};
} }
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
fill_divisions(std::vector<int>& divisions_) const
{
// prod = number of blocks unconstrained by user; c = number of unconstrained dimensions
int prod = 1; int c = 0;
for (int i = 0; i < dim; ++i)
if (divisions_[i] != 0)
{
prod *= divisions_[i];
++c;
}
if (nblocks % prod != 0)
throw std::runtime_error("Total number of blocks cannot be factored into provided divs");
if (c == (int) divisions_.size()) // nothing to do; user provided all divs
return;
// factor number of blocks left in unconstrained dimensions
// factorization is sorted from smallest to largest factors
std::vector<unsigned> factors;
factor(factors, nblocks/prod);
using detail::Div;
std::vector< Div<Coordinate> > missing_divs; // pairs consisting of (dim, #divs)
// init missing_divs
for (int i = 0; i < dim; i++)
{
if (divisions_[i] == 0)
{
Div<Coordinate> div;
div.dim = i;
div.nb = 1;
div.b_size = domain.max[i] - domain.min[i];
missing_divs.push_back(div);
}
}
// iterate over factorization of number of blocks (factors are sorted smallest to largest)
// NB: using int instead of size_t because i must be able to go negative to break out of the loop
for (int i = factors.size() - 1; i >= 0; --i)
{
// fill in missing divs by dividing dimension w/ largest block size
// except when this would be illegal (resulting in bounds.max < bounds.min,
// which is only a problem for discrete bounds)
// sort on decreasing block size
std::sort(missing_divs.begin(), missing_divs.end());
// split the dimension with the largest block size (first element in vector)
Coordinate min =
detail::BoundsHelper<Bounds>::from(0,
missing_divs[0].nb * factors[i],
domain.min[missing_divs[0].dim],
domain.max[missing_divs[0].dim],
share_face[missing_divs[0].dim]);
Coordinate max =
detail::BoundsHelper<Bounds>::to(0,
missing_divs[0].nb * factors[i],
domain.min[missing_divs[0].dim],
domain.max[missing_divs[0].dim],
share_face[missing_divs[0].dim]);
if (max >= min)
{
missing_divs[0].nb *= factors[i];
missing_divs[0].b_size = max - min;
}
else
{
std::ostringstream oss;
oss << "Unable to decompose domain into " << nblocks << " blocks: " << min << " " << max;
throw std::runtime_error(oss.str());
}
}
// assign the divisions
for (size_t i = 0; i < missing_divs.size(); i++)
divisions_[missing_divs[i].dim] = missing_divs[i].nb;
}
template<class Bounds>
void
diy::RegularDecomposer<Bounds>::
factor(std::vector<unsigned>& factors, int n)
{
while (n != 1)
for (int i = 2; i <= n; ++i)
{
if (n % i == 0)
{
factors.push_back(i);
n /= i;
break;
}
}
}
// Point to GIDs
// TODO: deal with wrap correctly
// TODO: add an optional ghosts argument to ignore ghosts (if we want to find the true owners, or something like that)
template<class Bounds>
template<class Point>
void
diy::RegularDecomposer<Bounds>::
point_to_gids(std::vector<int>& gids, const Point& p) const
{
std::vector< std::pair<int, int> > ranges(dim);
for (int i = 0; i < dim; ++i)
top_bottom(ranges[i].second, ranges[i].first, p, i);
// look up gids for all combinations
DivisionsVector coords(dim), location(dim);
while(location.back() < ranges.back().second - ranges.back().first)
{
for (int i = 0; i < dim; ++i)
coords[i] = ranges[i].first + location[i];
gids.push_back(coords_to_gid(coords, divisions));
location[0]++;
unsigned i = 0;
while (i < dim-1 && location[i] == ranges[i].second - ranges[i].first)
{
location[i] = 0;
++i;
location[i]++;
}
}
}
template<class Bounds>
template<class Point>
int
diy::RegularDecomposer<Bounds>::
point_to_gid(const Point& p) const
{
int gid = 0;
for (int axis = dim - 1; axis >= 0; --axis)
{
int bottom = detail::BoundsHelper<Bounds>::lower(p[axis], divisions[axis], domain.min[axis], domain.max[axis], share_face[axis]);
bottom = std::max(0, bottom);
// coupled with coords_to_gid
gid *= divisions[axis];
gid += bottom;
}
return gid;
}
template<class Bounds>
template<class Point>
int
diy::RegularDecomposer<Bounds>::
num_gids(const Point& p) const
{
int res = 1;
for (int i = 0; i < dim; ++i)
{
int top, bottom;
top_bottom(top, bottom, p, i);
res *= top - bottom;
}
return res;
}
template<class Bounds>
template<class Point>
void
diy::RegularDecomposer<Bounds>::
top_bottom(int& top, int& bottom, const Point& p, int axis) const
{
Coordinate l = p[axis] - ghosts[axis];
Coordinate r = p[axis] + ghosts[axis];
top = detail::BoundsHelper<Bounds>::upper(r, divisions[axis], domain.min[axis], domain.max[axis], share_face[axis]);
bottom = detail::BoundsHelper<Bounds>::lower(l, divisions[axis], domain.min[axis], domain.max[axis], share_face[axis]);
if (!wrap[axis])
{
bottom = std::max(0, bottom);
top = std::min(divisions[axis], top);
}
}
// find lowest gid that owns a particular point
template<class Bounds>
template<class Point>
int
diy::RegularDecomposer<Bounds>::
lowest_gid(const Point& p) const
{
// TODO: optimize - no need to compute all gids
std::vector<int> gids;
point_to_gids(gids, p);
std::sort(gids.begin(), gids.end());
return gids[0];
}
#endif
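
A hedged sketch of driving the decomposer above through the free `decompose()` function; `Block`, the domain extents, the block counts, and the include paths are illustrative assumptions.

```cpp
// Illustrative: decompose a 3D discrete domain into 8 regular blocks.
#include <diy/master.hpp>              // include paths are assumptions
#include <diy/assigner.hpp>
#include <diy/decomposition.hpp>

struct Block { /* application state */ };

int main(int argc, char** argv)
{
    diy::mpi::environment  env(argc, argv);
    diy::mpi::communicator world;

    int dim = 3, nblocks = 8;
    diy::DiscreteBounds domain;        // static-size bounds (see DIY_MAX_DIM)
    for (int i = 0; i < dim; ++i) { domain.min[i] = 0; domain.max[i] = 255; }

    diy::Master             master(world);
    diy::ContiguousAssigner assigner(world.size(), nblocks);

    // the creator is called once per local gid with core bounds, ghosted
    // bounds, the global domain, and a fully populated link
    diy::decompose(dim, world.rank(), domain, assigner,
                   [&](int gid, const diy::DiscreteBounds&,        // core
                       const diy::DiscreteBounds&,                 // bounds w/ ghosts
                       const diy::DiscreteBounds&,                 // domain
                       const diy::RegularGridLink& link)
                   {
                       master.add(gid, new Block, new diy::RegularGridLink(link));
                   });
}
```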

@@ -1,450 +0,0 @@
#ifndef VTKMDIY_DETAIL_ALGORITHMS_KDTREE_SAMPLING_HPP
#define VTKMDIY_DETAIL_ALGORITHMS_KDTREE_SAMPLING_HPP
#include <vector>
#include <cassert>
#include "../../partners/all-reduce.hpp"
#include "../../log.hpp"
// TODO: technically, what's done now is not a perfect subsample:
// we take the same number of samples from every block; in reality this number should be selected at random,
// so that the total number of samples adds up to samples*nblocks
//
// NB: random samples are chosen using rand(), which is assumed to be seeded
// externally. Once we switch to C++11, we should use its more advanced
// random number generators (and take a generator as an external parameter)
// (TODO)
namespace diy
{
namespace detail
{
template<class Block, class Point>
struct KDTreeSamplingPartition
{
typedef diy::RegularContinuousLink RCLink;
typedef diy::ContinuousBounds Bounds;
typedef std::vector<float> Samples;
KDTreeSamplingPartition(int dim,
std::vector<Point> Block::* points,
size_t samples):
dim_(dim), points_(points), samples_(samples) {}
void operator()(Block* b, const diy::ReduceProxy& srp, const KDTreePartners& partners) const;
int divide_gid(int gid, bool lower, int round, int rounds) const;
void update_links(Block* b, const diy::ReduceProxy& srp, int dim, int round, int rounds, bool wrap, const Bounds& domain) const;
void split_to_neighbors(Block* b, const diy::ReduceProxy& srp, int dim) const;
diy::Direction
find_wrap(const Bounds& bounds, const Bounds& nbr_bounds, const Bounds& domain) const;
void compute_local_samples(Block* b, const diy::ReduceProxy& srp, int dim) const;
void add_samples(Block* b, const diy::ReduceProxy& srp, Samples& samples) const;
void receive_samples(Block* b, const diy::ReduceProxy& srp, Samples& samples) const;
void forward_samples(Block* b, const diy::ReduceProxy& srp, const Samples& samples) const;
void enqueue_exchange(Block* b, const diy::ReduceProxy& srp, int dim, const Samples& samples) const;
void dequeue_exchange(Block* b, const diy::ReduceProxy& srp, int dim) const;
void update_neighbor_bounds(Bounds& bounds, float split, int dim, bool lower) const;
bool intersects(const Bounds& x, const Bounds& y, int dim, bool wrap, const Bounds& domain) const;
float find_split(const Bounds& changed, const Bounds& original) const;
int dim_;
std::vector<Point> Block::* points_;
size_t samples_;
};
}
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
operator()(Block* b, const diy::ReduceProxy& srp, const KDTreePartners& partners) const
{
int dim;
if (srp.round() < partners.rounds())
dim = partners.dim(srp.round());
else
dim = partners.dim(srp.round() - 1);
if (srp.round() == partners.rounds())
update_links(b, srp, dim, partners.sub_round(srp.round() - 2), partners.swap_rounds(), partners.wrap, partners.domain); // -1 would be the "uninformative" link round
else if (partners.swap_round(srp.round()) && partners.sub_round(srp.round()) < 0) // link round
{
dequeue_exchange(b, srp, dim); // from the swap round
split_to_neighbors(b, srp, dim);
}
else if (partners.swap_round(srp.round()))
{
Samples samples;
receive_samples(b, srp, samples);
enqueue_exchange(b, srp, dim, samples);
} else if (partners.sub_round(srp.round()) == 0)
{
if (srp.round() > 0)
{
int prev_dim = dim - 1;
if (prev_dim < 0)
prev_dim += dim_;
update_links(b, srp, prev_dim, partners.sub_round(srp.round() - 2), partners.swap_rounds(), partners.wrap, partners.domain); // -1 would be the "uninformative" link round
}
compute_local_samples(b, srp, dim);
} else if (partners.sub_round(srp.round()) < (int) partners.histogram.rounds()/2) // we are reusing partners class, so really we are talking about the samples rounds here
{
Samples samples;
add_samples(b, srp, samples);
srp.enqueue(srp.out_link().target(0), samples);
} else
{
Samples samples;
add_samples(b, srp, samples);
if (samples.size() != 1)
{
// pick the median
std::nth_element(samples.begin(), samples.begin() + samples.size()/2, samples.end());
std::swap(samples[0], samples[samples.size()/2]);
//std::sort(samples.begin(), samples.end());
//samples[0] = (samples[samples.size()/2] + samples[samples.size()/2 + 1])/2;
samples.resize(1);
}
forward_samples(b, srp, samples);
}
}
template<class Block, class Point>
int
diy::detail::KDTreeSamplingPartition<Block,Point>::
divide_gid(int gid, bool lower, int round, int rounds) const
{
if (lower)
gid &= ~(1 << (rounds - 1 - round));
else
gid |= (1 << (rounds - 1 - round));
return gid;
}
// round here is the outer iteration of the algorithm
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
update_links(Block* b, const diy::ReduceProxy& srp, int dim, int round, int rounds, bool wrap, const Bounds& domain) const
{
auto log = get_logger();
int gid = srp.gid();
int lid = srp.master()->lid(gid);
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
// (gid, dir) -> i
std::map<std::pair<int,diy::Direction>, int> link_map;
for (int i = 0; i < link->size(); ++i)
link_map[std::make_pair(link->target(i).gid, link->direction(i))] = i;
// NB: srp.enqueue(..., ...) should match the link
std::vector<float> splits(link->size());
for (int i = 0; i < link->size(); ++i)
{
float split; diy::Direction dir;
int in_gid = link->target(i).gid;
while(srp.incoming(in_gid))
{
srp.dequeue(in_gid, split);
srp.dequeue(in_gid, dir);
// reverse dir
for (int j = 0; j < dim_; ++j)
dir[j] = -dir[j];
int k = link_map[std::make_pair(in_gid, dir)];
log->trace("{} {} {} -> {}", in_gid, dir, split, k);
splits[k] = split;
}
}
RCLink new_link(dim_, link->core(), link->core());
bool lower = !(gid & (1 << (rounds - 1 - round)));
// fill out the new link
for (int i = 0; i < link->size(); ++i)
{
diy::Direction dir = link->direction(i);
//diy::Direction wrap_dir = link->wrap(i); // we don't use existing wrap, but restore it from scratch
if (dir[dim] != 0)
{
if ((dir[dim] < 0 && lower) || (dir[dim] > 0 && !lower))
{
int nbr_gid = divide_gid(link->target(i).gid, !lower, round, rounds);
diy::BlockID nbr = { nbr_gid, srp.assigner().rank(nbr_gid) };
new_link.add_neighbor(nbr);
new_link.add_direction(dir);
Bounds bounds = link->bounds(i);
update_neighbor_bounds(bounds, splits[i], dim, !lower);
new_link.add_bounds(bounds);
if (wrap)
new_link.add_wrap(find_wrap(new_link.bounds(), bounds, domain));
else
new_link.add_wrap(diy::Direction());
}
} else // non-aligned side
{
for (int j = 0; j < 2; ++j)
{
int nbr_gid = divide_gid(link->target(i).gid, j == 0, round, rounds);
Bounds bounds = link->bounds(i);
update_neighbor_bounds(bounds, splits[i], dim, j == 0);
if (intersects(bounds, new_link.bounds(), dim, wrap, domain))
{
diy::BlockID nbr = { nbr_gid, srp.assigner().rank(nbr_gid) };
new_link.add_neighbor(nbr);
new_link.add_direction(dir);
new_link.add_bounds(bounds);
if (wrap)
new_link.add_wrap(find_wrap(new_link.bounds(), bounds, domain));
else
new_link.add_wrap(diy::Direction());
}
}
}
}
// add link to the dual block
int dual_gid = divide_gid(gid, !lower, round, rounds);
diy::BlockID dual = { dual_gid, srp.assigner().rank(dual_gid) };
new_link.add_neighbor(dual);
Bounds nbr_bounds = link->bounds(); // old block bounds
update_neighbor_bounds(nbr_bounds, find_split(new_link.bounds(), nbr_bounds), dim, !lower);
new_link.add_bounds(nbr_bounds);
new_link.add_wrap(diy::Direction()); // dual block cannot be wrapped
if (lower)
{
diy::Direction right;
right[dim] = 1;
new_link.add_direction(right);
} else
{
diy::Direction left;
left[dim] = -1;
new_link.add_direction(left);
}
// update the link; notice that this won't conflict with anything since
// reduce is using its own notion of the link constructed through the
// partners
link->swap(new_link);
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
split_to_neighbors(Block* b, const diy::ReduceProxy& srp, int dim) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
// determine split
float split = find_split(link->core(), link->bounds());
for (int i = 0; i < link->size(); ++i)
{
srp.enqueue(link->target(i), split);
srp.enqueue(link->target(i), link->direction(i));
}
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
compute_local_samples(Block* b, const diy::ReduceProxy& srp, int dim) const
{
// compute and enqueue local samples
Samples samples;
size_t points_size = (b->*points_).size();
size_t n = std::min(points_size, samples_);
samples.reserve(n);
for (size_t i = 0; i < n; ++i)
{
float x = (b->*points_)[rand() % points_size][dim];
samples.push_back(x);
}
srp.enqueue(srp.out_link().target(0), samples);
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
add_samples(Block* b, const diy::ReduceProxy& srp, Samples& samples) const
{
// dequeue and combine the samples
for (int i = 0; i < srp.in_link().size(); ++i)
{
int nbr_gid = srp.in_link().target(i).gid;
Samples smpls;
srp.dequeue(nbr_gid, smpls);
for (size_t j = 0; j < smpls.size(); ++j)
samples.push_back(smpls[j]);
}
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
receive_samples(Block* b, const diy::ReduceProxy& srp, Samples& samples) const
{
srp.dequeue(srp.in_link().target(0).gid, samples);
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
forward_samples(Block* b, const diy::ReduceProxy& srp, const Samples& samples) const
{
for (int i = 0; i < srp.out_link().size(); ++i)
srp.enqueue(srp.out_link().target(i), samples);
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
enqueue_exchange(Block* b, const diy::ReduceProxy& srp, int dim, const Samples& samples) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
int k = srp.out_link().size();
if (k == 0) // final round; nothing needs to be sent; this is actually redundant
return;
// pick split points
float split = samples[0];
// subset and enqueue
std::vector< std::vector<Point> > out_points(srp.out_link().size());
for (size_t i = 0; i < (b->*points_).size(); ++i)
{
float x = (b->*points_)[i][dim];
int loc = x < split ? 0 : 1;
out_points[loc].push_back((b->*points_)[i]);
}
int pos = -1;
for (int i = 0; i < k; ++i)
{
if (srp.out_link().target(i).gid == srp.gid())
{
(b->*points_).swap(out_points[i]);
pos = i;
}
else
srp.enqueue(srp.out_link().target(i), out_points[i]);
}
if (pos == 0)
link->core().max[dim] = split;
else
link->core().min[dim] = split;
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
dequeue_exchange(Block* b, const diy::ReduceProxy& srp, int dim) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
for (int i = 0; i < srp.in_link().size(); ++i)
{
int nbr_gid = srp.in_link().target(i).gid;
if (nbr_gid == srp.gid())
continue;
std::vector<Point> in_points;
srp.dequeue(nbr_gid, in_points);
for (size_t j = 0; j < in_points.size(); ++j)
{
if (in_points[j][dim] < link->core().min[dim] || in_points[j][dim] > link->core().max[dim])
throw std::runtime_error(fmt::format("Dequeued {} outside [{},{}] ({})",
in_points[j][dim], link->core().min[dim], link->core().max[dim], dim));
(b->*points_).push_back(in_points[j]);
}
}
}
template<class Block, class Point>
void
diy::detail::KDTreeSamplingPartition<Block,Point>::
update_neighbor_bounds(Bounds& bounds, float split, int dim, bool lower) const
{
if (lower)
bounds.max[dim] = split;
else
bounds.min[dim] = split;
}
template<class Block, class Point>
bool
diy::detail::KDTreeSamplingPartition<Block,Point>::
intersects(const Bounds& x, const Bounds& y, int dim, bool wrap, const Bounds& domain) const
{
if (wrap)
{
if (x.min[dim] == domain.min[dim] && y.max[dim] == domain.max[dim])
return true;
if (y.min[dim] == domain.min[dim] && x.max[dim] == domain.max[dim])
return true;
}
return x.min[dim] <= y.max[dim] && y.min[dim] <= x.max[dim];
}
template<class Block, class Point>
float
diy::detail::KDTreeSamplingPartition<Block,Point>::
find_split(const Bounds& changed, const Bounds& original) const
{
for (int i = 0; i < dim_; ++i)
{
if (changed.min[i] != original.min[i])
return changed.min[i];
if (changed.max[i] != original.max[i])
return changed.max[i];
}
assert(0);
return -1;
}
template<class Block, class Point>
diy::Direction
diy::detail::KDTreeSamplingPartition<Block,Point>::
find_wrap(const Bounds& bounds, const Bounds& nbr_bounds, const Bounds& domain) const
{
diy::Direction wrap;
for (int i = 0; i < dim_; ++i)
{
if (bounds.min[i] == domain.min[i] && nbr_bounds.max[i] == domain.max[i])
wrap[i] = -1;
if (bounds.max[i] == domain.max[i] && nbr_bounds.min[i] == domain.min[i])
wrap[i] = 1;
}
return wrap;
}
#endif

@ -1,569 +0,0 @@
#ifndef VTKMDIY_DETAIL_ALGORITHMS_KDTREE_HPP
#define VTKMDIY_DETAIL_ALGORITHMS_KDTREE_HPP
#include <vector>
#include <cassert>
#include "../../partners/all-reduce.hpp"
#include "../../log.hpp"
namespace diy
{
namespace detail
{
struct KDTreePartners;
template<class Block, class Point>
struct KDTreePartition
{
typedef diy::RegularContinuousLink RCLink;
typedef diy::ContinuousBounds Bounds;
typedef std::vector<size_t> Histogram;
KDTreePartition(int dim,
std::vector<Point> Block::* points,
size_t bins):
dim_(dim), points_(points), bins_(bins) {}
void operator()(Block* b, const diy::ReduceProxy& srp, const KDTreePartners& partners) const;
int divide_gid(int gid, bool lower, int round, int rounds) const;
void update_links(Block* b, const diy::ReduceProxy& srp, int dim, int round, int rounds, bool wrap, const Bounds& domain) const;
void split_to_neighbors(Block* b, const diy::ReduceProxy& srp, int dim) const;
diy::Direction
find_wrap(const Bounds& bounds, const Bounds& nbr_bounds, const Bounds& domain) const;
void compute_local_histogram(Block* b, const diy::ReduceProxy& srp, int dim) const;
void add_histogram(Block* b, const diy::ReduceProxy& srp, Histogram& histogram) const;
void receive_histogram(Block* b, const diy::ReduceProxy& srp, Histogram& histogram) const;
void forward_histogram(Block* b, const diy::ReduceProxy& srp, const Histogram& histogram) const;
void enqueue_exchange(Block* b, const diy::ReduceProxy& srp, int dim, const Histogram& histogram) const;
void dequeue_exchange(Block* b, const diy::ReduceProxy& srp, int dim) const;
void update_neighbor_bounds(Bounds& bounds, float split, int dim, bool lower) const;
bool intersects(const Bounds& x, const Bounds& y, int dim, bool wrap, const Bounds& domain) const;
float find_split(const Bounds& changed, const Bounds& original) const;
int dim_;
std::vector<Point> Block::* points_;
size_t bins_;
};
}
}
struct diy::detail::KDTreePartners
{
// bool = are we in a swap (vs histogram) round
// int = round within that partner
typedef std::pair<bool, int> RoundType;
typedef diy::ContinuousBounds Bounds;
KDTreePartners(int dim, int nblocks, bool wrap_, const Bounds& domain_):
decomposer(1, interval(0,nblocks-1), nblocks),
histogram(decomposer, 2),
swap(decomposer, 2, false),
wrap(wrap_),
domain(domain_)
{
for (unsigned i = 0; i < swap.rounds(); ++i)
{
// fill histogram rounds
for (unsigned j = 0; j < histogram.rounds(); ++j)
{
rounds_.push_back(std::make_pair(false, j));
dim_.push_back(i % dim);
if (j == histogram.rounds() / 2 - 1 - i)
j += 2*i; // skip the 2*i deepest all-reduce rounds: after i swap rounds the groups have halved i times
}
// fill swap round
rounds_.push_back(std::make_pair(true, i));
dim_.push_back(i % dim);
// fill link round
rounds_.push_back(std::make_pair(true, -1)); // (true, -1) signals link round
dim_.push_back(i % dim);
}
}
size_t rounds() const { return rounds_.size(); }
size_t swap_rounds() const { return swap.rounds(); }
int dim(int round) const { return dim_[round]; }
bool swap_round(int round) const { return rounds_[round].first; }
int sub_round(int round) const { return rounds_[round].second; }
inline bool active(int round, int gid, const diy::Master& m) const
{
if (round == (int) rounds())
return true;
else if (swap_round(round) && sub_round(round) < 0) // link round
return true;
else if (swap_round(round))
return swap.active(sub_round(round), gid, m);
else
return histogram.active(sub_round(round), gid, m);
}
inline void incoming(int round, int gid, std::vector<int>& partners, const diy::Master& m) const
{
if (round == (int) rounds())
link_neighbors(-1, gid, partners, m);
else if (swap_round(round) && sub_round(round) < 0) // link round
swap.incoming(sub_round(round - 1) + 1, gid, partners, m);
else if (swap_round(round))
histogram.incoming(histogram.rounds(), gid, partners, m);
else
{
if (round > 0 && sub_round(round) == 0)
link_neighbors(-1, gid, partners, m);
else if (round > 0 && sub_round(round - 1) != sub_round(round) - 1) // jump through the histogram rounds
histogram.incoming(sub_round(round - 1) + 1, gid, partners, m);
else
histogram.incoming(sub_round(round), gid, partners, m);
}
}
inline void outgoing(int round, int gid, std::vector<int>& partners, const diy::Master& m) const
{
if (round == (int) rounds())
swap.outgoing(sub_round(round-1) + 1, gid, partners, m);
else if (swap_round(round) && sub_round(round) < 0) // link round
link_neighbors(-1, gid, partners, m);
else if (swap_round(round))
swap.outgoing(sub_round(round), gid, partners, m);
else
histogram.outgoing(sub_round(round), gid, partners, m);
}
inline void link_neighbors(int, int gid, std::vector<int>& partners, const diy::Master& m) const
{
int lid = m.lid(gid);
diy::Link* link = m.link(lid);
std::set<int> result; // partners must be unique
for (int i = 0; i < link->size(); ++i)
result.insert(link->target(i).gid);
for (std::set<int>::const_iterator it = result.begin(); it != result.end(); ++it)
partners.push_back(*it);
}
// 1-D domain to feed into histogram and swap
diy::RegularDecomposer<diy::DiscreteBounds> decomposer;
diy::RegularAllReducePartners histogram;
diy::RegularSwapPartners swap;
std::vector<RoundType> rounds_;
std::vector<int> dim_;
bool wrap;
Bounds domain;
};
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
operator()(Block* b, const diy::ReduceProxy& srp, const KDTreePartners& partners) const
{
int dim;
if (srp.round() < partners.rounds())
dim = partners.dim(srp.round());
else
dim = partners.dim(srp.round() - 1);
if (srp.round() == partners.rounds())
update_links(b, srp, dim, partners.sub_round(srp.round() - 2), partners.swap_rounds(), partners.wrap, partners.domain); // -1 would be the "uninformative" link round
else if (partners.swap_round(srp.round()) && partners.sub_round(srp.round()) < 0) // link round
{
dequeue_exchange(b, srp, dim); // from the swap round
split_to_neighbors(b, srp, dim);
}
else if (partners.swap_round(srp.round()))
{
Histogram histogram;
receive_histogram(b, srp, histogram);
enqueue_exchange(b, srp, dim, histogram);
} else if (partners.sub_round(srp.round()) == 0)
{
if (srp.round() > 0)
{
int prev_dim = dim - 1;
if (prev_dim < 0)
prev_dim += dim_;
update_links(b, srp, prev_dim, partners.sub_round(srp.round() - 2), partners.swap_rounds(), partners.wrap, partners.domain); // -1 would be the "uninformative" link round
}
compute_local_histogram(b, srp, dim);
} else if (partners.sub_round(srp.round()) < (int) partners.histogram.rounds()/2)
{
Histogram histogram(bins_);
add_histogram(b, srp, histogram);
srp.enqueue(srp.out_link().target(0), histogram);
}
else
{
Histogram histogram(bins_);
add_histogram(b, srp, histogram);
forward_histogram(b, srp, histogram);
}
}
template<class Block, class Point>
int
diy::detail::KDTreePartition<Block,Point>::
divide_gid(int gid, bool lower, int round, int rounds) const
{
if (lower)
gid &= ~(1 << (rounds - 1 - round));
else
gid |= (1 << (rounds - 1 - round));
return gid;
}
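// Bit-trick sketch (hypothetical values): with rounds = 3, round 0 toggles
// the most significant of the three gid bits. For gid = 5 (0b101):
//   divide_gid(5, /*lower=*/true,  0, 3) == 1   // 0b101 & ~0b100
//   divide_gid(5, /*lower=*/false, 0, 3) == 5   // 0b101 |  0b100
// i.e. the lower and upper halves of a kd-tree split differ in exactly one bit.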
// round here is the outer iteration of the algorithm
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
update_links(Block* b, const diy::ReduceProxy& srp, int dim, int round, int rounds, bool wrap, const Bounds& domain) const
{
int gid = srp.gid();
int lid = srp.master()->lid(gid);
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
// (gid, dir) -> i
std::map<std::pair<int,diy::Direction>, int> link_map;
for (int i = 0; i < link->size(); ++i)
link_map[std::make_pair(link->target(i).gid, link->direction(i))] = i;
// NB: srp.enqueue(..., ...) should match the link
std::vector<float> splits(link->size());
for (int i = 0; i < link->size(); ++i)
{
float split; diy::Direction dir;
int in_gid = link->target(i).gid;
while(srp.incoming(in_gid))
{
srp.dequeue(in_gid, split);
srp.dequeue(in_gid, dir);
// reverse dir
for (int j = 0; j < dim_; ++j)
dir[j] = -dir[j];
int k = link_map[std::make_pair(in_gid, dir)];
splits[k] = split;
}
}
RCLink new_link(dim_, link->core(), link->core());
bool lower = !(gid & (1 << (rounds - 1 - round)));
// fill out the new link
for (int i = 0; i < link->size(); ++i)
{
diy::Direction dir = link->direction(i);
//diy::Direction wrap_dir = link->wrap(i); // we don't use existing wrap, but restore it from scratch
if (dir[dim] != 0)
{
if ((dir[dim] < 0 && lower) || (dir[dim] > 0 && !lower))
{
int nbr_gid = divide_gid(link->target(i).gid, !lower, round, rounds);
diy::BlockID nbr = { nbr_gid, srp.assigner().rank(nbr_gid) };
new_link.add_neighbor(nbr);
new_link.add_direction(dir);
Bounds bounds = link->bounds(i);
update_neighbor_bounds(bounds, splits[i], dim, !lower);
new_link.add_bounds(bounds);
if (wrap)
new_link.add_wrap(find_wrap(new_link.bounds(), bounds, domain));
else
new_link.add_wrap(diy::Direction());
}
} else // non-aligned side
{
for (int j = 0; j < 2; ++j)
{
int nbr_gid = divide_gid(link->target(i).gid, j == 0, round, rounds);
Bounds bounds = link->bounds(i);
update_neighbor_bounds(bounds, splits[i], dim, j == 0);
if (intersects(bounds, new_link.bounds(), dim, wrap, domain))
{
diy::BlockID nbr = { nbr_gid, srp.assigner().rank(nbr_gid) };
new_link.add_neighbor(nbr);
new_link.add_direction(dir);
new_link.add_bounds(bounds);
if (wrap)
new_link.add_wrap(find_wrap(new_link.bounds(), bounds, domain));
else
new_link.add_wrap(diy::Direction());
}
}
}
}
// add link to the dual block
int dual_gid = divide_gid(gid, !lower, round, rounds);
diy::BlockID dual = { dual_gid, srp.assigner().rank(dual_gid) };
new_link.add_neighbor(dual);
Bounds nbr_bounds = link->bounds(); // old block bounds
update_neighbor_bounds(nbr_bounds, find_split(new_link.bounds(), nbr_bounds), dim, !lower);
new_link.add_bounds(nbr_bounds);
new_link.add_wrap(diy::Direction()); // dual block cannot be wrapped
if (lower)
{
diy::Direction right;
right[dim] = 1;
new_link.add_direction(right);
} else
{
diy::Direction left;
left[dim] = -1;
new_link.add_direction(left);
}
// update the link; notice that this won't conflict with anything since
// reduce is using its own notion of the link constructed through the
// partners
link->swap(new_link);
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
split_to_neighbors(Block* b, const diy::ReduceProxy& srp, int dim) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
// determine split
float split = find_split(link->core(), link->bounds());
for (int i = 0; i < link->size(); ++i)
{
srp.enqueue(link->target(i), split);
srp.enqueue(link->target(i), link->direction(i));
}
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
compute_local_histogram(Block* b, const diy::ReduceProxy& srp, int dim) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
// compute and enqueue local histogram
Histogram histogram(bins_);
float width = (link->core().max[dim] - link->core().min[dim])/bins_;
for (size_t i = 0; i < (b->*points_).size(); ++i)
{
float x = (b->*points_)[i][dim];
int loc = (x - link->core().min[dim]) / width;
if (loc < 0)
throw std::runtime_error(fmt::format("{} {} {}", loc, x, link->core().min[dim]));
if (loc >= (int) bins_)
loc = bins_ - 1;
++(histogram[loc]);
}
srp.enqueue(srp.out_link().target(0), histogram);
}
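// Binning sketch (assumed values): with core() = [0,10] along dim and
// bins_ = 5, width = 2, so a point at x = 7.3 lands in
// loc = (7.3 - 0) / 2 = 3 (truncated). A value that bins to loc == bins_
// (e.g. x == 10) is clamped into the last bin above.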
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
add_histogram(Block* b, const diy::ReduceProxy& srp, Histogram& histogram) const
{
// dequeue and add up the histograms
for (int i = 0; i < srp.in_link().size(); ++i)
{
int nbr_gid = srp.in_link().target(i).gid;
Histogram hist;
srp.dequeue(nbr_gid, hist);
for (size_t j = 0; j < hist.size(); ++j)
histogram[j] += hist[j];
}
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
receive_histogram(Block* b, const diy::ReduceProxy& srp, Histogram& histogram) const
{
srp.dequeue(srp.in_link().target(0).gid, histogram);
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
forward_histogram(Block* b, const diy::ReduceProxy& srp, const Histogram& histogram) const
{
for (int i = 0; i < srp.out_link().size(); ++i)
srp.enqueue(srp.out_link().target(i), histogram);
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
enqueue_exchange(Block* b, const diy::ReduceProxy& srp, int dim, const Histogram& histogram) const
{
auto log = get_logger();
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
int k = srp.out_link().size();
if (k == 0) // final round; nothing needs to be sent (the check is redundant: the loops below would simply be empty)
return;
// pick the split point (approximate median from the histogram)
size_t total = 0;
for (size_t i = 0; i < histogram.size(); ++i)
total += histogram[i];
log->trace("Histogram total: {}", total);
size_t cur = 0;
float width = (link->core().max[dim] - link->core().min[dim])/bins_;
float split = 0;
for (size_t i = 0; i < histogram.size(); ++i)
{
if (cur + histogram[i] > total/2)
{
split = link->core().min[dim] + width*i;
break;
}
cur += histogram[i];
}
log->trace("Found split: {} (dim={}) in {} - {}", split, dim, link->core().min[dim], link->core().max[dim]);
// subset and enqueue
std::vector< std::vector<Point> > out_points(srp.out_link().size());
for (size_t i = 0; i < (b->*points_).size(); ++i)
{
float x = (b->*points_)[i][dim];
int loc = x < split ? 0 : 1;
out_points[loc].push_back((b->*points_)[i]);
}
int pos = -1;
for (int i = 0; i < k; ++i)
{
if (srp.out_link().target(i).gid == srp.gid())
{
(b->*points_).swap(out_points[i]);
pos = i;
}
else
srp.enqueue(srp.out_link().target(i), out_points[i]);
}
if (pos == 0)
link->core().max[dim] = split;
else
link->core().min[dim] = split;
}
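// Median-split sketch (hypothetical histogram): for histogram = {3, 5, 2, 6},
// total = 16 and total/2 = 8. The scan keeps cur at 3, then 8 (neither sum
// exceeds 8); at i = 2, cur + histogram[2] = 10 > 8, so
// split = core().min[dim] + width * 2 -- the left edge of the bin that
// straddles the median point count.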
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
dequeue_exchange(Block* b, const diy::ReduceProxy& srp, int dim) const
{
int lid = srp.master()->lid(srp.gid());
RCLink* link = static_cast<RCLink*>(srp.master()->link(lid));
for (int i = 0; i < srp.in_link().size(); ++i)
{
int nbr_gid = srp.in_link().target(i).gid;
if (nbr_gid == srp.gid())
continue;
std::vector<Point> in_points;
srp.dequeue(nbr_gid, in_points);
for (size_t j = 0; j < in_points.size(); ++j)
{
if (in_points[j][dim] < link->core().min[dim] || in_points[j][dim] > link->core().max[dim])
throw std::runtime_error(fmt::format("Dequeued {} outside [{},{}] ({})",
in_points[j][dim], link->core().min[dim], link->core().max[dim], dim));
(b->*points_).push_back(in_points[j]);
}
}
}
template<class Block, class Point>
void
diy::detail::KDTreePartition<Block,Point>::
update_neighbor_bounds(Bounds& bounds, float split, int dim, bool lower) const
{
if (lower)
bounds.max[dim] = split;
else
bounds.min[dim] = split;
}
template<class Block, class Point>
bool
diy::detail::KDTreePartition<Block,Point>::
intersects(const Bounds& x, const Bounds& y, int dim, bool wrap, const Bounds& domain) const
{
if (wrap)
{
if (x.min[dim] == domain.min[dim] && y.max[dim] == domain.max[dim])
return true;
if (y.min[dim] == domain.min[dim] && x.max[dim] == domain.max[dim])
return true;
}
return x.min[dim] <= y.max[dim] && y.min[dim] <= x.max[dim];
}
template<class Block, class Point>
float
diy::detail::KDTreePartition<Block,Point>::
find_split(const Bounds& changed, const Bounds& original) const
{
for (int i = 0; i < dim_; ++i)
{
if (changed.min[i] != original.min[i])
return changed.min[i];
if (changed.max[i] != original.max[i])
return changed.max[i];
}
assert(0);
return -1;
}
template<class Block, class Point>
diy::Direction
diy::detail::KDTreePartition<Block,Point>::
find_wrap(const Bounds& bounds, const Bounds& nbr_bounds, const Bounds& domain) const
{
diy::Direction wrap;
for (int i = 0; i < dim_; ++i)
{
if (bounds.min[i] == domain.min[i] && nbr_bounds.max[i] == domain.max[i])
wrap[i] = -1;
if (bounds.max[i] == domain.max[i] && nbr_bounds.min[i] == domain.min[i])
wrap[i] = 1;
}
return wrap;
}
#endif

@ -1,162 +0,0 @@
#ifndef VTKMDIY_DETAIL_ALGORITHMS_SORT_HPP
#define VTKMDIY_DETAIL_ALGORITHMS_SORT_HPP
#include <functional>
#include <algorithm>
namespace diy
{
namespace detail
{
template<class Block, class T, class Cmp>
struct SampleSort
{
typedef std::vector<T> Block::*ValuesVector;
struct Sampler;
struct Exchanger;
SampleSort(ValuesVector values_, ValuesVector samples_, const Cmp& cmp_, size_t num_samples_):
values(values_), samples(samples_),
cmp(cmp_), num_samples(num_samples_) {}
Sampler sample() const { return Sampler(values, samples, cmp, num_samples); }
Exchanger exchange() const { return Exchanger(values, samples, cmp); }
static void dequeue_values(std::vector<T>& v, const ReduceProxy& rp, bool skip_self = true)
{
auto log = get_logger();
int k_in = rp.in_link().size();
log->trace("dequeue_values(): gid={}, round={}; v.size()={}", rp.gid(), rp.round(), v.size());
if (detail::is_default< Serialization<T> >::value)
{
// add up sizes
size_t sz = 0;
size_t end = v.size();
for (int i = 0; i < k_in; ++i)
{
if (skip_self && rp.in_link().target(i).gid == rp.gid()) continue;
MemoryBuffer& in = rp.incoming(rp.in_link().target(i).gid);
sz += in.size() / sizeof(T);
log->trace("  incoming size from {}: {}", rp.in_link().target(i).gid, in.size() / sizeof(T)); // log this neighbor's contribution, not the running total
}
log->trace(" incoming size: {}", sz);
v.resize(end + sz);
for (int i = 0; i < k_in; ++i)
{
if (skip_self && rp.in_link().target(i).gid == rp.gid()) continue;
MemoryBuffer& in = rp.incoming(rp.in_link().target(i).gid);
size_t incoming_sz = in.size() / sizeof(T);
T* bg = (T*) &in.buffer[0];
std::copy(bg, bg + incoming_sz, &v[end]);
end += incoming_sz;
}
} else
{
for (int i = 0; i < k_in; ++i)
{
if (skip_self && rp.in_link().target(i).gid == rp.gid()) continue;
MemoryBuffer& in = rp.incoming(rp.in_link().target(i).gid);
while(in)
{
T x;
diy::load(in, x);
v.emplace_back(std::move(x));
}
}
}
log->trace(" v.size()={}", v.size());
}
ValuesVector values;
ValuesVector samples;
Cmp cmp;
size_t num_samples;
};
template<class Block, class T, class Cmp>
struct SampleSort<Block,T,Cmp>::Sampler
{
Sampler(ValuesVector values_, ValuesVector dividers_, const Cmp& cmp_, size_t num_samples_):
values(values_), dividers(dividers_), cmp(cmp_), num_samples(num_samples_) {}
void operator()(Block* b, const ReduceProxy& srp, const RegularSwapPartners& partners) const
{
int k_in = srp.in_link().size();
int k_out = srp.out_link().size();
std::vector<T> samples;
if (k_in == 0)
{
// draw random samples
for (size_t i = 0; i < num_samples; ++i)
samples.push_back((b->*values)[std::rand() % (b->*values).size()]);
} else
dequeue_values(samples, srp, false);
if (k_out == 0)
{
// pick subsamples that separate quantiles
std::sort(samples.begin(), samples.end(), cmp);
std::vector<T> subsamples(srp.nblocks() - 1);
int step = samples.size() / srp.nblocks(); // NB: nblocks == subsamples.size() + 1
for (size_t i = 0; i < subsamples.size(); ++i)
subsamples[i] = samples[(i+1)*step];
(b->*dividers).swap(subsamples);
}
else
{
for (int i = 0; i < k_out; ++i)
{
MemoryBuffer& out = srp.outgoing(srp.out_link().target(i));
save(out, &samples[0], samples.size());
}
}
}
ValuesVector values;
ValuesVector dividers;
Cmp cmp;
size_t num_samples;
};
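// Quantile sketch (assumed values): with srp.nblocks() == 4 and 100 gathered
// samples after sorting, step = 100/4 = 25 and the dividers become
// samples[25], samples[50], samples[75] -- three cut points that partition
// the sorted values into four roughly equal buckets, one per block.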
template<class Block, class T, class Cmp>
struct SampleSort<Block,T,Cmp>::Exchanger
{
Exchanger(ValuesVector values_, ValuesVector samples_, const Cmp& cmp_):
values(values_), samples(samples_), cmp(cmp_) {}
void operator()(Block* b, const ReduceProxy& rp) const
{
if (rp.round() == 0)
{
// enqueue values to the correct locations
for (size_t i = 0; i < (b->*values).size(); ++i)
{
int to = std::lower_bound((b->*samples).begin(), (b->*samples).end(), (b->*values)[i], cmp) - (b->*samples).begin();
rp.enqueue(rp.out_link().target(to), (b->*values)[i]);
}
(b->*values).clear();
} else
{
dequeue_values((b->*values), rp, false);
std::sort((b->*values).begin(), (b->*values).end(), cmp);
}
}
ValuesVector values;
ValuesVector samples;
Cmp cmp;
};
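// Routing sketch (hypothetical dividers): with (b->*samples) == {10, 20},
// std::lower_bound sends a value of 7 to block 0, 15 to block 1, and 25 to
// block 2 -- i.e. `to` is the number of dividers strictly less than the
// value (for the default less-than comparison).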
}
}
#endif

@ -1,31 +0,0 @@
#ifndef VTKMDIY_BLOCK_TRAITS_HPP
#define VTKMDIY_BLOCK_TRAITS_HPP
#include "traits.hpp"
namespace diy
{
namespace detail
{
template<class F>
struct block_traits
{
typedef typename std::remove_pointer<typename function_traits<F>::template arg<0>::type>::type type;
};
// matches block member functions
template<class Block, class R, class... Args>
struct block_traits<R(Block::*)(Args...)>
{
typedef Block type;
};
template<class Block, class R, class... Args>
struct block_traits<R(Block::*)(Args...) const>
{
typedef Block type;
};
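// Deduction sketch (illustrative types, not part of the original source):
//   struct MyBlock {};
//   void cb(MyBlock* b);                       // plain callback
//   static_assert(std::is_same<block_traits<void(MyBlock*)>::type,
//                              MyBlock>::value, "first arg, pointer removed");
// Member-function callbacks (void (MyBlock::*)(...)) hit the specializations
// above and yield MyBlock directly.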
}
}
#endif

@ -1,131 +0,0 @@
namespace diy
{
namespace detail
{
struct CollectiveOp
{
virtual void init() =0;
virtual void update(const CollectiveOp& other) =0;
virtual void global(const mpi::communicator& comm) =0;
virtual void copy_from(const CollectiveOp& other) =0;
virtual void result_out(void* dest) const =0;
virtual ~CollectiveOp() {}
};
template<class T, class Op>
struct AllReduceOp: public CollectiveOp
{
AllReduceOp(const T& x, Op op):
in_(x), op_(op) {}
void init() { out_ = in_; }
void update(const CollectiveOp& other) { out_ = op_(out_, static_cast<const AllReduceOp&>(other).in_); }
void global(const mpi::communicator& comm) { T res; mpi::all_reduce(comm, out_, res, op_); out_ = res; }
void copy_from(const CollectiveOp& other) { out_ = static_cast<const AllReduceOp&>(other).out_; }
void result_out(void* dest) const { *reinterpret_cast<T*>(dest) = out_; }
private:
T in_, out_;
Op op_;
};
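// Contract sketch (assumed values): two local blocks contribute 3 and 4
// through AllReduceOp<int, std::plus<int>>. The master calls init() on the
// first op (out_ = 3), update() with the second (out_ = 3 + 4 = 7), and only
// then global() performs the single MPI all_reduce over the per-rank partial
// results.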
template<class T>
struct Scratch: public CollectiveOp
{
Scratch(const T& x):
x_(x) {}
void init() {}
void update(const CollectiveOp&) {}
void global(const mpi::communicator&) {}
void copy_from(const CollectiveOp&) {}
void result_out(void* dest) const { *reinterpret_cast<T*>(dest) = x_; }
private:
T x_;
};
}
struct Master::Collective
{
Collective():
cop_(0) {}
Collective(detail::CollectiveOp* cop):
cop_(cop) {}
Collective(Collective&& other):
cop_(0) { swap(other); }
~Collective() { delete cop_; }
Collective& operator=(const Collective& other) = delete;
Collective(Collective& other) = delete;
void init() { cop_->init(); }
void swap(Collective& other) { std::swap(cop_, other.cop_); }
void update(const Collective& other) { cop_->update(*other.cop_); }
void global(const mpi::communicator& c) { cop_->global(c); }
void copy_from(Collective& other) const { cop_->copy_from(*other.cop_); }
void result_out(void* x) const { cop_->result_out(x); }
detail::CollectiveOp* cop_;
};
struct Master::CollectivesList: public std::list<Collective>
{};
struct Master::CollectivesMap: public std::map<int, CollectivesList>
{};
}
diy::Master::CollectivesMap&
diy::Master::
collectives()
{
return *collectives_;
}
diy::Master::CollectivesList&
diy::Master::
collectives(int gid__)
{
return (*collectives_)[gid__];
}
void
diy::Master::
process_collectives()
{
auto scoped = prof.scoped("collectives");
DIY_UNUSED(scoped);
if (collectives().empty())
return;
using CollectivesIterator = CollectivesList::iterator;
std::vector<CollectivesIterator> iters;
std::vector<int> gids;
for (auto& x : collectives())
{
gids.push_back(x.first);
iters.push_back(x.second.begin());
}
while (iters[0] != collectives().begin()->second.end())
{
iters[0]->init();
for (unsigned j = 1; j < iters.size(); ++j)
{
// NB: this assumes that the operations are commutative
iters[0]->update(*iters[j]);
}
iters[0]->global(comm_); // do the mpi collective
for (unsigned j = 1; j < iters.size(); ++j)
{
iters[j]->copy_from(*iters[0]);
++iters[j];
}
++iters[0];
}
}

@ -1,22 +0,0 @@
namespace diy
{
struct Master::BaseCommand
{
virtual ~BaseCommand() {} // to delete derived classes
virtual void execute(void* b, const ProxyWithLink& cp) const =0;
virtual bool skip(int i, const Master& master) const =0;
};
template<class Block>
struct Master::Command: public BaseCommand
{
Command(Callback<Block> f_, const Skip& s_):
f(f_), s(s_) {}
void execute(void* b, const ProxyWithLink& cp) const override { f(static_cast<Block*>(b), cp); }
bool skip(int i, const Master& m) const override { return s(i,m); }
Callback<Block> f;
Skip s;
};
}

@ -1,202 +0,0 @@
namespace diy
{
struct Master::tags { enum { queue, piece }; };
struct Master::MessageInfo
{
int from, to;
int round;
};
struct Master::InFlightSend
{
std::shared_ptr<MemoryBuffer> message;
mpi::request request;
MessageInfo info; // for debug purposes
};
struct Master::InFlightRecv
{
MemoryBuffer message;
MessageInfo info { -1, -1, -1 };
bool done = false;
inline void recv(mpi::communicator& comm, const mpi::status& status);
inline void place(IncomingRound* in, bool unload, ExternalStorage* storage, IExchangeInfo* iexchange);
void reset() { *this = InFlightRecv(); }
};
struct Master::InFlightRecvsMap: public std::map<int, InFlightRecv>
{};
struct Master::InFlightSendsList: public std::list<InFlightSend>
{};
struct Master::GidSendOrder
{
size_t size() const { return list.size(); }
bool empty() const { return list.empty(); }
int pop() { int x = list.front(); list.pop_front(); return x; }
std::list<int> list;
size_t limit = 0;
};
struct Master::IExchangeInfo
{
IExchangeInfo():
n(0) {}
IExchangeInfo(size_t n_, mpi::communicator comm_):
n(n_),
comm(comm_),
global_work_(new mpi::window<int>(comm, 1)) { global_work_->lock_all(MPI_MODE_NOCHECK); }
~IExchangeInfo() { global_work_->unlock_all(); }
inline void not_done(int gid);
inline int global_work(); // get global work status (for debugging)
inline bool all_done(); // get global all done status
inline void reset_work(); // reset global work counter
inline int add_work(int work); // add work to global work counter
int inc_work() { return add_work(1); } // increment global work counter
int dec_work() { return add_work(-1); } // decrement global work counter
size_t n;
mpi::communicator comm;
std::unordered_map<int, bool> done; // gid -> done
std::unique_ptr<mpi::window<int>> global_work_; // global work to do
std::shared_ptr<spd::logger> log = get_logger();
};
// VectorWindow is used to send and receive subsets of a contiguous array in-place
namespace detail
{
template <typename T>
struct VectorWindow
{
T *begin;
size_t count;
};
} // namespace detail
namespace mpi
{
namespace detail
{
template<typename T> struct is_mpi_datatype< diy::detail::VectorWindow<T> > { typedef true_type type; };
template <typename T>
struct mpi_datatype< diy::detail::VectorWindow<T> >
{
using VecWin = diy::detail::VectorWindow<T>;
static MPI_Datatype datatype() { return get_mpi_datatype<T>(); }
static const void* address(const VecWin& x) { return x.begin; }
static void* address(VecWin& x) { return x.begin; }
static int count(const VecWin& x) { return static_cast<int>(x.count); }
};
}
} // namespace mpi::detail
} // namespace diy
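// Usage sketch for VectorWindow (hypothetical buffer): receive 256 bytes
// in-place at offset 512 of an existing buffer, avoiding a staging copy:
//   std::vector<char> buf(1024);
//   diy::detail::VectorWindow<char> win;
//   win.begin = &buf[512];
//   win.count = 256;
//   // comm.recv(source, tag, win);   // uses the mpi_datatype traits above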
void
diy::Master::IExchangeInfo::
not_done(int gid)
{
if (done[gid])
{
done[gid] = false;
int work = inc_work();
log->debug("[{}] Incrementing work when switching done (on receipt): work = {}\n", gid, work);
} else
log->debug("[{}] Not done, no need to increment work\n", gid);
}
diy::Master::InFlightRecv&
diy::Master::
inflight_recv(int proc)
{
return (*inflight_recvs_)[proc];
}
diy::Master::InFlightSendsList&
diy::Master::inflight_sends()
{
return *inflight_sends_;
}
// receive message described by status
void
diy::Master::InFlightRecv::
recv(mpi::communicator& comm, const mpi::status& status)
{
if (info.from == -1) // uninitialized
{
MemoryBuffer bb;
comm.recv(status.source(), status.tag(), bb.buffer);
if (status.tag() == tags::piece) // first piece is the header
{
size_t msg_size;
diy::load(bb, msg_size);
diy::load(bb, info);
message.buffer.reserve(msg_size);
}
else // tags::queue
{
diy::load_back(bb, info);
message.swap(bb);
}
}
else
{
size_t start_idx = message.buffer.size();
size_t count = status.count<char>();
message.buffer.resize(start_idx + count);
detail::VectorWindow<char> window;
window.begin = &message.buffer[start_idx];
window.count = count;
comm.recv(status.source(), status.tag(), window);
}
if (status.tag() == tags::queue)
done = true;
}
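// Wire-protocol summary (as implemented above): a message that needs
// splitting arrives as a tags::piece header carrying (msg_size, info),
// followed by payload chunks received in-place via VectorWindow; the chunk
// tagged tags::queue is the last one and marks the receive done. A message
// that fits in one piece arrives as a single tags::queue buffer with info
// serialized at its back.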
// once the InFlightRecv is done, place it either out of core or in the appropriate incoming queue
void
diy::Master::InFlightRecv::
place(IncomingRound* in, bool unload, ExternalStorage* storage, IExchangeInfo* iexchange)
{
size_t size = message.size();
int from = info.from;
int to = info.to;
int external = -1;
if (unload)
{
get_logger()->debug("Directly unloading queue {} <- {}", to, from);
external = storage->put(message); // unload directly
}
else if (!iexchange)
{
in->map[to].queues[from].swap(message);
in->map[to].queues[from].reset(); // buffer position = 0
}
else // iexchange
{
auto log = get_logger();
iexchange->not_done(to);
in->map[to].queues[from].append_binary(&message.buffer[0], message.size()); // append instead of overwriting
int work = iexchange->dec_work();
log->debug("[{}] Decrementing work after receiving: work = {}\n", to, work);
}
in->map[to].records[from] = QueueRecord(size, external);
++(in->received);
}

@ -1,157 +0,0 @@
struct diy::Master::ProcessBlock
{
ProcessBlock(Master& master_,
const std::deque<int>& blocks__,
int local_limit_,
critical_resource<int>& idx_):
master(master_),
blocks(blocks__),
local_limit(local_limit_),
idx(idx_)
{}
ProcessBlock(const ProcessBlock&) = delete;
ProcessBlock(ProcessBlock&&) = default;
void operator()()
{
master.log->debug("Processing with thread: {}", this_thread::get_id());
std::vector<int> local;
do
{
int cur = (*idx.access())++;
if ((size_t)cur >= blocks.size())
return;
int i = blocks[cur];
if (master.block(i))
{
if (local.size() == (size_t)local_limit)
master.unload(local);
local.push_back(i);
}
master.log->debug("Processing block: {}", master.gid(i));
bool skip = all_skip(i);
IncomingQueuesMap &current_incoming = master.incoming_[master.exchange_round_].map;
if (master.block(i) == 0) // block unloaded
{
if (skip)
master.load_queues(i); // even though we are skipping the block, the queues might be necessary
else
{
if (local.size() == (size_t)local_limit) // reached the local limit
master.unload(local);
master.load(i);
local.push_back(i);
}
}
for (auto& cmd : master.commands_)
{
cmd->execute(skip ? 0 : master.block(i), master.proxy(i));
// no longer need them, so get rid of them
current_incoming[master.gid(i)].queues.clear();
current_incoming[master.gid(i)].records.clear();
}
if (skip && master.block(i) == 0)
master.unload_queues(i); // even though we are skipping the block, the queues might be necessary
} while(true);
}
bool all_skip(int i) const
{
bool skip = true;
for (auto& cmd : master.commands_)
{
if (!cmd->skip(i, master))
{
skip = false;
break;
}
}
return skip;
}
Master& master;
const std::deque<int>& blocks;
int local_limit;
critical_resource<int>& idx;
};
void
diy::Master::
execute()
{
log->debug("Entered execute()");
auto scoped = prof.scoped("execute");
DIY_UNUSED(scoped);
//show_incoming_records();
// touch the outgoing and incoming queues as well as collectives to make sure they exist
for (unsigned i = 0; i < size(); ++i)
{
outgoing(gid(i));
incoming(gid(i)); // implicitly touches queue records
collectives(gid(i));
}
if (commands_.empty())
return;
// Order the blocks, so the loaded ones come first
std::deque<int> blocks;
for (unsigned i = 0; i < size(); ++i)
if (block(i) == 0)
blocks.push_back(i);
else
blocks.push_front(i);
// don't use more threads than we can have blocks in memory
int num_threads;
int blocks_per_thread;
if (limit_ == -1)
{
num_threads = threads_;
blocks_per_thread = size();
}
else
{
num_threads = std::min(threads_, limit_);
blocks_per_thread = limit_/num_threads;
}
// idx is shared
critical_resource<int> idx(0);
if (num_threads > 1)
{
// launch the threads
std::list<thread> threads;
for (unsigned i = 0; i < (unsigned)num_threads; ++i)
threads.emplace_back(ProcessBlock(*this, blocks, blocks_per_thread, idx));
for (auto& t : threads)
t.join();
} else
{
ProcessBlock(*this, blocks, blocks_per_thread, idx)();
}
// clear incoming queues
incoming_[exchange_round_].map.clear();
if (limit() != -1 && in_memory() > limit())
throw std::runtime_error(fmt::format("Fatal: {} blocks in memory, with limit {}", in_memory(), limit()));
// clear commands
commands_.clear();
}

@ -1,168 +0,0 @@
#ifndef VTKMDIY_DETAIL_ALL_TO_ALL_HPP
#define VTKMDIY_DETAIL_ALL_TO_ALL_HPP
#include "../block_traits.hpp"
namespace diy
{
namespace detail
{
template<class Op>
struct AllToAllReduce
{
using Block = typename block_traits<Op>::type;
AllToAllReduce(const Op& op_, const Assigner& assigner):
op(op_)
{
for (int gid = 0; gid < assigner.nblocks(); ++gid)
{
BlockID nbr = { gid, assigner.rank(gid) };
all_neighbors_link.add_neighbor(nbr);
}
}
void operator()(Block* b, const ReduceProxy& srp, const RegularSwapPartners& partners) const
{
int k_in = srp.in_link().size();
int k_out = srp.out_link().size();
if (k_in == 0 && k_out == 0) // special case of a single block
{
ReduceProxy all_srp_out(srp, srp.block(), 0, srp.assigner(), empty_link, all_neighbors_link);
ReduceProxy all_srp_in (srp, srp.block(), 1, srp.assigner(), all_neighbors_link, empty_link);
op(b, all_srp_out);
MemoryBuffer& in_queue = all_srp_in.incoming(all_srp_in.in_link().target(0).gid);
in_queue.swap(all_srp_out.outgoing(all_srp_out.out_link().target(0)));
in_queue.reset();
op(b, all_srp_in);
return;
}
if (k_in == 0) // initial round
{
ReduceProxy all_srp(srp, srp.block(), 0, srp.assigner(), empty_link, all_neighbors_link);
op(b, all_srp);
Master::OutgoingQueues all_queues;
all_queues.swap(*all_srp.outgoing()); // clears out the queues and stores them locally
// enqueue outgoing
int group = all_srp.out_link().size() / k_out;
for (int i = 0; i < k_out; ++i)
{
std::pair<int,int> range(i*group, (i+1)*group);
srp.enqueue(srp.out_link().target(i), range);
for (int j = i*group; j < (i+1)*group; ++j)
{
int from = srp.gid();
int to = all_srp.out_link().target(j).gid;
srp.enqueue(srp.out_link().target(i), std::make_pair(from, to));
srp.enqueue(srp.out_link().target(i), all_queues[all_srp.out_link().target(j)]);
}
}
} else if (k_out == 0) // final round
{
// dequeue incoming + reorder into the correct order
ReduceProxy all_srp(srp, srp.block(), 1, srp.assigner(), all_neighbors_link, empty_link);
Master::IncomingQueues all_incoming;
all_incoming.swap(*srp.incoming());
std::pair<int, int> range; // all the ranges should be the same
for (int i = 0; i < k_in; ++i)
{
int gid_in = srp.in_link().target(i).gid;
MemoryBuffer& in = all_incoming[gid_in];
load(in, range);
while(in)
{
std::pair<int, int> from_to;
load(in, from_to);
load(in, all_srp.incoming(from_to.first));
all_srp.incoming(from_to.first).reset();
}
}
op(b, all_srp);
} else // intermediate round: reshuffle queues
{
// add up buffer sizes
std::vector<size_t> sizes_out(k_out, sizeof(std::pair<int,int>));
std::pair<int, int> range; // all the ranges should be the same
for (int i = 0; i < k_in; ++i)
{
MemoryBuffer& in = srp.incoming(srp.in_link().target(i).gid);
load(in, range);
int group = (range.second - range.first)/k_out;
std::pair<int, int> from_to;
size_t s;
while(in)
{
diy::load(in, from_to);
diy::load(in, s);
int j = (from_to.second - range.first) / group;
sizes_out[j] += s + sizeof(size_t) + sizeof(std::pair<int,int>);
in.skip(s);
}
in.reset();
}
// reserve outgoing buffers of correct size
int group = (range.second - range.first)/k_out;
for (int i = 0; i < k_out; ++i)
{
MemoryBuffer& out = srp.outgoing(srp.out_link().target(i));
out.reserve(sizes_out[i]);
std::pair<int, int> out_range;
out_range.first = range.first + group*i;
out_range.second = range.first + group*(i+1);
save(out, out_range);
}
// re-direct the queues
for (int i = 0; i < k_in; ++i)
{
MemoryBuffer& in = srp.incoming(srp.in_link().target(i).gid);
load(in, range);
std::pair<int, int> from_to;
while(in)
{
load(in, from_to);
int j = (from_to.second - range.first) / group;
MemoryBuffer& out = srp.outgoing(srp.out_link().target(j));
save(out, from_to);
MemoryBuffer::copy(in, out);
}
}
}
}
const Op& op;
Link all_neighbors_link, empty_link;
};
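// Routing sketch (assumed sizes): with nblocks = 8 and k = 2, the initial
// round has k_out == 2 while all_srp covers all 8 targets, so
// group = 8 / 2 = 4: target 0 carries the queues destined for gids [0,4)
// and target 1 those for [4,8). In an intermediate round with range (0,4)
// and k_out = 2, group = 2 and a queue addressed to gid 3 is forwarded to
// j = (3 - 0) / 2 = 1.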
struct SkipIntermediate
{
SkipIntermediate(size_t rounds_):
rounds(rounds_) {}
bool operator()(int round, int, const Master&) const { return round != 0 && round != (int) rounds; }
size_t rounds;
};
}
}
#endif

@ -1,318 +0,0 @@
//--------------------------------------
// utils/traits: Additional type traits
//--------------------------------------
//
// Copyright kennytm (auraHT Ltd.) 2011.
// Distributed under the Boost Software License, Version 1.0.
// (See accompanying file doc/LICENSE_1_0.txt or copy at
// http://www.boost.org/LICENSE_1_0.txt)
/**
``<utils/traits.hpp>`` --- Additional type traits
=================================================
This module provides additional type traits and related functions, missing from
the standard library.
*/
#ifndef VTKMDIY_UTILS_TRAITS_HPP
#define VTKMDIY_UTILS_TRAITS_HPP
#include <cstdlib>
#include <tuple>
#include <functional>
#include <type_traits>
namespace diy
{
namespace detail {
/**
.. macro:: DECLARE_HAS_TYPE_MEMBER(member_name)
This macro declares a template ``has_member_name`` which will check whether
a type member ``member_name`` exists in a particular type.
Example::
DECLARE_HAS_TYPE_MEMBER(result_type)
...
printf("%d\n", has_result_type< std::plus<int> >::value);
// ^ prints '1' (true)
printf("%d\n", has_result_type< double(*)() >::value);
// ^ prints '0' (false)
*/
#define DECLARE_HAS_TYPE_MEMBER(member_name) \
template <typename, typename = void> \
struct has_##member_name \
{ enum { value = false }; }; \
template <typename T> \
struct has_##member_name<T, typename std::enable_if<sizeof(typename T::member_name)||true>::type> \
{ enum { value = true }; };
/**
.. type:: struct utils::function_traits<F>
Obtain compile-time information about a function object *F*.
This template currently supports the following types:
* Normal function types (``R(T...)``), function pointers (``R(*)(T...)``)
and function references (``R(&)(T...)`` and ``R(&&)(T...)``).
* Member functions (``R(C::*)(T...)``)
* ``std::function<F>``
* Type of lambda functions, and any other type that has a unique
``operator()``.
* Type of ``std::mem_fn`` (only for GCC's libstdc++ and LLVM's libc++).
Following the C++ spec, the first argument will be a raw pointer.
*/
template <typename T>
struct function_traits
: public function_traits<decltype(&T::operator())>
{};
namespace xx_impl
{
template <typename C, typename R, typename... A>
struct memfn_type
{
typedef typename std::conditional<
std::is_const<C>::value,
typename std::conditional<
std::is_volatile<C>::value,
R (C::*)(A...) const volatile,
R (C::*)(A...) const
>::type,
typename std::conditional<
std::is_volatile<C>::value,
R (C::*)(A...) volatile,
R (C::*)(A...)
>::type
>::type type;
};
}
template <typename ReturnType, typename... Args>
struct function_traits<ReturnType(Args...)>
{
/**
.. type:: type result_type
The type returned by calling an instance of the function object type *F*.
*/
typedef ReturnType result_type;
/**
.. type:: type function_type
The function type (``R(T...)``).
*/
typedef ReturnType function_type(Args...);
/**
.. type:: type member_function_type<OwnerType>
The member function type for an *OwnerType* (``R(OwnerType::*)(T...)``).
*/
template <typename OwnerType>
using member_function_type = typename xx_impl::memfn_type<
typename std::remove_pointer<typename std::remove_reference<OwnerType>::type>::type,
ReturnType, Args...
>::type;
/**
.. data:: static const size_t arity
Number of arguments the function object will take.
*/
enum { arity = sizeof...(Args) };
/**
.. type:: type arg<n>::type
The type of the *n*-th argument.
*/
template <size_t i>
struct arg
{
typedef typename std::tuple_element<i, std::tuple<Args...>>::type type;
};
};
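// Illustrative checks (assumed example, not part of the original source):
typedef function_traits<int(double, char)> example_function_traits;
static_assert(std::is_same<example_function_traits::result_type, int>::value, "result_type is the return type");
static_assert(example_function_traits::arity == 2, "arity counts the arguments");
static_assert(std::is_same<example_function_traits::arg<1>::type, char>::value, "arg<n> indexes the argument list");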
template <typename ReturnType, typename... Args>
struct function_traits<ReturnType(*)(Args...)>
: public function_traits<ReturnType(Args...)>
{};
template <typename ClassType, typename ReturnType, typename... Args>
struct function_traits<ReturnType(ClassType::*)(Args...)>
: public function_traits<ReturnType(Args...)>
{
typedef ClassType& owner_type;
};
template <typename ClassType, typename ReturnType, typename... Args>
struct function_traits<ReturnType(ClassType::*)(Args...) const>
: public function_traits<ReturnType(Args...)>
{
typedef const ClassType& owner_type;
};
template <typename ClassType, typename ReturnType, typename... Args>
struct function_traits<ReturnType(ClassType::*)(Args...) volatile>
: public function_traits<ReturnType(Args...)>
{
typedef volatile ClassType& owner_type;
};
template <typename ClassType, typename ReturnType, typename... Args>
struct function_traits<ReturnType(ClassType::*)(Args...) const volatile>
: public function_traits<ReturnType(Args...)>
{
typedef const volatile ClassType& owner_type;
};
template <typename FunctionType>
struct function_traits<std::function<FunctionType>>
: public function_traits<FunctionType>
{};
#if defined(_GLIBCXX_FUNCTIONAL)
#define MEM_FN_SYMBOL_XX0SL7G4Z0J std::_Mem_fn
#elif defined(_LIBCPP_FUNCTIONAL)
#define MEM_FN_SYMBOL_XX0SL7G4Z0J std::__mem_fn
#endif
#ifdef MEM_FN_SYMBOL_XX0SL7G4Z0J
template <typename R, typename C>
struct function_traits<MEM_FN_SYMBOL_XX0SL7G4Z0J<R C::*>>
: public function_traits<R(C*)>
{};
template <typename R, typename C, typename... A>
struct function_traits<MEM_FN_SYMBOL_XX0SL7G4Z0J<R(C::*)(A...)>>
: public function_traits<R(C*, A...)>
{};
template <typename R, typename C, typename... A>
struct function_traits<MEM_FN_SYMBOL_XX0SL7G4Z0J<R(C::*)(A...) const>>
: public function_traits<R(const C*, A...)>
{};
template <typename R, typename C, typename... A>
struct function_traits<MEM_FN_SYMBOL_XX0SL7G4Z0J<R(C::*)(A...) volatile>>
: public function_traits<R(volatile C*, A...)>
{};
template <typename R, typename C, typename... A>
struct function_traits<MEM_FN_SYMBOL_XX0SL7G4Z0J<R(C::*)(A...) const volatile>>
: public function_traits<R(const volatile C*, A...)>
{};
#undef MEM_FN_SYMBOL_XX0SL7G4Z0J
#endif
template <typename T>
struct function_traits<T&> : public function_traits<T> {};
template <typename T>
struct function_traits<const T&> : public function_traits<T> {};
template <typename T>
struct function_traits<volatile T&> : public function_traits<T> {};
template <typename T>
struct function_traits<const volatile T&> : public function_traits<T> {};
template <typename T>
struct function_traits<T&&> : public function_traits<T> {};
template <typename T>
struct function_traits<const T&&> : public function_traits<T> {};
template <typename T>
struct function_traits<volatile T&&> : public function_traits<T> {};
template <typename T>
struct function_traits<const volatile T&&> : public function_traits<T> {};
#define FORWARD_RES_8QR485JMSBT \
typename std::conditional< \
std::is_lvalue_reference<R>::value, \
T&, \
typename std::remove_reference<T>::type&& \
>::type
/**
.. function:: auto utils::forward_like<Like, T>(T&& t) noexcept
Forward the reference *t* like the type of *Like*. That means, if *Like* is
an lvalue (reference), this function will return an lvalue reference of *t*.
Otherwise, if *Like* is an rvalue, this function will return an rvalue
reference of *t*.
This is mainly used to propagate the expression category (lvalue/rvalue) of
a member of *Like*, generalizing ``std::forward``.
*/
template <typename R, typename T>
FORWARD_RES_8QR485JMSBT forward_like(T&& input) noexcept
{
return static_cast<FORWARD_RES_8QR485JMSBT>(input);
}
#undef FORWARD_RES_8QR485JMSBT
/**
.. type:: struct utils::copy_cv<From, To>
Copy the CV qualifier between the two types. For example,
``utils::copy_cv<const int, double>::type`` will become ``const double``.
*/
template <typename From, typename To>
struct copy_cv
{
private:
typedef typename std::remove_cv<To>::type raw_To;
typedef typename std::conditional<std::is_const<From>::value,
const raw_To, raw_To>::type const_raw_To;
public:
/**
.. type:: type type
Result of cv-copying.
*/
typedef typename std::conditional<std::is_volatile<From>::value,
volatile const_raw_To, const_raw_To>::type type;
};
/**
.. type:: struct utils::pointee<T>
Returns the type obtained by dereferencing an instance of *T*. This is a
generalization of ``std::remove_pointer``, in that it also works with iterators.
*/
template <typename T>
struct pointee
{
/**
.. type:: type type
Result of dereferencing.
*/
typedef typename std::remove_reference<decltype(*std::declval<T>())>::type type;
};
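// e.g. (illustrative): pointee<std::vector<int>::iterator>::type is int,
// and pointee<const char*>::type is const char -- the iterator case is what
// std::remove_pointer cannot express.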
/**
.. function:: std::add_rvalue_reference<T>::type utils::rt_val<T>() noexcept
Returns a value of type *T*. It is guaranteed to do nothing and will not
throw a compile-time error, but using the returned result will cause
undefined behavior.
*/
template <typename T>
typename std::add_rvalue_reference<T>::type rt_val() noexcept
{
return std::move(*static_cast<T*>(nullptr));
}
}
}
#endif

@ -1,535 +0,0 @@
/*
Formatting library for C++
Copyright (c) 2012 - 2016, Victor Zverovich
All rights reserved.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.
2. Redistributions in binary form must reproduce the above copyright notice,
this list of conditions and the following disclaimer in the documentation
and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND
ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "format.h"
#include <string.h>
#include <cctype>
#include <cerrno>
#include <climits>
#include <cmath>
#include <cstdarg>
#include <cstddef> // for std::ptrdiff_t
#if defined(_WIN32) && defined(__MINGW32__)
# include <cstring>
#endif
#if FMT_USE_WINDOWS_H
# if !defined(FMT_HEADER_ONLY) && !defined(WIN32_LEAN_AND_MEAN)
# define WIN32_LEAN_AND_MEAN
# endif
# if defined(NOMINMAX) || defined(FMT_WIN_MINMAX)
# include <windows.h>
# else
# define NOMINMAX
# include <windows.h>
# undef NOMINMAX
# endif
#endif
#if FMT_EXCEPTIONS
# define FMT_TRY try
# define FMT_CATCH(x) catch (x)
#else
# define FMT_TRY if (true)
# define FMT_CATCH(x) if (false)
#endif
#ifdef _MSC_VER
# pragma warning(push)
# pragma warning(disable: 4127) // conditional expression is constant
# pragma warning(disable: 4702) // unreachable code
// Disable deprecation warning for strerror. The latter is not called but
// MSVC fails to detect it.
# pragma warning(disable: 4996)
#endif
// Dummy implementations of strerror_r and strerror_s called if corresponding
// system functions are not available.
static inline fmt::internal::Null<> strerror_r(int, char *, ...) {
return fmt::internal::Null<>();
}
static inline fmt::internal::Null<> strerror_s(char *, std::size_t, ...) {
return fmt::internal::Null<>();
}
namespace fmt {
FMT_FUNC internal::RuntimeError::~RuntimeError() FMT_DTOR_NOEXCEPT {}
FMT_FUNC FormatError::~FormatError() FMT_DTOR_NOEXCEPT {}
FMT_FUNC SystemError::~SystemError() FMT_DTOR_NOEXCEPT {}
namespace {
#ifndef _MSC_VER
# define FMT_SNPRINTF snprintf
#else // _MSC_VER
inline int fmt_snprintf(char *buffer, size_t size, const char *format, ...) {
va_list args;
va_start(args, format);
int result = vsnprintf_s(buffer, size, _TRUNCATE, format, args);
va_end(args);
return result;
}
# define FMT_SNPRINTF fmt_snprintf
#endif // _MSC_VER
#if defined(_WIN32) && defined(__MINGW32__) && !defined(__NO_ISOCEXT)
# define FMT_SWPRINTF snwprintf
#else
# define FMT_SWPRINTF swprintf
#endif // defined(_WIN32) && defined(__MINGW32__) && !defined(__NO_ISOCEXT)
const char RESET_COLOR[] = "\x1b[0m";
typedef void (*FormatFunc)(Writer &, int, StringRef);
// Portable thread-safe version of strerror.
// Sets buffer to point to a string describing the error code.
// This can be either a pointer to a string stored in buffer,
// or a pointer to some static immutable string.
// Returns one of the following values:
// 0 - success
// ERANGE - buffer is not large enough to store the error message
// other - failure
// Buffer should be at least of size 1.
int safe_strerror(
int error_code, char *&buffer, std::size_t buffer_size) FMT_NOEXCEPT {
FMT_ASSERT(buffer != 0 && buffer_size != 0, "invalid buffer");
class StrError {
private:
int error_code_;
char *&buffer_;
std::size_t buffer_size_;
// A noop assignment operator to avoid bogus warnings.
void operator=(const StrError &) {}
// Handle the result of XSI-compliant version of strerror_r.
int handle(int result) {
// glibc versions before 2.13 return result in errno.
return result == -1 ? errno : result;
}
// Handle the result of GNU-specific version of strerror_r.
int handle(char *message) {
// If the buffer is full then the message is probably truncated.
if (message == buffer_ && strlen(buffer_) == buffer_size_ - 1)
return ERANGE;
buffer_ = message;
return 0;
}
// Handle the case when strerror_r is not available.
int handle(internal::Null<>) {
return fallback(strerror_s(buffer_, buffer_size_, error_code_));
}
// Fallback to strerror_s when strerror_r is not available.
int fallback(int result) {
// If the buffer is full then the message is probably truncated.
return result == 0 && strlen(buffer_) == buffer_size_ - 1 ?
ERANGE : result;
}
// Fallback to strerror if strerror_r and strerror_s are not available.
int fallback(internal::Null<>) {
errno = 0;
buffer_ = strerror(error_code_);
return errno;
}
public:
StrError(int err_code, char *&buf, std::size_t buf_size)
: error_code_(err_code), buffer_(buf), buffer_size_(buf_size) {}
int run() {
// Suppress a warning about unused strerror_r.
strerror_r(0, FMT_NULL, "");
return handle(strerror_r(error_code_, buffer_, buffer_size_));
}
};
return StrError(error_code, buffer, buffer_size).run();
}
void format_error_code(Writer &out, int error_code,
StringRef message) FMT_NOEXCEPT {
// Report error code making sure that the output fits into
// INLINE_BUFFER_SIZE to avoid dynamic memory allocation and potential
// bad_alloc.
out.clear();
static const char SEP[] = ": ";
static const char ERROR_STR[] = "error ";
// Subtract 2 to account for terminating null characters in SEP and ERROR_STR.
std::size_t error_code_size = sizeof(SEP) + sizeof(ERROR_STR) - 2;
typedef internal::IntTraits<int>::MainType MainType;
MainType abs_value = static_cast<MainType>(error_code);
if (internal::is_negative(error_code)) {
abs_value = 0 - abs_value;
++error_code_size;
}
error_code_size += internal::count_digits(abs_value);
if (message.size() <= internal::INLINE_BUFFER_SIZE - error_code_size)
out << message << SEP;
out << ERROR_STR << error_code;
assert(out.size() <= internal::INLINE_BUFFER_SIZE);
}
void report_error(FormatFunc func, int error_code,
StringRef message) FMT_NOEXCEPT {
MemoryWriter full_message;
func(full_message, error_code, message);
// Use Writer::data instead of Writer::c_str to avoid potential memory
// allocation.
std::fwrite(full_message.data(), full_message.size(), 1, stderr);
std::fputc('\n', stderr);
}
} // namespace
FMT_FUNC void SystemError::init(
int err_code, CStringRef format_str, ArgList args) {
error_code_ = err_code;
MemoryWriter w;
format_system_error(w, err_code, format(format_str, args));
std::runtime_error &base = *this;
base = std::runtime_error(w.str());
}
template <typename T>
int internal::CharTraits<char>::format_float(
char *buffer, std::size_t size, const char *format,
unsigned width, int precision, T value) {
if (width == 0) {
return precision < 0 ?
FMT_SNPRINTF(buffer, size, format, value) :
FMT_SNPRINTF(buffer, size, format, precision, value);
}
return precision < 0 ?
FMT_SNPRINTF(buffer, size, format, width, value) :
FMT_SNPRINTF(buffer, size, format, width, precision, value);
}
template <typename T>
int internal::CharTraits<wchar_t>::format_float(
wchar_t *buffer, std::size_t size, const wchar_t *format,
unsigned width, int precision, T value) {
if (width == 0) {
return precision < 0 ?
FMT_SWPRINTF(buffer, size, format, value) :
FMT_SWPRINTF(buffer, size, format, precision, value);
}
return precision < 0 ?
FMT_SWPRINTF(buffer, size, format, width, value) :
FMT_SWPRINTF(buffer, size, format, width, precision, value);
}
template <typename T>
const char internal::BasicData<T>::DIGITS[] =
"0001020304050607080910111213141516171819"
"2021222324252627282930313233343536373839"
"4041424344454647484950515253545556575859"
"6061626364656667686970717273747576777879"
"8081828384858687888990919293949596979899";
#define FMT_POWERS_OF_10(factor) \
factor * 10, \
factor * 100, \
factor * 1000, \
factor * 10000, \
factor * 100000, \
factor * 1000000, \
factor * 10000000, \
factor * 100000000, \
factor * 1000000000
template <typename T>
const uint32_t internal::BasicData<T>::POWERS_OF_10_32[] = {
0, FMT_POWERS_OF_10(1)
};
template <typename T>
const uint64_t internal::BasicData<T>::POWERS_OF_10_64[] = {
0,
FMT_POWERS_OF_10(1),
FMT_POWERS_OF_10(ULongLong(1000000000)),
// Multiply several constants instead of using a single long long constant
// to avoid warnings about C++98 not supporting long long.
ULongLong(1000000000) * ULongLong(1000000000) * 10
};
FMT_FUNC void internal::report_unknown_type(char code, const char *type) {
(void)type;
if (std::isprint(static_cast<unsigned char>(code))) {
FMT_THROW(FormatError(
format("unknown format code '{}' for {}", code, type)));
}
FMT_THROW(FormatError(
format("unknown format code '\\x{:02x}' for {}",
static_cast<unsigned>(code), type)));
}
#if FMT_USE_WINDOWS_H
FMT_FUNC internal::UTF8ToUTF16::UTF8ToUTF16(StringRef s) {
static const char ERROR_MSG[] = "cannot convert string from UTF-8 to UTF-16";
if (s.size() > INT_MAX)
FMT_THROW(WindowsError(ERROR_INVALID_PARAMETER, ERROR_MSG));
int s_size = static_cast<int>(s.size());
int length = MultiByteToWideChar(
CP_UTF8, MB_ERR_INVALID_CHARS, s.data(), s_size, FMT_NULL, 0);
if (length == 0)
FMT_THROW(WindowsError(GetLastError(), ERROR_MSG));
buffer_.resize(length + 1);
length = MultiByteToWideChar(
CP_UTF8, MB_ERR_INVALID_CHARS, s.data(), s_size, &buffer_[0], length);
if (length == 0)
FMT_THROW(WindowsError(GetLastError(), ERROR_MSG));
buffer_[length] = 0;
}
FMT_FUNC internal::UTF16ToUTF8::UTF16ToUTF8(WStringRef s) {
if (int error_code = convert(s)) {
FMT_THROW(WindowsError(error_code,
"cannot convert string from UTF-16 to UTF-8"));
}
}
FMT_FUNC int internal::UTF16ToUTF8::convert(WStringRef s) {
if (s.size() > INT_MAX)
return ERROR_INVALID_PARAMETER;
int s_size = static_cast<int>(s.size());
int length = WideCharToMultiByte(
CP_UTF8, 0, s.data(), s_size, FMT_NULL, 0, FMT_NULL, FMT_NULL);
if (length == 0)
return GetLastError();
buffer_.resize(length + 1);
length = WideCharToMultiByte(
CP_UTF8, 0, s.data(), s_size, &buffer_[0], length, FMT_NULL, FMT_NULL);
if (length == 0)
return GetLastError();
buffer_[length] = 0;
return 0;
}
FMT_FUNC void WindowsError::init(
int err_code, CStringRef format_str, ArgList args) {
error_code_ = err_code;
MemoryWriter w;
internal::format_windows_error(w, err_code, format(format_str, args));
std::runtime_error &base = *this;
base = std::runtime_error(w.str());
}
FMT_FUNC void internal::format_windows_error(
Writer &out, int error_code, StringRef message) FMT_NOEXCEPT {
FMT_TRY {
MemoryBuffer<wchar_t, INLINE_BUFFER_SIZE> buffer;
buffer.resize(INLINE_BUFFER_SIZE);
for (;;) {
wchar_t *system_message = &buffer[0];
int result = FormatMessageW(
FORMAT_MESSAGE_FROM_SYSTEM | FORMAT_MESSAGE_IGNORE_INSERTS,
FMT_NULL, error_code, MAKELANGID(LANG_NEUTRAL, SUBLANG_DEFAULT),
system_message, static_cast<uint32_t>(buffer.size()), FMT_NULL);
if (result != 0) {
UTF16ToUTF8 utf8_message;
if (utf8_message.convert(system_message) == ERROR_SUCCESS) {
out << message << ": " << utf8_message;
return;
}
break;
}
if (GetLastError() != ERROR_INSUFFICIENT_BUFFER)
break; // Can't get error message, report error code instead.
buffer.resize(buffer.size() * 2);
}
} FMT_CATCH(...) {}
fmt::format_error_code(out, error_code, message); // 'fmt::' is for bcc32.
}
#endif // FMT_USE_WINDOWS_H
FMT_FUNC void format_system_error(
Writer &out, int error_code, StringRef message) FMT_NOEXCEPT {
FMT_TRY {
internal::MemoryBuffer<char, internal::INLINE_BUFFER_SIZE> buffer;
buffer.resize(internal::INLINE_BUFFER_SIZE);
for (;;) {
char *system_message = &buffer[0];
int result = safe_strerror(error_code, system_message, buffer.size());
if (result == 0) {
out << message << ": " << system_message;
return;
}
if (result != ERANGE)
break; // Can't get error message, report error code instead.
buffer.resize(buffer.size() * 2);
}
} FMT_CATCH(...) {}
fmt::format_error_code(out, error_code, message); // 'fmt::' is for bcc32.
}
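// Minimal usage sketch (illustrative only; ENOENT requires <cerrno> and is an
// assumed example error code, not something this file references):
//
//     fmt::MemoryWriter w;
//     fmt::format_system_error(w, ENOENT, "cannot open file 'log.txt'");
//     // w.str() == "cannot open file 'log.txt': No such file or directory"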
template <typename Char>
void internal::ArgMap<Char>::init(const ArgList &args) {
if (!map_.empty())
return;
typedef internal::NamedArg<Char> NamedArg;
const NamedArg *named_arg = FMT_NULL;
bool use_values =
args.type(ArgList::MAX_PACKED_ARGS - 1) == internal::Arg::NONE;
if (use_values) {
for (unsigned i = 0;/*nothing*/; ++i) {
internal::Arg::Type arg_type = args.type(i);
switch (arg_type) {
case internal::Arg::NONE:
return;
case internal::Arg::NAMED_ARG:
named_arg = static_cast<const NamedArg*>(args.values_[i].pointer);
map_.push_back(Pair(named_arg->name, *named_arg));
break;
default:
/*nothing*/;
}
}
return;
}
for (unsigned i = 0; i != ArgList::MAX_PACKED_ARGS; ++i) {
internal::Arg::Type arg_type = args.type(i);
if (arg_type == internal::Arg::NAMED_ARG) {
named_arg = static_cast<const NamedArg*>(args.args_[i].pointer);
map_.push_back(Pair(named_arg->name, *named_arg));
}
}
for (unsigned i = ArgList::MAX_PACKED_ARGS;/*nothing*/; ++i) {
switch (args.args_[i].type) {
case internal::Arg::NONE:
return;
case internal::Arg::NAMED_ARG:
named_arg = static_cast<const NamedArg*>(args.args_[i].pointer);
map_.push_back(Pair(named_arg->name, *named_arg));
break;
default:
/*nothing*/;
}
}
}
template <typename Char>
void internal::FixedBuffer<Char>::grow(std::size_t) {
FMT_THROW(std::runtime_error("buffer overflow"));
}
FMT_FUNC internal::Arg internal::FormatterBase::do_get_arg(
unsigned arg_index, const char *&error) {
internal::Arg arg = args_[arg_index];
switch (arg.type) {
case internal::Arg::NONE:
error = "argument index out of range";
break;
case internal::Arg::NAMED_ARG:
arg = *static_cast<const internal::Arg*>(arg.pointer);
break;
default:
/*nothing*/;
}
return arg;
}
FMT_FUNC void report_system_error(
int error_code, fmt::StringRef message) FMT_NOEXCEPT {
// 'fmt::' is for bcc32.
report_error(format_system_error, error_code, message);
}
#if FMT_USE_WINDOWS_H
FMT_FUNC void report_windows_error(
int error_code, fmt::StringRef message) FMT_NOEXCEPT {
// 'fmt::' is for bcc32.
report_error(internal::format_windows_error, error_code, message);
}
#endif
FMT_FUNC void print(std::FILE *f, CStringRef format_str, ArgList args) {
MemoryWriter w;
w.write(format_str, args);
std::fwrite(w.data(), 1, w.size(), f);
}
FMT_FUNC void print(CStringRef format_str, ArgList args) {
print(stdout, format_str, args);
}
FMT_FUNC void print_colored(Color c, CStringRef format, ArgList args) {
char escape[] = "\x1b[30m";
escape[3] = static_cast<char>('0' + c);
std::fputs(escape, stdout);
print(format, args);
std::fputs(RESET_COLOR, stdout);
}
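// Usage sketch for the printing entry points above (the arguments are
// illustrative):
//
//     fmt::print("Hello, {}!\n", "world");               // stdout
//     fmt::print(stderr, "{} errors\n", 3);              // explicit FILE*
//     fmt::print_colored(fmt::RED, "fatal: {}\n", msg);  // ANSI color escape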
#ifndef FMT_HEADER_ONLY
template struct internal::BasicData<void>;
// Explicit instantiations for char.
template void internal::FixedBuffer<char>::grow(std::size_t);
template void internal::ArgMap<char>::init(const ArgList &args);
template FMT_API int internal::CharTraits<char>::format_float(
char *buffer, std::size_t size, const char *format,
unsigned width, int precision, double value);
template FMT_API int internal::CharTraits<char>::format_float(
char *buffer, std::size_t size, const char *format,
unsigned width, int precision, long double value);
// Explicit instantiations for wchar_t.
template void internal::FixedBuffer<wchar_t>::grow(std::size_t);
template void internal::ArgMap<wchar_t>::init(const ArgList &args);
template FMT_API int internal::CharTraits<wchar_t>::format_float(
wchar_t *buffer, std::size_t size, const wchar_t *format,
unsigned width, int precision, double value);
template FMT_API int internal::CharTraits<wchar_t>::format_float(
wchar_t *buffer, std::size_t size, const wchar_t *format,
unsigned width, int precision, long double value);
#endif // FMT_HEADER_ONLY
} // namespace fmt
#ifdef _MSC_VER
# pragma warning(pop)
#endif

File diff suppressed because it is too large

@ -1,35 +0,0 @@
/*
Formatting library for C++ - std::ostream support
Copyright (c) 2012 - 2016, Victor Zverovich
All rights reserved.
For the license information refer to format.h.
*/
#include "ostream.h"
namespace fmt {
namespace internal {
FMT_FUNC void write(std::ostream &os, Writer &w) {
const char *data = w.data();
typedef internal::MakeUnsigned<std::streamsize>::Type UnsignedStreamSize;
UnsignedStreamSize size = w.size();
UnsignedStreamSize max_size =
internal::to_unsigned((std::numeric_limits<std::streamsize>::max)());
do {
UnsignedStreamSize n = size <= max_size ? size : max_size;
os.write(data, static_cast<std::streamsize>(n));
data += n;
size -= n;
} while (size != 0);
}
}
FMT_FUNC void print(std::ostream &os, CStringRef format_str, ArgList args) {
MemoryWriter w;
w.write(format_str, args);
internal::write(os, w);
}
} // namespace fmt

@ -1,105 +0,0 @@
/*
Formatting library for C++ - std::ostream support
Copyright (c) 2012 - 2016, Victor Zverovich
All rights reserved.
For the license information refer to format.h.
*/
#ifndef FMT_OSTREAM_H_
#define FMT_OSTREAM_H_
#include "format.h"
#include <ostream>
namespace fmt {
namespace internal {
template <class Char>
class FormatBuf : public std::basic_streambuf<Char> {
private:
typedef typename std::basic_streambuf<Char>::int_type int_type;
typedef typename std::basic_streambuf<Char>::traits_type traits_type;
Buffer<Char> &buffer_;
public:
FormatBuf(Buffer<Char> &buffer) : buffer_(buffer) {}
protected:
// The put-area is actually always empty. This makes the implementation
// simpler and has the advantage that the streambuf and the buffer are always
// in sync and sputc never writes into uninitialized memory. The obvious
// disadvantage is that each call to sputc always results in a (virtual) call
// to overflow. There is no disadvantage here for sputn since this always
// results in a call to xsputn.
int_type overflow(int_type ch = traits_type::eof()) FMT_OVERRIDE {
if (!traits_type::eq_int_type(ch, traits_type::eof()))
buffer_.push_back(static_cast<Char>(ch));
return ch;
}
std::streamsize xsputn(const Char *s, std::streamsize count) FMT_OVERRIDE {
buffer_.append(s, s + count);
return count;
}
};
Yes &convert(std::ostream &);
struct DummyStream : std::ostream {
DummyStream(); // Suppress a bogus warning in MSVC.
// Hide all operator<< overloads from std::ostream.
void operator<<(Null<>);
};
No &operator<<(std::ostream &, int);
template<typename T>
struct ConvertToIntImpl<T, true> {
// Convert to int only if T doesn't have an overloaded operator<<.
enum {
value = sizeof(convert(get<DummyStream>() << get<T>())) == sizeof(No)
};
};
// Write the content of w to os.
FMT_API void write(std::ostream &os, Writer &w);
} // namespace internal
// Formats a value.
template <typename Char, typename ArgFormatter_, typename T>
void format_arg(BasicFormatter<Char, ArgFormatter_> &f,
const Char *&format_str, const T &value) {
internal::MemoryBuffer<Char, internal::INLINE_BUFFER_SIZE> buffer;
internal::FormatBuf<Char> format_buf(buffer);
std::basic_ostream<Char> output(&format_buf);
output << value;
BasicStringRef<Char> str(&buffer[0], buffer.size());
typedef internal::MakeArg< BasicFormatter<Char> > MakeArg;
format_str = f.format(format_str, MakeArg(str));
}
/**
\rst
Prints formatted data to the stream *os*.
**Example**::
print(cerr, "Don't {}!", "panic");
\endrst
*/
FMT_API void print(std::ostream &os, CStringRef format_str, ArgList args);
FMT_VARIADIC(void, print, std::ostream &, CStringRef)
} // namespace fmt
#ifdef FMT_HEADER_ONLY
# include "ostream.cc"
#endif
#endif // FMT_OSTREAM_H_
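// Usage sketch: once this header is included, any type with a free
// operator<< becomes formattable. `Point3` is an illustrative type, not part
// of fmt:
//
//     #include "ostream.h"
//     struct Point3 { double x, y, z; };
//     std::ostream &operator<<(std::ostream &os, const Point3 &p) {
//       return os << '(' << p.x << ", " << p.y << ", " << p.z << ')';
//     }
//     // fmt::print("center: {}\n", Point3{1, 2, 3}); -> "center: (1, 2, 3)"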

@ -1,198 +0,0 @@
#ifndef VTKMDIY_GRID_HPP
#define VTKMDIY_GRID_HPP
#include "point.hpp"
namespace diy
{
template<class C, unsigned D>
struct Grid;
template<class C, unsigned D>
struct GridRef
{
public:
typedef C Value;
typedef Point<int, D> Vertex;
typedef size_t Index;
public:
template<class Int>
GridRef(C* data, const Point<Int,D>& shape, bool c_order = true):
data_(data), shape_(shape), c_order_(c_order) { set_stride(); }
GridRef(Grid<C,D>& g):
data_(g.data()), shape_(g.shape()),
c_order_(g.c_order()) { set_stride(); }
template<class Int>
C operator()(const Point<Int, D>& v) const { return (*this)(index(v)); }
template<class Int>
C& operator()(const Point<Int, D>& v) { return (*this)(index(v)); }
C operator()(Index i) const { return data_[i]; }
C& operator()(Index i) { return data_[i]; }
const Vertex&
shape() const { return shape_; }
const C*
data() const { return data_; }
C* data() { return data_; }
// Set every element to the given value
GridRef& operator=(C value) { Index s = size(); for (Index i = 0; i < s; ++i) data_[i] = value; return *this; }
GridRef& operator/=(C value) { Index s = size(); for (Index i = 0; i < s; ++i) data_[i] /= value; return *this; }
inline
bool operator==(const GridRef& other) const;
bool operator!=(const GridRef& other) const { return !(*this == other); }
inline
Vertex vertex(Index idx) const;
Index index(const Vertex& v) const { Index idx = 0; for (unsigned i = 0; i < D; ++i) { idx += ((Index) v[i]) * ((Index) stride_[i]); } return idx; }
Index size() const { return size(shape()); }
void swap(GridRef& other) { std::swap(data_, other.data_); std::swap(shape_, other.shape_); std::swap(stride_, other.stride_); std::swap(c_order_, other.c_order_); }
bool c_order() const { return c_order_; }
static constexpr
unsigned dimension() { return D; }
protected:
static Index
size(const Vertex& v) { Index res = 1; for (unsigned i = 0; i < D; ++i) res *= v[i]; return res; }
void set_stride()
{
Index cur = 1;
if (c_order_)
for (unsigned i = D; i > 0; --i) { stride_[i-1] = cur; cur *= shape_[i-1]; }
else
for (unsigned i = 0; i < D; ++i) { stride_[i] = cur; cur *= shape_[i]; }
}
void set_shape(const Vertex& v) { shape_ = v; set_stride(); }
void set_data(C* data) { data_ = data; }
void set_c_order(bool order) { c_order_ = order; }
private:
C* data_;
Vertex shape_;
Vertex stride_;
bool c_order_;
};
template<class C, unsigned D>
struct Grid: public GridRef<C,D>
{
public:
typedef GridRef<C,D> Parent;
typedef typename Parent::Value Value;
typedef typename Parent::Index Index;
typedef typename Parent::Vertex Vertex;
typedef Parent Reference;
template<class U>
struct rebind { typedef Grid<U,D> type; };
public:
Grid():
Parent(new C[0], Vertex::zero()) {}
template<class Int>
Grid(const Point<Int, D>& shape, bool c_order = true):
Parent(new C[size(shape)], shape, c_order)
{}
Grid(Grid&& g): Grid() { Parent::swap(g); }
Grid(const Parent& g):
Parent(new C[size(g.shape())], g.shape(),
g.c_order()) { copy_data(g.data()); }
template<class OtherGrid>
Grid(const OtherGrid& g):
Parent(new C[size(g.shape())],
g.shape(),
g.c_order()) { copy_data(g.data()); }
~Grid() { delete[] Parent::data(); }
template<class OC>
Grid& operator=(const GridRef<OC, D>& other)
{
delete[] Parent::data();
Parent::set_c_order(other.c_order()); // NB: order needs to be set before the shape, to set the stride correctly
Parent::set_shape(other.shape());
Index s = size(shape());
Parent::set_data(new C[s]);
copy_data(other.data());
return *this;
}
Grid& operator=(Grid&& g) { Parent::swap(g); return *this; }
using Parent::data;
using Parent::shape;
using Parent::operator();
using Parent::operator=;
using Parent::size;
private:
template<class OC>
void copy_data(const OC* data)
{
Index s = size(shape());
for (Index i = 0; i < s; ++i)
Parent::data()[i] = data[i];
}
};
template<class C, unsigned D>
bool GridRef<C, D>::operator==(const GridRef<C, D>& other) const
{
if (c_order() != other.c_order())
return false;
if (shape() != other.shape())
return false;
for(unsigned i = 0; i < size(); ++i)
if (data()[i] != other.data()[i])
return false;
return true;
}
template<class C, unsigned D>
typename GridRef<C, D>::Vertex
GridRef<C, D>::
vertex(typename GridRef<C, D>::Index idx) const
{
Vertex v;
if (c_order())
for (unsigned i = 0; i < D; ++i)
{
v[i] = idx / stride_[i];
idx %= stride_[i];
}
else
for (int i = D-1; i >= 0; --i)
{
v[i] = idx / stride_[i];
idx %= stride_[i];
}
return v;
}
}
#endif
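// Usage sketch (construction via element-wise assignment; an initializer-list
// constructor for diy::Point may also be available in point.hpp):
//
//     diy::Point<int, 2> shape;
//     shape[0] = 3; shape[1] = 4;
//     diy::Grid<float, 2> g(shape);  // owns 12 elements, C order by default
//     g = 0.f;                       // GridRef::operator= fills every element
//     diy::Point<int, 2> v;
//     v[0] = 1; v[1] = 2;
//     g(v) = 5.f;                    // vertex indexing ...
//     float x = g(g.index(v));       // ... equivalent to linear indexing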

@ -1,392 +0,0 @@
#ifndef VTKMDIY_IO_BLOCK_HPP
#define VTKMDIY_IO_BLOCK_HPP
#include <string>
#include <algorithm>
#include <stdexcept>
#include "../mpi.hpp"
#include "../assigner.hpp"
#include "../master.hpp"
#include "../storage.hpp"
#include "../log.hpp"
#include "utils.hpp"
// Read and write collections of blocks using MPI-IO
namespace diy
{
namespace io
{
namespace detail
{
typedef mpi::io::offset offset_t;
struct GidOffsetCount
{
GidOffsetCount(): // default constructor needed to initialize a vector of a given size
gid(-1), offset(0), count(0) {}
GidOffsetCount(int gid_, offset_t offset_, offset_t count_):
gid(gid_), offset(offset_), count(count_) {}
bool operator<(const GidOffsetCount& other) const { return gid < other.gid; }
int gid;
offset_t offset;
offset_t count;
};
}
}
// Serialize GidOffsetCount explicitly, to avoid alignment and uninitialized data issues
// (to get identical output files given the same block input)
template<>
struct Serialization<io::detail::GidOffsetCount>
{
typedef io::detail::GidOffsetCount GidOffsetCount;
static void save(BinaryBuffer& bb, const GidOffsetCount& x)
{
diy::save(bb, x.gid);
diy::save(bb, x.offset);
diy::save(bb, x.count);
}
static void load(BinaryBuffer& bb, GidOffsetCount& x)
{
diy::load(bb, x.gid);
diy::load(bb, x.offset);
diy::load(bb, x.count);
}
};
namespace io
{
/**
* \ingroup IO
* \brief Write blocks to storage collectively in one shared file
*/
inline
void
write_blocks(const std::string& outfilename, //!< output file name
const mpi::communicator& comm, //!< communicator
Master& master, //!< master object
const MemoryBuffer& extra = MemoryBuffer(),//!< user-defined metadata for file header; meaningful only on rank == 0
Master::SaveBlock save = 0) //!< block save function, in case it differs from or is undefined in the master
{
if (!save) save = master.saver(); // save is likely to be different from master.save()
typedef detail::offset_t offset_t;
typedef detail::GidOffsetCount GidOffsetCount;
unsigned size = master.size(),
max_size, min_size;
mpi::all_reduce(comm, size, max_size, mpi::maximum<unsigned>());
mpi::all_reduce(comm, size, min_size, mpi::minimum<unsigned>());
// truncate the file
if (comm.rank() == 0)
diy::io::utils::truncate(outfilename.c_str(), 0);
mpi::io::file f(comm, outfilename, mpi::io::file::wronly | mpi::io::file::create);
offset_t start = 0, shift;
std::vector<GidOffsetCount> offset_counts;
for (unsigned i = 0; i < max_size; ++i)
{
offset_t count = 0,
offset;
if (i < size)
{
// get the block from master and serialize it
const void* block = master.get(i);
MemoryBuffer bb;
LinkFactory::save(bb, master.link(i));
save(block, bb);
count = bb.buffer.size();
mpi::scan(comm, count, offset, std::plus<offset_t>());
offset += start - count;
mpi::all_reduce(comm, count, shift, std::plus<offset_t>());
start += shift;
if (i < min_size) // up to min_size, we can do collective IO
f.write_at_all(offset, bb.buffer);
else
f.write_at(offset, bb.buffer);
offset_counts.push_back(GidOffsetCount(master.gid(i), offset, count));
} else
{
// matching global operations
mpi::scan(comm, count, offset, std::plus<offset_t>());
mpi::all_reduce(comm, count, shift, std::plus<offset_t>());
// -1 indicates that there is no block written here from this rank
offset_counts.push_back(GidOffsetCount(-1, offset, count));
}
}
if (comm.rank() == 0)
{
// roundabout way of gathering a vector of vectors of GidOffsetCount, to avoid registering a new MPI datatype
std::vector< std::vector<char> > gathered_offset_count_buffers;
MemoryBuffer oc_buffer; diy::save(oc_buffer, offset_counts);
mpi::gather(comm, oc_buffer.buffer, gathered_offset_count_buffers, 0);
std::vector<GidOffsetCount> all_offset_counts;
for (unsigned i = 0; i < gathered_offset_count_buffers.size(); ++i)
{
MemoryBuffer per_rank_oc_buffer; per_rank_oc_buffer.buffer.swap(gathered_offset_count_buffers[i]);
std::vector<GidOffsetCount> per_rank_offset_counts;
diy::load(per_rank_oc_buffer, per_rank_offset_counts);
for (unsigned j = 0; j < per_rank_offset_counts.size(); ++j)
if (per_rank_offset_counts[j].gid != -1)
all_offset_counts.push_back(per_rank_offset_counts[j]);
}
std::sort(all_offset_counts.begin(), all_offset_counts.end()); // sorts by gid
MemoryBuffer bb;
diy::save(bb, all_offset_counts);
diy::save(bb, extra);
size_t footer_size = bb.size();
diy::save(bb, footer_size);
// find footer_offset as the max of (offset + count)
offset_t footer_offset = 0;
for (unsigned i = 0; i < all_offset_counts.size(); ++i)
{
offset_t end = all_offset_counts[i].offset + all_offset_counts[i].count;
if (end > footer_offset)
footer_offset = end;
}
f.write_at(footer_offset, bb.buffer);
} else
{
MemoryBuffer oc_buffer; diy::save(oc_buffer, offset_counts);
mpi::gather(comm, oc_buffer.buffer, 0);
}
}
/**
* \ingroup IO
* \brief Read blocks from storage collectively from one shared file
*/
inline
void
read_blocks(const std::string& infilename, //!< input file name
const mpi::communicator& comm, //!< communicator
StaticAssigner& assigner, //!< assigner object
Master& master, //!< master object
MemoryBuffer& extra, //!< user-defined metadata in file header
Master::LoadBlock load = 0) //!< block load function, in case it differs from or is undefined in the master
{
if (!load) load = master.loader(); // load is likely to be different from master.load()
typedef detail::offset_t offset_t;
typedef detail::GidOffsetCount GidOffsetCount;
mpi::io::file f(comm, infilename, mpi::io::file::rdonly);
offset_t footer_offset = f.size() - sizeof(size_t);
size_t footer_size;
// Read the size
f.read_at_all(footer_offset, (char*) &footer_size, sizeof(footer_size));
// Read all_offset_counts
footer_offset -= footer_size;
MemoryBuffer footer;
footer.buffer.resize(footer_size);
f.read_at_all(footer_offset, footer.buffer);
std::vector<GidOffsetCount> all_offset_counts;
diy::load(footer, all_offset_counts);
diy::load(footer, extra);
extra.reset();
// Get local gids from assigner
size_t size = all_offset_counts.size();
assigner.set_nblocks(size);
std::vector<int> gids;
assigner.local_gids(comm.rank(), gids);
for (unsigned i = 0; i < gids.size(); ++i)
{
if (gids[i] != all_offset_counts[gids[i]].gid)
get_logger()->warn("gids don't match in diy::io::read_blocks(), {} vs {}",
gids[i], all_offset_counts[gids[i]].gid);
offset_t offset = all_offset_counts[gids[i]].offset,
count = all_offset_counts[gids[i]].count;
MemoryBuffer bb;
bb.buffer.resize(count);
f.read_at(offset, bb.buffer);
Link* l = LinkFactory::load(bb);
l->fix(assigner);
void* b = master.create();
load(b, bb);
master.add(gids[i], b, l);
}
}
// Functions without the extra buffer, for compatibility with the old code
inline
void
write_blocks(const std::string& outfilename,
const mpi::communicator& comm,
Master& master,
Master::SaveBlock save)
{
MemoryBuffer extra;
write_blocks(outfilename, comm, master, extra, save);
}
inline
void
read_blocks(const std::string& infilename,
const mpi::communicator& comm,
StaticAssigner& assigner,
Master& master,
Master::LoadBlock load = 0)
{
MemoryBuffer extra; // dummy
read_blocks(infilename, comm, assigner, master, extra, load);
}
namespace split
{
/**
* \ingroup IO
* \brief Write blocks to storage independently in one file per process
*/
inline
void
write_blocks(const std::string& outfilename, //!< output file name
const mpi::communicator& comm, //!< communicator
Master& master, //!< master object
const MemoryBuffer& extra = MemoryBuffer(),//!< user-defined metadata for file header; meaningful only on rank == 0
Master::SaveBlock save = 0) //!< block save function, in case it differs from or is undefined in the master
{
if (!save) save = master.saver(); // save is likely to be different from master.save()
bool proceed = false;
size_t size = 0;
if (comm.rank() == 0)
{
if (diy::io::utils::is_directory(outfilename))
proceed = true;
else if (diy::io::utils::make_directory(outfilename))
proceed = true;
mpi::broadcast(comm, proceed, 0);
mpi::reduce(comm, (size_t) master.size(), size, 0, std::plus<size_t>());
} else
{
mpi::broadcast(comm, proceed, 0);
mpi::reduce(comm, (size_t) master.size(), 0, std::plus<size_t>());
}
if (!proceed)
throw std::runtime_error("Cannot access or create directory: " + outfilename);
for (int i = 0; i < (int)master.size(); ++i)
{
const void* block = master.get(i);
std::string filename = fmt::format("{}/{}", outfilename, master.gid(i));
::diy::detail::FileBuffer bb(fopen(filename.c_str(), "w"));
LinkFactory::save(bb, master.link(i));
save(block, bb);
fclose(bb.file);
}
if (comm.rank() == 0)
{
// save the extra buffer
std::string filename = outfilename + "/extra";
::diy::detail::FileBuffer bb(fopen(filename.c_str(), "w"));
::diy::save(bb, size);
::diy::save(bb, extra);
fclose(bb.file);
}
}
/**
* \ingroup IO
* \brief Read blocks from storage independently from one file per process
*/
inline
void
read_blocks(const std::string& infilename, //!< input file name
const mpi::communicator& comm, //!< communicator
StaticAssigner& assigner, //!< assigner object
Master& master, //!< master object
MemoryBuffer& extra, //!< user-defined metadata in file header
Master::LoadBlock load = 0) //!< block load function, in case it differs from or is undefined in the master
{
if (!load) load = master.loader(); // load is likely to be different from master.load()
// load the extra buffer and size
size_t size;
{
std::string filename = infilename + "/extra";
::diy::detail::FileBuffer bb(fopen(filename.c_str(), "r"));
::diy::load(bb, size);
::diy::load(bb, extra);
extra.reset();
fclose(bb.file);
}
// Get local gids from assigner
assigner.set_nblocks(size);
std::vector<int> gids;
assigner.local_gids(comm.rank(), gids);
// Read our blocks.
for (unsigned i = 0; i < gids.size(); ++i)
{
std::string filename = fmt::format("{}/{}", infilename, gids[i]);
::diy::detail::FileBuffer bb(fopen(filename.c_str(), "r"));
Link* l = LinkFactory::load(bb);
l->fix(assigner);
void* b = master.create();
load(b, bb);
master.add(gids[i], b, l);
fclose(bb.file);
}
}
// Functions without the extra buffer, for compatibility with the old code
inline
void
write_blocks(const std::string& outfilename,
const mpi::communicator& comm,
Master& master,
Master::SaveBlock save)
{
MemoryBuffer extra;
write_blocks(outfilename, comm, master, extra, save);
}
inline
void
read_blocks(const std::string& infilename,
const mpi::communicator& comm,
StaticAssigner& assigner,
Master& master,
Master::LoadBlock load = 0)
{
MemoryBuffer extra; // dummy
read_blocks(infilename, comm, assigner, master, extra, load);
}
} // split
} // io
} // diy
#endif
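// Usage sketch (assumes a Master already populated with blocks and a matching
// create/load/save setup; ContiguousAssigner comes from assigner.hpp):
//
//     diy::mpi::communicator world;
//     diy::Master master(world /*, threads, limit, create, destroy, ... */);
//     // ... add blocks to master ...
//     diy::io::write_blocks("blocks.bin", world, master);
//
//     diy::ContiguousAssigner assigner(world.size(), 0); // nblocks set on read
//     diy::io::read_blocks("blocks.bin", world, assigner, master);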

@ -1,181 +0,0 @@
#ifndef VTKMDIY_IO_BOV_HPP
#define VTKMDIY_IO_BOV_HPP
#include <vector>
#include <algorithm>
#include <numeric>
#include "../types.hpp"
#include "../mpi.hpp"
namespace diy
{
namespace io
{
// Reads and writes subsets of a block of values into specified block bounds
class BOV
{
public:
typedef std::vector<int> Shape;
public:
BOV(mpi::io::file& f):
f_(f), offset_(0) {}
template<class S>
BOV(mpi::io::file& f,
const S& shape = S(),
mpi::io::offset offset = 0):
f_(f), offset_(offset) { set_shape(shape); }
void set_offset(mpi::io::offset offset) { offset_ = offset; }
template<class S>
void set_shape(const S& shape)
{
shape_.clear();
stride_.clear();
for (unsigned i = 0; i < shape.size(); ++i)
{
shape_.push_back(shape[i]);
stride_.push_back(1);
}
for (int i = shape_.size() - 2; i >= 0; --i)
stride_[i] = stride_[i+1] * shape_[i+1];
}
const Shape& shape() const { return shape_; }
template<class T>
void read(const DiscreteBounds& bounds, T* buffer, bool collective = false, int chunk = 1) const;
template<class T>
void write(const DiscreteBounds& bounds, const T* buffer, bool collective = false, int chunk = 1);
template<class T>
void write(const DiscreteBounds& bounds, const T* buffer, const DiscreteBounds& core, bool collective = false, int chunk = 1);
protected:
mpi::io::file& file() { return f_; }
private:
mpi::io::file& f_;
Shape shape_;
std::vector<size_t> stride_;
size_t offset_;
};
}
}
template<class T>
void
diy::io::BOV::
read(const DiscreteBounds& bounds, T* buffer, bool collective, int chunk) const
{
#ifndef VTKM_DIY_NO_MPI
int dim = shape_.size();
int total = 1;
std::vector<int> subsizes;
for (int i = 0; i < dim; ++i)
{
subsizes.push_back(bounds.max[i] - bounds.min[i] + 1);
total *= subsizes.back();
}
MPI_Datatype T_type;
if (chunk == 1)
T_type = mpi::detail::get_mpi_datatype<T>();
else
{
// create an MPI struct of size chunk to read the data in those chunks
// (this works around MPI-IO weirdness where crucial quantities
// are ints, which is too narrow a type)
int array_of_blocklengths[] = { chunk };
MPI_Aint array_of_displacements[] = { 0 };
MPI_Datatype array_of_types[] = { mpi::detail::get_mpi_datatype<T>() };
MPI_Type_create_struct(1, array_of_blocklengths, array_of_displacements, array_of_types, &T_type);
MPI_Type_commit(&T_type);
}
MPI_Datatype fileblk;
MPI_Type_create_subarray(dim, (int*) &shape_[0], &subsizes[0], (int*) &bounds.min[0], MPI_ORDER_C, T_type, &fileblk);
MPI_Type_commit(&fileblk);
MPI_File_set_view(f_.handle(), offset_, T_type, fileblk, (char*)"native", MPI_INFO_NULL);
mpi::status s;
if (!collective)
MPI_File_read(f_.handle(), buffer, total, T_type, &s.s);
else
MPI_File_read_all(f_.handle(), buffer, total, T_type, &s.s);
if (chunk != 1)
MPI_Type_free(&T_type);
MPI_Type_free(&fileblk);
#else
(void) bounds; (void) buffer; (void) collective; (void)chunk;
DIY_UNSUPPORTED_MPI_CALL(diy::io::BOV::read);
#endif
}
template<class T>
void
diy::io::BOV::
write(const DiscreteBounds& bounds, const T* buffer, bool collective, int chunk)
{
write(bounds, buffer, bounds, collective, chunk);
}
template<class T>
void
diy::io::BOV::
write(const DiscreteBounds& bounds, const T* buffer, const DiscreteBounds& core, bool collective, int chunk)
{
#ifndef VTKM_DIY_NO_MPI
int dim = shape_.size();
std::vector<int> subsizes;
std::vector<int> buffer_shape, buffer_start;
for (int i = 0; i < dim; ++i)
{
buffer_shape.push_back(bounds.max[i] - bounds.min[i] + 1);
buffer_start.push_back(core.min[i] - bounds.min[i]);
subsizes.push_back(core.max[i] - core.min[i] + 1);
}
MPI_Datatype T_type;
if (chunk == 1)
T_type = mpi::detail::get_mpi_datatype<T>();
else
{
// assume T is a binary block and create an MPI struct of appropriate size
int array_of_blocklengths[] = { chunk };
MPI_Aint array_of_displacements[] = { 0 };
MPI_Datatype array_of_types[] = { mpi::detail::get_mpi_datatype<T>() };
MPI_Type_create_struct(1, array_of_blocklengths, array_of_displacements, array_of_types, &T_type);
MPI_Type_commit(&T_type);
}
MPI_Datatype fileblk, subbuffer;
MPI_Type_create_subarray(dim, (int*) &shape_[0], &subsizes[0], (int*) &core.min[0], MPI_ORDER_C, T_type, &fileblk);
MPI_Type_create_subarray(dim, (int*) &buffer_shape[0], &subsizes[0], (int*) &buffer_start[0], MPI_ORDER_C, T_type, &subbuffer);
MPI_Type_commit(&fileblk);
MPI_Type_commit(&subbuffer);
MPI_File_set_view(f_.handle(), offset_, T_type, fileblk, (char*)"native", MPI_INFO_NULL);
mpi::status s;
if (!collective)
MPI_File_write(f_.handle(), (void*)buffer, 1, subbuffer, &s.s);
else
MPI_File_write_all(f_.handle(), (void*)buffer, 1, subbuffer, &s.s);
if (chunk != 1)
MPI_Type_free(&T_type);
MPI_Type_free(&fileblk);
MPI_Type_free(&subbuffer);
#else
(void) bounds; (void) buffer;(void) core; (void) collective; (void) chunk;
DIY_UNSUPPORTED_MPI_CALL(diy::io::bov::write);
#endif
}
#endif
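// Usage sketch (the file name, shape, and `bounds` are illustrative; `bounds`
// would describe this rank's sub-block of the global grid):
//
//     diy::mpi::io::file in(world, "data.raw", diy::mpi::io::file::rdonly);
//     std::vector<int> shape { 64, 64, 64 };      // global extents, C order
//     diy::io::BOV bov(in, shape);
//     std::vector<float> values(local_count);
//     bov.read(bounds, values.data(), /* collective = */ true);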

@ -1,213 +0,0 @@
#ifndef VTKMDIY_IO_NMPY_HPP
#define VTKMDIY_IO_NMPY_HPP
#include <sstream>
#include <complex>
#include <stdexcept>
#include "../serialization.hpp"
#include "bov.hpp"
namespace diy
{
namespace io
{
class NumPy: public BOV
{
public:
NumPy(mpi::io::file& f):
BOV(f) {}
unsigned word_size() const { return word_size_; }
unsigned read_header()
{
BOV::Shape shape;
bool fortran;
size_t offset = parse_npy_header(shape, fortran);
if (fortran)
throw std::runtime_error("diy::io::NumPy cannot read data in fortran order");
BOV::set_offset(offset);
BOV::set_shape(shape);
return word_size_;
}
template<class T>
void write_header(int dim, const DiscreteBounds& bounds);
template<class T, class S>
void write_header(const S& shape);
private:
inline size_t parse_npy_header(BOV::Shape& shape, bool& fortran_order);
void save(diy::BinaryBuffer& bb, const std::string& s) { bb.save_binary(s.c_str(), s.size()); }
template<class T>
inline void convert_and_save(diy::BinaryBuffer& bb, const T& x)
{
std::ostringstream oss;
oss << x;
save(bb, oss.str());
}
private:
unsigned word_size_;
};
namespace detail
{
inline char big_endian();
template<class T>
char map_numpy_type();
}
}
}
// Modified from: https://github.com/rogersce/cnpy
// Copyright (C) 2011 Carl Rogers
// Released under MIT License
// license available at http://www.opensource.org/licenses/mit-license.php
size_t
diy::io::NumPy::
parse_npy_header(BOV::Shape& shape, bool& fortran_order)
{
char buffer[256];
file().read_at_all(0, buffer, 256);
std::string header(buffer, buffer + 256);
size_t nl = header.find('\n');
if (nl == std::string::npos)
throw std::runtime_error("parse_npy_header: failed to read the header");
header = header.substr(11, nl - 11 + 1);
size_t header_size = nl + 1;
int loc1, loc2;
//fortran order
loc1 = header.find("fortran_order")+16;
fortran_order = (header.substr(loc1,4) == "True" ? true : false);
//shape
unsigned ndims;
loc1 = header.find("(");
loc2 = header.find(")");
std::string str_shape = header.substr(loc1+1,loc2-loc1-1);
if(str_shape[str_shape.size()-1] == ',') ndims = 1;
else ndims = std::count(str_shape.begin(),str_shape.end(),',')+1;
shape.resize(ndims);
for(unsigned int i = 0;i < ndims;i++) {
loc1 = str_shape.find(",");
shape[i] = atoi(str_shape.substr(0,loc1).c_str());
str_shape = str_shape.substr(loc1+1);
}
//endian, word size, data type
//byte-order code '|' stands for "not applicable";
//not sure when this applies except for a byte array
loc1 = header.find("descr")+9;
//bool littleEndian = (header[loc1] == '<' || header[loc1] == '|' ? true : false);
//assert(littleEndian);
//char type = header[loc1+1];
//assert(type == map_type(T));
std::string str_ws = header.substr(loc1+2);
loc2 = str_ws.find("'");
word_size_ = atoi(str_ws.substr(0,loc2).c_str());
return header_size;
}
template<class T>
void
diy::io::NumPy::
write_header(int dim, const DiscreteBounds& bounds)
{
std::vector<int> shape;
for (int i = 0; i < dim; ++i)
shape.push_back(bounds.max[i] - bounds.min[i] + 1);
write_header< T, std::vector<int> >(shape);
}
template<class T, class S>
void
diy::io::NumPy::
write_header(const S& shape)
{
BOV::set_shape(shape);
diy::MemoryBuffer dict;
save(dict, "{'descr': '");
diy::save(dict, detail::big_endian());
diy::save(dict, detail::map_numpy_type<T>());
convert_and_save(dict, sizeof(T));
save(dict, "', 'fortran_order': False, 'shape': (");
convert_and_save(dict, shape[0]);
for (int i = 1; i < (int) shape.size(); i++)
{
save(dict, ", ");
convert_and_save(dict, shape[i]);
}
if(shape.size() == 1) save(dict, ",");
save(dict, "), }");
//pad with spaces so that preamble+dict is a multiple of 16 bytes. preamble is 10 bytes. dict needs to end with \n
int remainder = 16 - (10 + dict.position) % 16;
for (int i = 0; i < remainder - 1; ++i)
diy::save(dict, ' ');
diy::save(dict, '\n');
diy::MemoryBuffer header;
diy::save(header, (char) 0x93);
save(header, "NUMPY");
diy::save(header, (char) 0x01); // major version of numpy format
diy::save(header, (char) 0x00); // minor version of numpy format
diy::save(header, (unsigned short) dict.position);
header.save_binary(&dict.buffer[0], dict.buffer.size());
BOV::set_offset(header.position);
if (file().comm().rank() == 0)
file().write_at(0, &header.buffer[0], header.buffer.size());
}
char
diy::io::detail::big_endian()
{
unsigned char x[] = {1,0};
void* x_void = x;
short y = *static_cast<short*>(x_void);
return y == 1 ? '<' : '>';
}
namespace diy
{
namespace io
{
namespace detail
{
template<> inline char map_numpy_type<float>() { return 'f'; }
template<> inline char map_numpy_type<double>() { return 'f'; }
template<> inline char map_numpy_type<long double>() { return 'f'; }
template<> inline char map_numpy_type<int>() { return 'i'; }
template<> inline char map_numpy_type<char>() { return 'i'; }
template<> inline char map_numpy_type<short>() { return 'i'; }
template<> inline char map_numpy_type<long>() { return 'i'; }
template<> inline char map_numpy_type<long long>() { return 'i'; }
template<> inline char map_numpy_type<unsigned int>() { return 'u'; }
template<> inline char map_numpy_type<unsigned char>() { return 'u'; }
template<> inline char map_numpy_type<unsigned short>() { return 'u'; }
template<> inline char map_numpy_type<unsigned long>() { return 'u'; }
template<> inline char map_numpy_type<unsigned long long>() { return 'u'; }
template<> inline char map_numpy_type<bool>() { return 'b'; }
template<> inline char map_numpy_type< std::complex<float> >() { return 'c'; }
template<> inline char map_numpy_type< std::complex<double> >() { return 'c'; }
template<> inline char map_numpy_type< std::complex<long double> >() { return 'c'; }
}
}
}
#endif
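// Usage sketch (reading an .npy file written in C order; names like `world`,
// `bounds`, and `local_count` are illustrative):
//
//     diy::mpi::io::file f(world, "data.npy", diy::mpi::io::file::rdonly);
//     diy::io::NumPy np(f);
//     unsigned ws = np.read_header(); // parses dtype word size and shape,
//                                     // positions the BOV offset past it
//     std::vector<float> buffer(local_count);
//     np.read(bounds, buffer.data()); // inherited from BOV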

@ -1,49 +0,0 @@
#ifndef VTKMDIY_IO_SHARED_HPP
#define VTKMDIY_IO_SHARED_HPP
#include <sstream>
#include <fstream>
#include "../mpi.hpp"
namespace diy
{
namespace io
{
class SharedOutFile: public std::ostringstream
{
public:
SharedOutFile(std::string filename, diy::mpi::communicator world, int root = 0):
filename_(filename),
world_(world),
root_(root) {}
~SharedOutFile() { close(); }
void close()
{
auto str = this->str();
std::vector<char> contents(str.begin(), str.end());
if (world_.rank() == root_)
{
std::vector<std::vector<char>> all_contents;
diy::mpi::gather(world_, contents, all_contents, root_);
// write the file serially
std::ofstream out(filename_);
for (auto& contents : all_contents)
out.write(contents.data(), contents.size());
} else
diy::mpi::gather(world_, contents, root_);
}
private:
std::string filename_;
diy::mpi::communicator world_;
int root_;
};
}
}
#endif
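// Usage sketch: each rank buffers output locally; on close() (or destruction)
// the contents are gathered and written serially by the root rank:
//
//     diy::io::SharedOutFile out("trace.txt", world);
//     out << "rank " << world.rank() << " processed " << n << " blocks\n";
//     // the file is written once, in gather (rank) order, when out closes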

@ -1,162 +0,0 @@
#ifndef VTKMDIY_IO_UTILS_HPP
#define VTKMDIY_IO_UTILS_HPP
#if defined(_WIN32)
#include <direct.h>
#include <io.h>
#include <share.h>
#else
#include <unistd.h> // mkstemp() on Mac
#include <fcntl.h> // O_WRONLY, O_SYNC for mkostemp()
#include <dirent.h>
#endif
#include <cstdio> // remove()
#include <cstdlib> // mkstemp() on Linux
#include <string> // std::string in the helpers below
#include <memory> // std::unique_ptr in mkstemp()
#include <algorithm> // std::copy in mkstemp()
#include <sys/stat.h>
#include "../constants.h" // for DIY_UNUSED
namespace diy
{
namespace io
{
namespace utils
{
namespace detail
{
// internal method to split a filename into path and fullname.
inline void splitpath(const std::string& fullname, std::string& path, std::string& name)
{
auto pos = fullname.rfind('/');
if (pos != std::string::npos)
{
path = fullname.substr(0, pos);
name = fullname.substr(pos+1);
}
else
{
path = ".";
name = fullname;
}
}
} // namespace detail
/**
* returns true if the filename exists and refers to a directory.
*/
inline bool is_directory(const std::string& filename)
{
#if defined(_WIN32)
DWORD attr = GetFileAttributes(filename.c_str());
return (attr != INVALID_FILE_ATTRIBUTES) && ((attr & FILE_ATTRIBUTE_DIRECTORY) != 0);
#else
struct stat s;
return (stat(filename.c_str(), &s) == 0 && S_ISDIR(s.st_mode));
#endif
}
/**
* creates a new directory. returns true on success.
*/
inline bool make_directory(const std::string& filename)
{
#if defined(_WIN32)
return _mkdir(filename.c_str()) == 0;
#else
return mkdir(filename.c_str(), 0755) == 0;
#endif
}
/**
* truncates a file to the given length, if the file exists and can be opened
* for writing.
*/
inline void truncate(const std::string& filename, size_t length)
{
#if defined(_WIN32)
int fd = -1;
_sopen_s(&fd, filename.c_str(), _O_WRONLY | _O_BINARY, _SH_DENYNO, _S_IWRITE);
if (fd != -1)
{
_chsize_s(fd, static_cast<__int64>(length));
_close(fd);
}
#else
auto r = ::truncate(filename.c_str(), static_cast<off_t>(length));
(void) r;
#endif
}
inline int mkstemp(std::string& filename)
{
#if defined(_WIN32)
std::string path, name;
detail::splitpath(filename, path, name);
char temppath[MAX_PATH];
// note: GetTempFileName only uses the first 3 chars of the prefix (name)
// specified.
if (GetTempFileName(path.c_str(), name.c_str(), 0, temppath) == 0)
{
// failed! return invalid handle.
return -1;
}
int handle = -1;
// _sopen_s sets handle to -1 on error.
_sopen_s(&handle, temppath, _O_WRONLY | _O_CREAT | _O_BINARY, _SH_DENYNO, _S_IWRITE);
if (handle != -1)
filename = temppath;
return handle;
#else // defined(_WIN32)
std::unique_ptr<char[]> s_template(new char[filename.size() + 1]);
std::copy(filename.begin(), filename.end(), s_template.get());
s_template[filename.size()] = 0;
int handle = -1;
#if defined(__MACH__)
// TODO: figure out how to open with O_SYNC
handle = ::mkstemp(s_template.get());
#else
handle = mkostemp(s_template.get(), O_WRONLY | O_SYNC);
#endif
if (handle != -1)
filename = s_template.get();
return handle;
#endif // defined(_WIN32)
}
inline void close(int fd)
{
#if defined(_WIN32)
_close(fd);
#else
fsync(fd);
::close(fd);
#endif
}
inline void sync(int fd)
{
#if defined(_WIN32)
DIY_UNUSED(fd);
#else
fsync(fd);
#endif
}
inline bool remove(const std::string& filename)
{
#if defined(_WIN32)
return DeleteFile(filename.c_str()) != 0;
#else
return ::remove(filename.c_str()) == 0;
#endif
}
}
}
}
#endif
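// Usage sketch for the temporary-file helpers (the XXXXXX suffix is the usual
// POSIX mkstemp template; on Windows only the first 3 prefix chars are used):
//
//     std::string name = "/tmp/diy-XXXXXX";
//     int fd = diy::io::utils::mkstemp(name); // rewrites `name` on success
//     if (fd != -1)
//     {
//       diy::io::utils::sync(fd);
//       diy::io::utils::close(fd);
//       diy::io::utils::remove(name);
//     }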

@ -1,226 +0,0 @@
#ifndef VTKMDIY_COVER_HPP
#define VTKMDIY_COVER_HPP
#include <vector>
#include <map>
#include <algorithm>
#include "types.hpp"
#include "serialization.hpp"
#include "assigner.hpp"
namespace diy
{
// Local view of a distributed representation of a cover, a completely unstructured link
class Link
{
public:
using Neighbors = std::vector<BlockID>;
virtual ~Link() {} // need to be able to delete derived classes
int size() const { return static_cast<int>(neighbors_.size()); }
inline
int size_unique() const;
BlockID target(int i) const { return neighbors_[static_cast<size_t>(i)]; }
BlockID& target(int i) { return neighbors_[static_cast<size_t>(i)]; }
inline
int find(int gid) const;
void add_neighbor(const BlockID& block) { neighbors_.push_back(block); }
void fix(const Assigner& assigner) { for (unsigned i = 0; i < neighbors_.size(); ++i) { neighbors_[i].proc = assigner.rank(neighbors_[i].gid); } }
void swap(Link& other) { neighbors_.swap(other.neighbors_); }
const Neighbors&
neighbors() const { return neighbors_; }
Neighbors&
neighbors() { return neighbors_; }
virtual void save(BinaryBuffer& bb) const { diy::save(bb, neighbors_); }
virtual void load(BinaryBuffer& bb) { diy::load(bb, neighbors_); }
virtual size_t id() const { return 0; }
private:
Neighbors neighbors_;
};
template<class Bounds_>
class RegularLink;
typedef RegularLink<DiscreteBounds> RegularGridLink;
typedef RegularLink<ContinuousBounds> RegularContinuousLink;
// Selector between regular discrete and continuous links given the bounds type
template<class Bounds_>
struct RegularLinkSelector;
template<>
struct RegularLinkSelector<DiscreteBounds>
{
typedef RegularGridLink type;
static const size_t id = 1;
};
template<>
struct RegularLinkSelector<ContinuousBounds>
{
typedef RegularContinuousLink type;
static const size_t id = 2;
};
// for a regular decomposition, it makes sense to address the neighbors by direction
// and store local and neighbor bounds
template<class Bounds_>
class RegularLink: public Link
{
public:
typedef Bounds_ Bounds;
typedef std::map<Direction, int> DirMap;
typedef std::vector<Direction> DirVec;
public:
RegularLink(int dim, const Bounds& core__, const Bounds& bounds__):
dim_(dim), core_(core__), bounds_(bounds__) {}
// dimension
int dimension() const { return dim_; }
// direction
int direction(Direction dir) const; // convert direction to a neighbor (-1 if no neighbor)
Direction direction(int i) const { return dir_vec_[i]; }
void add_direction(Direction dir) { int c = dir_map_.size(); dir_map_[dir] = c; dir_vec_.push_back(dir); }
// wrap
void add_wrap(Direction dir) { wrap_.push_back(dir); }
Direction wrap(int i) const { return wrap_[i]; }
Direction& wrap(int i) { return wrap_[i]; }
// bounds
const Bounds& core() const { return core_; }
Bounds& core() { return core_; }
const Bounds& bounds() const { return bounds_; }
Bounds& bounds() { return bounds_; }
const Bounds& bounds(int i) const { return nbr_bounds_[i]; }
void add_bounds(const Bounds& bounds__) { nbr_bounds_.push_back(bounds__); }
void swap(RegularLink& other) { Link::swap(other); dir_map_.swap(other.dir_map_); dir_vec_.swap(other.dir_vec_); nbr_bounds_.swap(other.nbr_bounds_); std::swap(dim_, other.dim_); wrap_.swap(other.wrap_); std::swap(core_, other.core_); std::swap(bounds_, other.bounds_); }
void save(BinaryBuffer& bb) const
{
Link::save(bb);
diy::save(bb, dim_);
diy::save(bb, dir_map_);
diy::save(bb, dir_vec_);
diy::save(bb, core_);
diy::save(bb, bounds_);
diy::save(bb, nbr_bounds_);
diy::save(bb, wrap_);
}
void load(BinaryBuffer& bb)
{
Link::load(bb);
diy::load(bb, dim_);
diy::load(bb, dir_map_);
diy::load(bb, dir_vec_);
diy::load(bb, core_);
diy::load(bb, bounds_);
diy::load(bb, nbr_bounds_);
diy::load(bb, wrap_);
}
virtual size_t id() const { return RegularLinkSelector<Bounds>::id; }
private:
int dim_;
DirMap dir_map_;
DirVec dir_vec_;
Bounds core_;
Bounds bounds_;
std::vector<Bounds> nbr_bounds_;
std::vector<Direction> wrap_;
};
// Other cover candidates: KDTreeLink, AMRGridLink
struct LinkFactory
{
public:
static Link* create(size_t id)
{
// not pretty, but will do for now
if (id == 0)
return new Link;
else if (id == 1)
return new RegularGridLink(0, DiscreteBounds(), DiscreteBounds());
else if (id == 2)
return new RegularContinuousLink(0, ContinuousBounds(), ContinuousBounds());
else
return 0;
}
inline static void save(BinaryBuffer& bb, const Link* l);
inline static Link* load(BinaryBuffer& bb);
};
}
void
diy::LinkFactory::
save(BinaryBuffer& bb, const Link* l)
{
diy::save(bb, l->id());
l->save(bb);
}
diy::Link*
diy::LinkFactory::
load(BinaryBuffer& bb)
{
size_t id;
diy::load(bb, id);
Link* l = create(id);
l->load(bb);
return l;
}
int
diy::Link::
find(int gid) const
{
for (int i = 0; i < size(); ++i)
{
if (target(i).gid == gid)
return i;
}
return -1;
}
int
diy::Link::
size_unique() const
{
std::vector<BlockID> tmp(neighbors_.begin(), neighbors_.end());
std::sort(tmp.begin(), tmp.end());
return static_cast<int>(std::unique(tmp.begin(), tmp.end()) - tmp.begin());
}
template<class Bounds>
int
diy::RegularLink<Bounds>::
direction(Direction dir) const
{
DirMap::const_iterator it = dir_map_.find(dir);
if (it == dir_map_.end())
return -1;
else
return it->second;
}
#endif
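// Usage sketch (BlockID, from types.hpp, carries a block gid and owning rank):
//
//     diy::Link link;
//     diy::BlockID nbr;
//     nbr.gid = 1; nbr.proc = 0;
//     link.add_neighbor(nbr);
//     int i = link.find(1);         // neighbor index, or -1 if absent
//     int n = link.size_unique();   // neighbor count after deduplication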

@ -1,103 +0,0 @@
#ifndef VTKMDIY_LOG_HPP
#define VTKMDIY_LOG_HPP
#ifndef VTKMDIY_USE_SPDLOG
#include <memory>
#include "fmt/format.h"
#include "fmt/ostream.h"
namespace diy
{
namespace spd
{
struct logger
{
// logger.info(cppformat_string, arg1, arg2, arg3, ...) call style
template <typename... Args> void trace(const char*, const Args&...) {}
template <typename... Args> void debug(const char*, const Args&...) {}
template <typename... Args> void info(const char*, const Args&...) {}
template <typename... Args> void warn(const char*, const Args&...) {}
template <typename... Args> void error(const char*, const Args&...) {}
template <typename... Args> void critical(const char*, const Args&...) {}
};
}
inline
std::shared_ptr<spd::logger>
get_logger()
{
return std::make_shared<spd::logger>();
}
inline
std::shared_ptr<spd::logger>
create_logger(std::string)
{
return std::make_shared<spd::logger>();
}
template<class... Args>
std::shared_ptr<spd::logger>
set_logger(Args...)
{
return std::make_shared<spd::logger>();
}
} // diy
#else // VTKMDIY_USE_SPDLOG
#include <string>
#include <spdlog/spdlog.h>
#include <spdlog/sinks/null_sink.h>
#include <spdlog/fmt/bundled/format.h>
#include <spdlog/fmt/bundled/ostream.h>
namespace diy
{
namespace spd = ::spdlog;
inline
std::shared_ptr<spd::logger>
get_logger()
{
auto log = spd::get("diy");
if (!log)
{
auto null_sink = std::make_shared<spd::sinks::null_sink_mt> ();
log = std::make_shared<spd::logger>("null_logger", null_sink);
}
return log;
}
inline
std::shared_ptr<spd::logger>
create_logger(std::string log_level)
{
auto log = spd::stderr_logger_mt("diy");
int lvl;
for (lvl = spd::level::trace; lvl < spd::level::off; ++lvl)
if (spd::level::level_names[lvl] == log_level)
break;
log->set_level(static_cast<spd::level::level_enum>(lvl));
return log;
}
template<class... Args>
std::shared_ptr<spd::logger>
set_logger(Args... args)
{
auto log = std::make_shared<spdlog::logger>("diy", args...);
return log;
}
} // diy
#endif
#endif // VTKMDIY_LOG_HPP
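// Usage sketch: the same calls compile either way; without VTKMDIY_USE_SPDLOG
// they are no-ops, with it they route through spdlog (`nblocks` and `elapsed`
// are illustrative):
//
//     auto log = diy::get_logger();
//     log->info("read {} blocks in {} s", nblocks, elapsed);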

File diff suppressed because it is too large

@ -1,71 +0,0 @@
#ifndef VTKMDIY_MPI_HPP
#define VTKMDIY_MPI_HPP
#ifndef VTKM_DIY_NO_MPI
#include <mpi.h>
#else
#include "mpi/no-mpi.hpp"
#endif
#include "mpi/constants.hpp"
#include "mpi/datatypes.hpp"
#include "mpi/optional.hpp"
#include "mpi/status.hpp"
#include "mpi/request.hpp"
#include "mpi/point-to-point.hpp"
#include "mpi/communicator.hpp"
#include "mpi/collectives.hpp"
#include "mpi/io.hpp"
#include "mpi/window.hpp"
namespace diy
{
namespace mpi
{
//! \ingroup MPI
struct environment
{
inline environment(int threading = MPI_THREAD_FUNNELED);
inline environment(int argc, char* argv[], int threading = MPI_THREAD_FUNNELED);
inline ~environment();
int threading() const { return provided_threading; }
int provided_threading;
};
}
}
diy::mpi::environment::
environment(int threading)
{
#ifndef VTKM_DIY_NO_MPI
int argc = 0; char** argv = nullptr; // avoid passing an uninitialized pointer to MPI_Init_thread
MPI_Init_thread(&argc, &argv, threading, &provided_threading);
#else
provided_threading = threading;
#endif
}
diy::mpi::environment::
environment(int argc, char* argv[], int threading)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Init_thread(&argc, &argv, threading, &provided_threading);
#else
(void) argc; (void) argv;
provided_threading = threading;
#endif
}
diy::mpi::environment::
~environment()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Finalize();
#endif
}
#endif
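// Usage sketch: environment is an RAII wrapper pairing MPI_Init_thread with
// MPI_Finalize (both become no-ops when VTKM_DIY_NO_MPI is defined):
//
//     int main(int argc, char* argv[])
//     {
//       diy::mpi::environment env(argc, argv);
//       diy::mpi::communicator world;  // wraps MPI_COMM_WORLD by default
//       // ... all diy/MPI work happens while env is alive ...
//     }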

@ -1,380 +0,0 @@
#include <vector>
#include "../constants.h" // for DIY_UNUSED.
#include "operations.hpp"
namespace diy
{
namespace mpi
{
//!\addtogroup MPI
//!@{
template<class T, class Op>
struct Collectives
{
static void broadcast(const communicator& comm, T& x, int root)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Bcast(address(x), count(x), datatype(x), root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(x);
DIY_UNUSED(root);
#endif
}
static void broadcast(const communicator& comm, std::vector<T>& x, int root)
{
#ifndef VTKM_DIY_NO_MPI
size_t sz = x.size();
Collectives<size_t, void*>::broadcast(comm, sz, root);
if (comm.rank() != root)
x.resize(sz);
MPI_Bcast(address(x), count(x), datatype(x), root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(x);
DIY_UNUSED(root);
#endif
}
static request ibroadcast(const communicator& comm, T& x, int root)
{
#ifndef VTKM_DIY_NO_MPI
request r;
MPI_Ibcast(address(x), count(x), datatype(x), root, comm, &r.r);
return r;
#else
DIY_UNUSED(comm);
DIY_UNUSED(x);
DIY_UNUSED(root);
DIY_UNSUPPORTED_MPI_CALL(MPI_Ibcast);
#endif
}
static void gather(const communicator& comm, const T& in, std::vector<T>& out, int root)
{
out.resize(comm.size());
#ifndef VTKM_DIY_NO_MPI
MPI_Gather(address(in), count(in), datatype(in), address(out), count(in), datatype(out), root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(root);
out[0] = in;
#endif
}
static void gather(const communicator& comm, const std::vector<T>& in, std::vector< std::vector<T> >& out, int root)
{
#ifndef VTKM_DIY_NO_MPI
std::vector<int> counts(comm.size());
Collectives<int,void*>::gather(comm, count(in), counts, root);
std::vector<int> offsets(comm.size(), 0);
for (unsigned i = 1; i < offsets.size(); ++i)
offsets[i] = offsets[i-1] + counts[i-1];
int elem_size = count(in[0]); // size of 1 vector element in units of mpi datatype
std::vector<T> buffer((offsets.back() + counts.back()) / elem_size);
MPI_Gatherv(address(in), count(in), datatype(in),
address(buffer),
&counts[0],
&offsets[0],
datatype(buffer),
root, comm);
out.resize(comm.size());
size_t cur = 0;
for (unsigned i = 0; i < (unsigned)comm.size(); ++i)
{
out[i].reserve(counts[i] / elem_size);
for (unsigned j = 0; j < (unsigned)(counts[i] / elem_size); ++j)
out[i].push_back(buffer[cur++]);
}
#else
DIY_UNUSED(comm);
DIY_UNUSED(root);
out.resize(1);
out[0] = in;
#endif
}
static void gather(const communicator& comm, const T& in, int root)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Gather(address(in), count(in), datatype(in), address(in), count(in), datatype(in), root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(in);
DIY_UNUSED(root);
DIY_UNSUPPORTED_MPI_CALL("MPI_Gather");
#endif
}
static void gather(const communicator& comm, const std::vector<T>& in, int root)
{
#ifndef VTKM_DIY_NO_MPI
Collectives<int,void*>::gather(comm, count(in), root);
MPI_Gatherv(address(in), count(in), datatype(in),
0, 0, 0,
datatype(in),
root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(in);
DIY_UNUSED(root);
DIY_UNSUPPORTED_MPI_CALL("MPI_Gatherv");
#endif
}
static void all_gather(const communicator& comm, const T& in, std::vector<T>& out)
{
out.resize(comm.size());
#ifndef VTKM_DIY_NO_MPI
MPI_Allgather(address(in), count(in), datatype(in),
address(out), count(in), datatype(in),
comm);
#else
DIY_UNUSED(comm);
out[0] = in;
#endif
}
static void all_gather(const communicator& comm, const std::vector<T>& in, std::vector< std::vector<T> >& out)
{
#ifndef VTKM_DIY_NO_MPI
std::vector<int> counts(comm.size());
Collectives<int,void*>::all_gather(comm, count(in), counts);
std::vector<int> offsets(comm.size(), 0);
for (unsigned i = 1; i < offsets.size(); ++i)
offsets[i] = offsets[i-1] + counts[i-1];
int elem_size = count(in[0]); // size of 1 vector element in units of mpi datatype
std::vector<T> buffer((offsets.back() + counts.back()) / elem_size);
MPI_Allgatherv(address(in), count(in), datatype(in),
address(buffer),
&counts[0],
&offsets[0],
datatype(buffer),
comm);
out.resize(comm.size());
size_t cur = 0;
for (int i = 0; i < comm.size(); ++i)
{
out[i].reserve(counts[i] / elem_size);
for (int j = 0; j < (int)(counts[i] / elem_size); ++j)
out[i].push_back(buffer[cur++]);
}
#else
DIY_UNUSED(comm);
out.resize(1);
out[0] = in;
#endif
}
static void reduce(const communicator& comm, const T& in, T& out, int root, const Op&)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Reduce(address(in), address(out), count(in), datatype(in),
detail::mpi_op<Op>::get(),
root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(root);
out = in;
#endif
}
static void reduce(const communicator& comm, const T& in, int root, const Op&)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Reduce(address(in), address(in), count(in), datatype(in),
detail::mpi_op<Op>::get(),
root, comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(in);
DIY_UNUSED(root);
DIY_UNSUPPORTED_MPI_CALL("MPI_Reduce");
#endif
}
static void all_reduce(const communicator& comm, const T& in, T& out, const Op&)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Allreduce(address(in), address(out), count(in), datatype(in),
detail::mpi_op<Op>::get(),
comm);
#else
DIY_UNUSED(comm);
out = in;
#endif
}
static void all_reduce(const communicator& comm, const std::vector<T>& in, std::vector<T>& out, const Op&)
{
#ifndef VTKM_DIY_NO_MPI
out.resize(in.size());
MPI_Allreduce(address(in), address(out), count(in),
datatype(in),
detail::mpi_op<Op>::get(),
comm);
#else
DIY_UNUSED(comm);
out = in;
#endif
}
static void scan(const communicator& comm, const T& in, T& out, const Op&)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Scan(address(in), address(out), count(in), datatype(in),
detail::mpi_op<Op>::get(),
comm);
#else
DIY_UNUSED(comm);
out = in;
#endif
}
static void all_to_all(const communicator& comm, const std::vector<T>& in, std::vector<T>& out, int n = 1)
{
#ifndef VTKM_DIY_NO_MPI
// n specifies how many elements go to/from every process from every process;
// the sizes of in and out are expected to be n * comm.size()
int elem_size = count(in[0]); // size of 1 vector element in units of mpi datatype
// NB: this will fail if T is a vector
MPI_Alltoall(address(in),
elem_size * n,
datatype(in),
address(out),
elem_size * n,
datatype(out),
comm);
#else
DIY_UNUSED(comm);
DIY_UNUSED(n);
out = in;
#endif
}
};
//! Broadcast to all processes in `comm`.
template<class T>
void broadcast(const communicator& comm, T& x, int root)
{
Collectives<T,void*>::broadcast(comm, x, root);
}
//! Broadcast for vectors
template<class T>
void broadcast(const communicator& comm, std::vector<T>& x, int root)
{
Collectives<T,void*>::broadcast(comm, x, root);
}
//! iBroadcast to all processes in `comm`.
template<class T>
request ibroadcast(const communicator& comm, T& x, int root)
{
return Collectives<T,void*>::ibroadcast(comm, x, root);
}
//! Gather from all processes in `comm`.
//! On `root` process, `out` is resized to `comm.size()` and filled with
//! elements from the respective ranks.
template<class T>
void gather(const communicator& comm, const T& in, std::vector<T>& out, int root)
{
Collectives<T,void*>::gather(comm, in, out, root);
}
//! Same as above, but for vectors.
template<class T>
void gather(const communicator& comm, const std::vector<T>& in, std::vector< std::vector<T> >& out, int root)
{
Collectives<T,void*>::gather(comm, in, out, root);
}
//! Simplified version (without `out`) for use on non-root processes.
template<class T>
void gather(const communicator& comm, const T& in, int root)
{
Collectives<T,void*>::gather(comm, in, root);
}
//! Simplified version (without `out`) for use on non-root processes.
template<class T>
void gather(const communicator& comm, const std::vector<T>& in, int root)
{
Collectives<T,void*>::gather(comm, in, root);
}
//! all_gather from all processes in `comm`.
//! `out` is resized to `comm.size()` and filled with
//! elements from the respective ranks.
template<class T>
void all_gather(const communicator& comm, const T& in, std::vector<T>& out)
{
Collectives<T,void*>::all_gather(comm, in, out);
}
//! Same as above, but for vectors.
template<class T>
void all_gather(const communicator& comm, const std::vector<T>& in, std::vector< std::vector<T> >& out)
{
Collectives<T,void*>::all_gather(comm, in, out);
}
//! reduce
template<class T, class Op>
void reduce(const communicator& comm, const T& in, T& out, int root, const Op& op)
{
Collectives<T, Op>::reduce(comm, in, out, root, op);
}
//! Simplified version (without `out`) for use on non-root processes.
template<class T, class Op>
void reduce(const communicator& comm, const T& in, int root, const Op& op)
{
Collectives<T, Op>::reduce(comm, in, root, op);
}
//! all_reduce
template<class T, class Op>
void all_reduce(const communicator& comm, const T& in, T& out, const Op& op)
{
Collectives<T, Op>::all_reduce(comm, in, out, op);
}
//! Same as above, but for vectors.
template<class T, class Op>
void all_reduce(const communicator& comm, const std::vector<T>& in, std::vector<T>& out, const Op& op)
{
Collectives<T, Op>::all_reduce(comm, in, out, op);
}
//! scan
template<class T, class Op>
void scan(const communicator& comm, const T& in, T& out, const Op& op)
{
Collectives<T, Op>::scan(comm, in, out, op);
}
//! all_to_all
template<class T>
void all_to_all(const communicator& comm, const std::vector<T>& in, std::vector<T>& out, int n = 1)
{
Collectives<T, void*>::all_to_all(comm, in, out, n);
}
//!@}
}
}
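// Usage sketch (std::plus comes from <functional>; diy::mpi::maximum is
// defined in operations.hpp):
//
//     int local = world.rank() + 1, sum = 0, max = 0;
//     diy::mpi::all_reduce(world, local, sum, std::plus<int>());
//     diy::mpi::all_reduce(world, local, max, diy::mpi::maximum<int>());
//     diy::mpi::broadcast(world, sum, /* root = */ 0);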

@ -1,227 +0,0 @@
namespace diy
{
namespace mpi
{
//! \ingroup MPI
//! Simple wrapper around `MPI_Comm`.
class communicator
{
public:
inline
communicator(MPI_Comm comm = MPI_COMM_WORLD, bool owner = false);
~communicator() { destroy(); }
communicator(const communicator& other):
comm_(other.comm_),
rank_(other.rank_),
size_(other.size_),
owner_(false) {}
communicator(communicator&& other):
comm_(other.comm_),
rank_(other.rank_),
size_(other.size_),
owner_(other.owner_) { other.owner_ = false; }
communicator&
operator=(const communicator& other) { destroy(); comm_ = other.comm_; rank_ = other.rank_; size_ = other.size_; owner_ = false; return *this; }
communicator&
operator=(communicator&& other) { destroy(); comm_ = other.comm_; rank_ = other.rank_; size_ = other.size_; owner_ = other.owner_; other.owner_ = false; return *this; }
int rank() const { return rank_; }
int size() const { return size_; }
//! Send `x` to processor `dest` using `tag` (blocking).
template<class T>
void send(int dest, int tag, const T& x) const { detail::send<T>()(comm_, dest, tag, x); }
//! Receive `x` from `dest` using `tag` (blocking).
//! If `T` is an `std::vector<...>`, `recv` will resize it to fit exactly the sent number of values.
template <class T>
status recv(int source, int tag, T &x) const
{
#if defined(VTKM_DIY_NO_MPI) && defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ < 8 // CUDA 7.5 workaround
(void) source; (void)tag; (void)x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Recv);
#else
return detail::recv<T>{}(comm_, source, tag, x);
#endif
}
//! Non-blocking version of `send()`.
template <class T>
request isend(int dest, int tag, const T &x) const
{
#if defined(VTKM_DIY_NO_MPI) && defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ < 8 // CUDA 7.5 workaround
(void) dest; (void)tag; (void)x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Send);
#else
return detail::isend<T>{}(comm_, dest, tag, x);
#endif
}
//! Non-blocking version of `ssend()`.
template<class T>
request issend(int dest, int tag, const T& x) const { return detail::issend<T>()(comm_, dest, tag, x); }
//! Non-blocking version of `recv()`.
//! If `T` is an `std::vector<...>`, its size must be big enough to accommodate the sent values.
template <class T>
request irecv(int source, int tag, T &x) const
{
#if defined(VTKM_DIY_NO_MPI) && defined(__CUDACC_VER_MAJOR__) && __CUDACC_VER_MAJOR__ < 8 // CUDA 7.5 workaround
(void)source; (void)tag; (void)x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Irecv);
#else
return detail::irecv<T>()(comm_, source, tag, x);
#endif
}
//! probe
inline
status probe(int source, int tag) const;
//! iprobe
inline
optional<status>
iprobe(int source, int tag) const;
//! barrier
inline
void barrier() const;
//! Nonblocking version of barrier
inline
request ibarrier() const;
operator MPI_Comm() const { return comm_; }
//! split
//! When keys are the same, the ties are broken by the rank in the original comm.
inline
communicator
split(int color, int key = 0) const;
//! duplicate
inline
void duplicate(const communicator& other);
private:
inline
void destroy();
private:
MPI_Comm comm_;
int rank_;
int size_;
bool owner_;
};
}
}
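// Usage sketch (hypothetical two-rank exchange; assumes world.size() >= 2):
//
//   diy::mpi::communicator world;
//   const int tag = 0;
//   if (world.rank() == 0)
//   {
//     std::vector<float> v { 1.f, 2.f, 3.f };
//     world.send(1, tag, v);
//   }
//   else if (world.rank() == 1)
//   {
//     std::vector<float> v;     // recv resizes it to the sent size
//     world.recv(0, tag, v);
//   }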
diy::mpi::communicator::
communicator(MPI_Comm comm, bool owner):
comm_(comm), rank_(0), size_(1), owner_(owner)
{
#ifndef VTKM_DIY_NO_MPI
if (comm != MPI_COMM_NULL)
{
MPI_Comm_rank(comm_, &rank_);
MPI_Comm_size(comm_, &size_);
}
#endif
}
void
diy::mpi::communicator::
destroy()
{
#ifndef VTKM_DIY_NO_MPI
if (owner_)
MPI_Comm_free(&comm_);
#endif
}
diy::mpi::status
diy::mpi::communicator::
probe(int source, int tag) const
{
(void) source;
(void) tag;
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_Probe(source, tag, comm_, &s.s);
return s;
#else
DIY_UNSUPPORTED_MPI_CALL(MPI_Probe);
#endif
}
diy::mpi::optional<diy::mpi::status>
diy::mpi::communicator::
iprobe(int source, int tag) const
{
(void) source;
(void) tag;
#ifndef VTKM_DIY_NO_MPI
status s;
int flag;
MPI_Iprobe(source, tag, comm_, &flag, &s.s);
if (flag)
return s;
#endif
return optional<status>();
}
void
diy::mpi::communicator::
barrier() const
{
#ifndef VTKM_DIY_NO_MPI
MPI_Barrier(comm_);
#endif
}
diy::mpi::communicator
diy::mpi::communicator::
split(int color, int key) const
{
#ifndef VTKM_DIY_NO_MPI
MPI_Comm newcomm;
MPI_Comm_split(comm_, color, key, &newcomm);
return communicator(newcomm, true);
#else
return communicator();
#endif
}
diy::mpi::request
diy::mpi::communicator::
ibarrier() const
{
#ifndef VTKM_DIY_NO_MPI
request r_;
MPI_Ibarrier(comm_, &r_.r);
return r_;
#else
// this is not the ideal fix; in principle we should return a request
// that immediately tests true, but that requires redesigning some parts
// of our no-mpi handling
DIY_UNSUPPORTED_MPI_CALL(MPI_Ibarrier);
#endif
}
void
diy::mpi::communicator::
duplicate(const communicator& other)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Comm newcomm;
MPI_Comm_dup(other.comm_, &newcomm);
(*this) = communicator(newcomm,true);
#endif
}
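// Usage sketch (illustrative): split the world into two halves by color;
// the returned communicator owns the underlying MPI_Comm and frees it on
// destruction:
//
//   diy::mpi::communicator world;
//   int color = world.rank() < world.size() / 2 ? 0 : 1;
//   diy::mpi::communicator half = world.split(color);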

@ -1,13 +0,0 @@
#ifndef VTKMDIY_MPI_CONSTANTS_HPP
#define VTKMDIY_MPI_CONSTANTS_HPP
namespace diy
{
namespace mpi
{
const int any_source = MPI_ANY_SOURCE;
const int any_tag = MPI_ANY_TAG;
}
}
#endif

@ -1,93 +0,0 @@
#ifndef VTKMDIY_MPI_DATATYPES_HPP
#define VTKMDIY_MPI_DATATYPES_HPP
#include <vector>
namespace diy
{
namespace mpi
{
namespace detail
{
template<class T> MPI_Datatype get_mpi_datatype();
struct true_type {};
struct false_type {};
/* is_mpi_datatype */
template<class T>
struct is_mpi_datatype { typedef false_type type; };
#define VTKMDIY_MPI_DATATYPE_MAP(cpp_type, mpi_type) \
template<> inline MPI_Datatype get_mpi_datatype<cpp_type>() { return mpi_type; } \
template<> struct is_mpi_datatype<cpp_type> { typedef true_type type; }; \
template<> struct is_mpi_datatype< std::vector<cpp_type> > { typedef true_type type; };
VTKMDIY_MPI_DATATYPE_MAP(char, MPI_BYTE);
VTKMDIY_MPI_DATATYPE_MAP(unsigned char, MPI_BYTE);
VTKMDIY_MPI_DATATYPE_MAP(bool, MPI_BYTE);
VTKMDIY_MPI_DATATYPE_MAP(int, MPI_INT);
VTKMDIY_MPI_DATATYPE_MAP(unsigned, MPI_UNSIGNED);
VTKMDIY_MPI_DATATYPE_MAP(long, MPI_LONG);
VTKMDIY_MPI_DATATYPE_MAP(unsigned long, MPI_UNSIGNED_LONG);
VTKMDIY_MPI_DATATYPE_MAP(long long, MPI_LONG_LONG_INT);
VTKMDIY_MPI_DATATYPE_MAP(unsigned long long, MPI_UNSIGNED_LONG_LONG);
VTKMDIY_MPI_DATATYPE_MAP(float, MPI_FLOAT);
VTKMDIY_MPI_DATATYPE_MAP(double, MPI_DOUBLE);
/* mpi_datatype: helper routines, specialized for std::vector<...> */
template<class T>
struct mpi_datatype
{
static MPI_Datatype datatype() { return get_mpi_datatype<T>(); }
static const void* address(const T& x) { return &x; }
static void* address(T& x) { return &x; }
static int count(const T&) { return 1; }
};
template<class U>
struct mpi_datatype< std::vector<U> >
{
typedef std::vector<U> VecU;
static MPI_Datatype datatype() { return mpi_datatype<U>::datatype(); }
static const void* address(const VecU& x) { return x.data(); }
static void* address(VecU& x) { return x.data(); }
static int count(const VecU& x) { return x.empty() ? 0 : (static_cast<int>(x.size()) * mpi_datatype<U>::count(x[0])); }
};
} // detail
template<class U>
static MPI_Datatype datatype(const U&)
{
using Datatype = detail::mpi_datatype<U>;
return Datatype::datatype();
}
template<class U>
static void* address(const U& x)
{
using Datatype = detail::mpi_datatype<U>;
return const_cast<void*>(Datatype::address(x));
}
template<class U>
static void* address(U& x)
{
using Datatype = detail::mpi_datatype<U>;
return Datatype::address(x);
}
template<class U>
static int count(const U& x)
{
using Datatype = detail::mpi_datatype<U>;
return Datatype::count(x);
}
} // mpi
} // diy
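// Usage sketch (illustrative): the helpers flatten a value into the
// (address, count, datatype) triple that raw MPI calls expect:
//
//   std::vector<double> v(10, 1.0);
//   void*        buf = diy::mpi::address(v);   // v.data()
//   int          n   = diy::mpi::count(v);     // 10
//   MPI_Datatype dt  = diy::mpi::datatype(v);  // MPI_DOUBLE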
#endif

@ -1,215 +0,0 @@
#ifndef VTKMDIY_MPI_IO_HPP
#define VTKMDIY_MPI_IO_HPP
#include "../constants.h"
#include <vector>
#include <string>
namespace diy
{
namespace mpi
{
namespace io
{
typedef MPI_Offset offset;
//! Wraps MPI file IO. \ingroup MPI
class file
{
public:
enum
{
rdonly = MPI_MODE_RDONLY,
rdwr = MPI_MODE_RDWR,
wronly = MPI_MODE_WRONLY,
create = MPI_MODE_CREATE,
exclusive = MPI_MODE_EXCL,
delete_on_close = MPI_MODE_DELETE_ON_CLOSE,
unique_open = MPI_MODE_UNIQUE_OPEN,
sequential = MPI_MODE_SEQUENTIAL,
append = MPI_MODE_APPEND
};
public:
inline file(const communicator& comm, const std::string& filename, int mode);
~file() { close(); }
inline void close();
inline offset size() const;
inline void resize(offset size);
inline void read_at(offset o, char* buffer, size_t size);
inline void read_at_all(offset o, char* buffer, size_t size);
inline void write_at(offset o, const char* buffer, size_t size);
inline void write_at_all(offset o, const char* buffer, size_t size);
template<class T>
inline void read_at(offset o, std::vector<T>& data);
template<class T>
inline void read_at_all(offset o, std::vector<T>& data);
template<class T>
inline void write_at(offset o, const std::vector<T>& data);
template<class T>
inline void write_at_all(offset o, const std::vector<T>& data);
const communicator&
comm() const { return comm_; }
MPI_File& handle() { return fh; }
private:
const communicator& comm_;
MPI_File fh;
};
}
}
}
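// Usage sketch (hypothetical collective write; each rank writes a disjoint
// 1 KB slice of "out.bin"):
//
//   diy::mpi::communicator world;
//   diy::mpi::io::file f(world, "out.bin", diy::mpi::io::file::wronly |
//                                          diy::mpi::io::file::create);
//   std::vector<char> chunk(1024, 'x');
//   f.write_at_all(static_cast<diy::mpi::io::offset>(chunk.size()) * world.rank(), chunk);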
diy::mpi::io::file::
file(const communicator& comm__, const std::string& filename, int mode)
: comm_(comm__)
{
#ifndef VTKM_DIY_NO_MPI
int ret = MPI_File_open(comm__, const_cast<char*>(filename.c_str()), mode, MPI_INFO_NULL, &fh);
if (ret)
throw std::runtime_error("DIY cannot open file: " + filename);
#else
DIY_UNUSED(comm__);
DIY_UNUSED(filename);
DIY_UNUSED(mode);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_open);
#endif
}
void
diy::mpi::io::file::
close()
{
#ifndef VTKM_DIY_NO_MPI
if (fh != MPI_FILE_NULL)
MPI_File_close(&fh);
#endif
}
diy::mpi::io::offset
diy::mpi::io::file::
size() const
{
#ifndef VTKM_DIY_NO_MPI
offset sz;
MPI_File_get_size(fh, &sz);
return sz;
#else
DIY_UNSUPPORTED_MPI_CALL(MPI_File_get_size);
#endif
}
void
diy::mpi::io::file::
resize(diy::mpi::io::offset size_)
{
#ifndef VTKM_DIY_NO_MPI
MPI_File_set_size(fh, size_);
#else
DIY_UNUSED(size_);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_set_size);
#endif
}
void
diy::mpi::io::file::
read_at(offset o, char* buffer, size_t size_)
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_File_read_at(fh, o, buffer, static_cast<int>(size_), detail::get_mpi_datatype<char>(), &s.s);
#else
DIY_UNUSED(o);
DIY_UNUSED(buffer);
DIY_UNUSED(size_);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_read_at);
#endif
}
template<class T>
void
diy::mpi::io::file::
read_at(offset o, std::vector<T>& data)
{
read_at(o, (char*) &data[0], data.size()*sizeof(T));
}
void
diy::mpi::io::file::
read_at_all(offset o, char* buffer, size_t size_)
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_File_read_at_all(fh, o, buffer, static_cast<int>(size_), detail::get_mpi_datatype<char>(), &s.s);
#else
DIY_UNUSED(o);
DIY_UNUSED(buffer);
DIY_UNUSED(size_);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_read_at_all);
#endif
}
template<class T>
void
diy::mpi::io::file::
read_at_all(offset o, std::vector<T>& data)
{
read_at_all(o, (char*) &data[0], data.size()*sizeof(T));
}
void
diy::mpi::io::file::
write_at(offset o, const char* buffer, size_t size_)
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_File_write_at(fh, o, (void *)buffer, static_cast<int>(size_), detail::get_mpi_datatype<char>(), &s.s);
#else
DIY_UNUSED(o);
DIY_UNUSED(buffer);
DIY_UNUSED(size_);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_write_at);
#endif
}
template<class T>
void
diy::mpi::io::file::
write_at(offset o, const std::vector<T>& data)
{
write_at(o, (const char*) &data[0], data.size()*sizeof(T));
}
void
diy::mpi::io::file::
write_at_all(offset o, const char* buffer, size_t size_)
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_File_write_at_all(fh, o, (void *)buffer, static_cast<int>(size_), detail::get_mpi_datatype<char>(), &s.s);
#else
DIY_UNUSED(o);
DIY_UNUSED(buffer);
DIY_UNUSED(size_);
DIY_UNSUPPORTED_MPI_CALL(MPI_File_write_at_all);
#endif
}
template<class T>
void
diy::mpi::io::file::
write_at_all(offset o, const std::vector<T>& data)
{
write_at_all(o, (const char*) &data[0], data.size()*sizeof(T));
}
#endif

@ -1,92 +0,0 @@
#ifndef VTKMDIY_MPI_NO_MPI_HPP
#define VTKMDIY_MPI_NO_MPI_HPP
#include <stdexcept> // std::runtime_error
static const int MPI_SUCCESS = 0;
static const int MPI_ANY_SOURCE = -1;
static const int MPI_ANY_TAG = -1;
/* define communicator type and constants */
using MPI_Comm = int;
static const MPI_Comm MPI_COMM_NULL = 0;
static const MPI_Comm MPI_COMM_WORLD = 1;
/* MPI threading modes */
static const int MPI_THREAD_SINGLE = 0;
static const int MPI_THREAD_FUNNELED = 1;
static const int MPI_THREAD_SERIALIZED = 2;
static const int MPI_THREAD_MULTIPLE = 3;
/* define datatypes */
using MPI_Datatype = size_t;
#define VTKM_DIY_NO_MPI_DATATYPE(cpp_type, mpi_type) \
static const MPI_Datatype mpi_type = sizeof(cpp_type);
VTKM_DIY_NO_MPI_DATATYPE(char, MPI_BYTE);
VTKM_DIY_NO_MPI_DATATYPE(int, MPI_INT);
VTKM_DIY_NO_MPI_DATATYPE(unsigned, MPI_UNSIGNED);
VTKM_DIY_NO_MPI_DATATYPE(long, MPI_LONG);
VTKM_DIY_NO_MPI_DATATYPE(unsigned long, MPI_UNSIGNED_LONG);
VTKM_DIY_NO_MPI_DATATYPE(long long, MPI_LONG_LONG_INT);
VTKM_DIY_NO_MPI_DATATYPE(unsigned long long, MPI_UNSIGNED_LONG_LONG);
VTKM_DIY_NO_MPI_DATATYPE(float, MPI_FLOAT);
VTKM_DIY_NO_MPI_DATATYPE(double, MPI_DOUBLE);
/* status type */
struct MPI_Status
{
/* These fields are publicly defined in the MPI specification.
User applications may freely read from these fields. */
int MPI_SOURCE;
int MPI_TAG;
int MPI_ERROR;
};
/* define MPI_Request */
using MPI_Request = int;
#ifndef DIY_UNSUPPORTED_MPI_CALL
#define DIY_UNSUPPORTED_MPI_CALL(name) \
throw std::runtime_error("`" #name "` not supported when VTKM_DIY_NO_MPI is defined.");
#endif
/* define operations */
using MPI_Op = int;
static const MPI_Op MPI_MAX = 0;
static const MPI_Op MPI_MIN = 0;
static const MPI_Op MPI_SUM = 0;
static const MPI_Op MPI_PROD = 0;
static const MPI_Op MPI_LAND = 0;
static const MPI_Op MPI_LOR = 0;
/* mpi i/o stuff */
using MPI_Offset = size_t;
using MPI_File = int;
static const MPI_File MPI_FILE_NULL = 0;
static const int MPI_MODE_CREATE = 1;
static const int MPI_MODE_RDONLY = 2;
static const int MPI_MODE_WRONLY = 4;
static const int MPI_MODE_RDWR = 8;
static const int MPI_MODE_DELETE_ON_CLOSE = 16;
static const int MPI_MODE_UNIQUE_OPEN = 32;
static const int MPI_MODE_EXCL = 64;
static const int MPI_MODE_APPEND = 128;
static const int MPI_MODE_SEQUENTIAL = 256;
/* define window type */
using MPI_Win = int;
/* window fence assertions */
static const int MPI_MODE_NOSTORE = 1;
static const int MPI_MODE_NOPUT = 2;
static const int MPI_MODE_NOPRECEDE = 4;
static const int MPI_MODE_NOSUCCEED = 8;
static const int MPI_MODE_NOCHECK = 16;
/* window lock types */
static const int MPI_LOCK_SHARED = 1;
static const int MPI_LOCK_EXCLUSIVE = 2;
#endif

@ -1,27 +0,0 @@
#include <algorithm> // for std::min/max
#include <functional>
namespace diy
{
namespace mpi
{
//! \addtogroup MPI
//!@{
template<class U>
struct maximum { const U& operator()(const U& x, const U& y) const { return std::max(x,y); } };
template<class U>
struct minimum { const U& operator()(const U& x, const U& y) const { return std::min(x,y); } };
//!@}
namespace detail
{
template<class T> struct mpi_op { static MPI_Op get(); };
template<class U> struct mpi_op< maximum<U> > { static MPI_Op get() { return MPI_MAX; } };
template<class U> struct mpi_op< minimum<U> > { static MPI_Op get() { return MPI_MIN; } };
template<class U> struct mpi_op< std::plus<U> > { static MPI_Op get() { return MPI_SUM; } };
template<class U> struct mpi_op< std::multiplies<U> > { static MPI_Op get() { return MPI_PROD; } };
template<class U> struct mpi_op< std::logical_and<U> > { static MPI_Op get() { return MPI_LAND; } };
template<class U> struct mpi_op< std::logical_or<U> > { static MPI_Op get() { return MPI_LOR; } };
}
}
}
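// Illustrative mapping (this trait is what lets the collectives accept
// ordinary function objects as reduction operators):
//
//   MPI_Op mx = diy::mpi::detail::mpi_op< diy::mpi::maximum<int> >::get(); // MPI_MAX
//   MPI_Op sm = diy::mpi::detail::mpi_op< std::plus<double> >::get();      // MPI_SUM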

@ -1,55 +0,0 @@
namespace diy
{
namespace mpi
{
template<class T>
struct optional
{
optional():
init_(false) {}
optional(const T& v):
init_(true) { new(buf_) T(v); }
optional(const optional& o):
init_(o.init_) { if (init_) new(buf_) T(*o); }
~optional() { if (init_) clear(); }
inline
optional& operator=(const optional& o);
operator bool() const { return init_; }
T& operator*() { return *static_cast<T*>(address()); }
const T& operator*() const { return *static_cast<const T*>(address()); }
T* operator->() { return &(operator*()); }
const T* operator->() const { return &(operator*()); }
private:
void clear() { static_cast<T*>(address())->~T(); }
void* address() { return buf_; }
const void* address() const { return buf_; }
private:
bool init_;
char buf_[sizeof(T)];
};
}
}
template<class T>
diy::mpi::optional<T>&
diy::mpi::optional<T>::
operator=(const optional& o)
{
if (init_)
clear();
init_ = o.init_;
if (init_)
new (buf_) T(*o);
return *this;
}
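// Usage sketch (hypothetical): iprobe returns an engaged optional only when
// a message is actually pending:
//
//   diy::mpi::communicator world;
//   diy::mpi::optional<diy::mpi::status> s = world.iprobe(diy::mpi::any_source, diy::mpi::any_tag);
//   if (s)
//   {
//     int from = s->source();   // safe: the optional is engaged
//   }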

@ -1,147 +0,0 @@
#include <vector>
namespace diy
{
namespace mpi
{
namespace detail
{
// send
template< class T, class is_mpi_datatype_ = typename is_mpi_datatype<T>::type >
struct send;
template<class T>
struct send<T, true_type>
{
void operator()(MPI_Comm comm, int dest, int tag, const T& x) const
{
#ifndef VTKM_DIY_NO_MPI
typedef mpi_datatype<T> Datatype;
MPI_Send((void*) Datatype::address(x),
Datatype::count(x),
Datatype::datatype(),
dest, tag, comm);
#else
(void) comm; (void) dest; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Send);
#endif
}
};
// recv
template< class T, class is_mpi_datatype_ = typename is_mpi_datatype<T>::type >
struct recv;
template<class T>
struct recv<T, true_type>
{
status operator()(MPI_Comm comm, int source, int tag, T& x) const
{
#ifndef VTKM_DIY_NO_MPI
typedef mpi_datatype<T> Datatype;
status s;
MPI_Recv((void*) Datatype::address(x),
Datatype::count(x),
Datatype::datatype(),
source, tag, comm, &s.s);
return s;
#else
(void) comm; (void) source; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Recv);
#endif
}
};
template<class U>
struct recv<std::vector<U>, true_type>
{
status operator()(MPI_Comm comm, int source, int tag, std::vector<U>& x) const
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_Probe(source, tag, comm, &s.s);
x.resize(s.count<U>());
MPI_Recv(&x[0], static_cast<int>(x.size()), get_mpi_datatype<U>(), source, tag, comm, &s.s);
return s;
#else
(void) comm; (void) source; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Recv);
#endif
}
};
// isend
template< class T, class is_mpi_datatype_ = typename is_mpi_datatype<T>::type >
struct isend;
template<class T>
struct isend<T, true_type>
{
request operator()(MPI_Comm comm, int dest, int tag, const T& x) const
{
#ifndef VTKM_DIY_NO_MPI
request r;
typedef mpi_datatype<T> Datatype;
MPI_Isend((void*) Datatype::address(x),
Datatype::count(x),
Datatype::datatype(),
dest, tag, comm, &r.r);
return r;
#else
(void) comm; (void) dest; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Isend);
#endif
}
};
// issend
template< class T, class is_mpi_datatype_ = typename is_mpi_datatype<T>::type >
struct issend;
template<class T>
struct issend<T, true_type>
{
request operator()(MPI_Comm comm, int dest, int tag, const T& x) const
{
#ifndef VTKM_DIY_NO_MPI
request r;
typedef mpi_datatype<T> Datatype;
MPI_Issend((void*) Datatype::address(x),
Datatype::count(x),
Datatype::datatype(),
dest, tag, comm, &r.r);
return r;
#else
(void) comm; (void) dest; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Issend);
#endif
}
};
// irecv
template< class T, class is_mpi_datatype_ = typename is_mpi_datatype<T>::type >
struct irecv;
template<class T>
struct irecv<T, true_type>
{
request operator()(MPI_Comm comm, int source, int tag, T& x) const
{
#ifndef VTKM_DIY_NO_MPI
request r;
typedef mpi_datatype<T> Datatype;
MPI_Irecv(Datatype::address(x),
Datatype::count(x),
Datatype::datatype(),
source, tag, comm, &r.r);
return r;
#else
(void) comm; (void) source; (void) tag; (void) x;
DIY_UNSUPPORTED_MPI_CALL(MPI_Irecv);
#endif
}
};
}
}
}

@ -1,50 +0,0 @@
namespace diy
{
namespace mpi
{
struct request
{
inline
status wait();
inline
optional<status> test();
inline
void cancel();
MPI_Request r;
};
}
}
diy::mpi::status
diy::mpi::request::wait()
{
#ifndef VTKM_DIY_NO_MPI
status s;
MPI_Wait(&r, &s.s);
return s;
#else
DIY_UNSUPPORTED_MPI_CALL(diy::mpi::request::wait);
#endif
}
diy::mpi::optional<diy::mpi::status>
diy::mpi::request::test()
{
#ifndef VTKM_DIY_NO_MPI
status s;
int flag;
MPI_Test(&r, &flag, &s.s);
if (flag)
return s;
#endif
return optional<status>();
}
void
diy::mpi::request::cancel()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Cancel(&r);
#endif
}
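// Usage sketch (hypothetical; assumes a matching irecv is posted elsewhere):
//
//   diy::mpi::communicator world;
//   std::vector<int> data(64, world.rank());
//   diy::mpi::request r = world.isend((world.rank() + 1) % world.size(), 0, data);
//   while (!r.test())
//     ;                         // overlap useful work here instead of spinning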

@ -1,49 +0,0 @@
namespace diy
{
namespace mpi
{
struct status
{
int source() const { return s.MPI_SOURCE; }
int tag() const { return s.MPI_TAG; }
int error() const { return s.MPI_ERROR; }
inline
bool cancelled() const;
template<class T>
int count() const;
operator MPI_Status&() { return s; }
operator const MPI_Status&() const { return s; }
MPI_Status s;
};
}
}
bool
diy::mpi::status::cancelled() const
{
#ifndef VTKM_DIY_NO_MPI
int flag;
MPI_Test_cancelled(const_cast<MPI_Status*>(&s), &flag);
return flag;
#else
DIY_UNSUPPORTED_MPI_CALL(diy::mpi::status::cancelled);
#endif
}
template<class T>
int
diy::mpi::status::count() const
{
#ifndef VTKM_DIY_NO_MPI
int c;
MPI_Get_count(const_cast<MPI_Status*>(&s), detail::get_mpi_datatype<T>(), &c);
return c;
#else
DIY_UNSUPPORTED_MPI_CALL(diy::mpi::status::count);
#endif
}
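// Usage sketch (hypothetical): probe for a pending message and size the
// receive buffer from the status before receiving:
//
//   diy::mpi::communicator world;
//   diy::mpi::status s = world.probe(diy::mpi::any_source, 0);
//   std::vector<float> buf(s.count<float>());
//   world.recv(s.source(), 0, buf);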

@ -1,282 +0,0 @@
#include <type_traits>
namespace diy
{
namespace mpi
{
//! \ingroup MPI
//! Simple wrapper around MPI window functions.
template<class T>
class window
{
static_assert(std::is_same<typename detail::is_mpi_datatype<T>::type, detail::true_type>::value, "Only MPI datatypes are allowed in windows");
public:
inline window(const communicator& comm, unsigned size);
inline ~window();
// moving is Ok
window(window&&) = default;
window& operator=(window&&) = default;
// cannot copy because of the buffer_
window(const window&) = delete;
window& operator=(const window&) = delete;
inline void put(const T& x, int rank, unsigned offset);
inline void put(const std::vector<T>& x, int rank, unsigned offset);
inline void get(T& x, int rank, unsigned offset);
inline void get(std::vector<T>& x, int rank, unsigned offset);
inline void fence(int assert);
inline void lock(int lock_type, int rank, int assert = 0);
inline void unlock(int rank);
inline void lock_all(int assert = 0);
inline void unlock_all();
inline void fetch_and_op(const T* origin, T* result, int rank, unsigned offset, MPI_Op op);
inline void fetch(T& result, int rank, unsigned offset);
inline void replace(const T& value, int rank, unsigned offset);
inline void sync();
inline void flush(int rank);
inline void flush_all();
inline void flush_local(int rank);
inline void flush_local_all();
private:
std::vector<T> buffer_;
int rank_;
#ifndef VTKM_DIY_NO_MPI
MPI_Win window_;
#endif
};
} // mpi
} // diy
template<class T>
diy::mpi::window<T>::
window(const communicator& comm, unsigned size):
buffer_(size), rank_(comm.rank())
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_create(buffer_.data(), buffer_.size()*sizeof(T), sizeof(T), MPI_INFO_NULL, comm, &window_);
#endif
}
template<class T>
diy::mpi::window<T>::
~window()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_free(&window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
put(const T& x, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Put(address(x), count(x), datatype(x),
rank,
offset,
count(x), datatype(x),
window_);
#else
buffer_[offset] = x;
#endif
}
template<class T>
void
diy::mpi::window<T>::
put(const std::vector<T>& x, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Put(address(x), count(x), datatype(x),
rank,
offset,
count(x), datatype(x),
window_);
#else
for (size_t i = 0; i < x.size(); ++i)
buffer_[offset + i] = x[i];
#endif
}
template<class T>
void
diy::mpi::window<T>::
get(T& x, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Get(address(x), count(x), datatype(x),
rank,
offset,
count(x), datatype(x),
window_);
#else
x = buffer_[offset];
#endif
}
template<class T>
void
diy::mpi::window<T>::
get(std::vector<T>& x, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Get(address(x), count(x), datatype(x),
rank,
offset,
count(x), datatype(x),
window_);
#else
for (size_t i = 0; i < x.size(); ++i)
x[i] = buffer_[offset + i];
#endif
}
template<class T>
void
diy::mpi::window<T>::
fence(int assert)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_fence(assert, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
lock(int lock_type, int rank, int assert)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_lock(lock_type, rank, assert, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
unlock(int rank)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_unlock(rank, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
lock_all(int assert)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_lock_all(assert, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
unlock_all()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_unlock_all(window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
fetch_and_op(const T* origin, T* result, int rank, unsigned offset, MPI_Op op)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Fetch_and_op(origin, result, datatype(*origin), rank, offset, op, window_);
#else
DIY_UNSUPPORTED_MPI_CALL(MPI_Fetch_and_op);
#endif
}
template<class T>
void
diy::mpi::window<T>::
fetch(T& result, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
T unused;
fetch_and_op(&unused, &result, rank, offset, MPI_NO_OP);
#else
result = buffer_[offset];
#endif
}
template<class T>
void
diy::mpi::window<T>::
replace(const T& value, int rank, unsigned offset)
{
#ifndef VTKM_DIY_NO_MPI
T unused;
fetch_and_op(&value, &unused, rank, offset, MPI_REPLACE);
#else
buffer_[offset] = value;
#endif
}
template<class T>
void
diy::mpi::window<T>::
sync()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_sync(window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
flush(int rank)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_flush(rank, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
flush_all()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_flush_all(window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
flush_local(int rank)
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_flush_local(rank, window_);
#endif
}
template<class T>
void
diy::mpi::window<T>::
flush_local_all()
{
#ifndef VTKM_DIY_NO_MPI
MPI_Win_flush_local_all(window_);
#endif
}
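// Usage sketch (hypothetical one-sided exchange: each rank deposits its rank
// id into slot 0 of its right neighbor's window, bracketed by fences):
//
//   diy::mpi::communicator world;
//   diy::mpi::window<int> win(world, 1);
//   win.fence(0);
//   win.put(world.rank(), (world.rank() + 1) % world.size(), 0);
//   win.fence(0);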

@ -1,42 +0,0 @@
#ifndef VTKMDIY_NO_THREAD_HPP
#define VTKMDIY_NO_THREAD_HPP
#include <utility>
#include <functional>
#include <type_traits>
// replicates only the parts of the threading interface that we use
// executes everything in a single thread
namespace diy
{
struct thread
{
thread() {}
thread(thread&&) = default;
thread(const thread&) = delete;
template<class Function, class... Args>
explicit thread(Function&& f, Args&&... args) { f(args...); } // not ideal, since it doesn't support member functions
void join() {}
static unsigned hardware_concurrency() { return 1; }
};
struct mutex {};
struct fast_mutex {};
struct recursive_mutex {};
template<class T>
struct lock_guard
{
lock_guard(T&) {}
};
namespace this_thread
{
inline unsigned long int get_id() { return 0; }
}
}
#endif

@ -1,72 +0,0 @@
#ifndef VTKMDIY_PARTNERS_ALL_REDUCE_HPP
#define VTKMDIY_PARTNERS_ALL_REDUCE_HPP
#include "merge.hpp"
namespace diy
{
class Master;
//! Allreduce (reduction with results broadcast to all blocks) is
//! implemented as two merge reductions, with incoming and outgoing items swapped in the second one.
//! I.e., it follows the merge reduction up and then back down the merge tree.
/**
* \ingroup Communication
* \brief Partners for all-reduce
*
*/
struct RegularAllReducePartners: public RegularMergePartners
{
typedef RegularMergePartners Parent; //!< base class merge reduction
//! contiguous parameter indicates whether to match partners contiguously or in a round-robin fashion;
//! contiguous is useful when data needs to be united;
//! round-robin is useful for vector-"halving"
template<class Decomposer>
RegularAllReducePartners(const Decomposer& decomposer, //!< domain decomposition
int k, //!< target k value
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(decomposer, k, contiguous) {}
RegularAllReducePartners(const DivisionVector& divs,//!< explicit division vector
const KVSVector& kvs, //!< explicit k vector
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(divs, kvs, contiguous) {}
//! returns total number of rounds
size_t rounds() const { return 2*Parent::rounds(); }
//! returns size of a group of partners in a given round
int size(int round) const { return Parent::size(parent_round(round)); }
//! returns dimension (direction of partners in a regular grid) in a given round
int dim(int round) const { return Parent::dim(parent_round(round)); }
//! returns whether a given block in a given round has dropped out of the merge yet or not
inline bool active(int round, int gid, const Master& m) const { return Parent::active(parent_round(round), gid, m); }
//! returns what the current round would be in the first or second parent merge reduction
int parent_round(int round) const { return round < (int) Parent::rounds() ? round : static_cast<int>(rounds()) - round; }
// incoming is only valid for an active gid; it will only be called with an active gid
inline void incoming(int round, int gid, std::vector<int>& partners, const Master& m) const
{
if (round <= (int) Parent::rounds())
Parent::incoming(round, gid, partners, m);
else
Parent::outgoing(parent_round(round), gid, partners, m);
}
inline void outgoing(int round, int gid, std::vector<int>& partners, const Master& m) const
{
if (round < (int) Parent::rounds())
Parent::outgoing(round, gid, partners, m);
else
Parent::incoming(parent_round(round), gid, partners, m);
}
};
} // diy
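// Usage sketch (hypothetical; `master`, `assigner`, `domain`, `nblocks`, and
// the reduction callback are user-supplied):
//
//   diy::RegularDecomposer<diy::DiscreteBounds> decomposer(1, domain, nblocks);
//   diy::RegularAllReducePartners partners(decomposer, /* k = */ 2);
//   diy::reduce(master, assigner, partners, &allreduce_callback);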
#endif

@ -1,62 +0,0 @@
#ifndef VTKMDIY_PARTNERS_BROADCAST_HPP
#define VTKMDIY_PARTNERS_BROADCAST_HPP
#include "merge.hpp"
namespace diy
{
class Master;
/**
* \ingroup Communication
* \brief Partners for broadcast
*
*/
struct RegularBroadcastPartners: public RegularMergePartners
{
typedef RegularMergePartners Parent; //!< base class merge reduction
//! contiguous parameter indicates whether to match partners contiguously or in a round-robin fashion;
//! contiguous is useful when data needs to be united;
//! round-robin is useful for vector-"halving"
template<class Decomposer>
RegularBroadcastPartners(const Decomposer& decomposer, //!< domain decomposition
int k, //!< target k value
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(decomposer, k, contiguous) {}
RegularBroadcastPartners(const DivisionVector& divs,//!< explicit division vector
const KVSVector& kvs, //!< explicit k vector
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(divs, kvs, contiguous) {}
//! returns total number of rounds
size_t rounds() const { return Parent::rounds(); }
//! returns size of a group of partners in a given round
int size(int round) const { return Parent::size(parent_round(round)); }
//! returns dimension (direction of partners in a regular grid) in a given round
int dim(int round) const { return Parent::dim(parent_round(round)); }
//! returns whether a given block in a given round has dropped out of the merge yet or not
inline bool active(int round, int gid, const Master& m) const { return Parent::active(parent_round(round), gid, m); }
//! returns what the current round would be in the first or second parent merge reduction
int parent_round(int round) const { return static_cast<int>(rounds()) - round; }
// incoming is only valid for an active gid; it will only be called with an active gid
inline void incoming(int round, int gid, std::vector<int>& partners, const Master& m) const
{
Parent::outgoing(parent_round(round), gid, partners, m);
}
inline void outgoing(int round, int gid, std::vector<int>& partners, const Master& m) const
{
Parent::incoming(parent_round(round), gid, partners, m);
}
};
} // diy
#endif

@ -1,204 +0,0 @@
#ifndef VTKMDIY_PARTNERS_COMMON_HPP
#define VTKMDIY_PARTNERS_COMMON_HPP
#include "../decomposition.hpp"
#include "../types.hpp"
namespace diy
{
struct RegularPartners
{
// The record of group size per round in a dimension
struct DimK
{
DimK(int dim_, int k_):
dim(dim_), size(k_) {}
int dim;
int size; // group size
};
typedef std::vector<int> CoordVector;
typedef std::vector<int> DivisionVector;
typedef std::vector<DimK> KVSVector;
// The part of RegularDecomposer that we need works the same with either Bounds (so we fix them arbitrarily)
typedef DiscreteBounds Bounds;
typedef RegularDecomposer<Bounds> Decomposer;
template<class Decomposer_>
RegularPartners(const Decomposer_& decomposer, int k, bool contiguous = true):
divisions_(decomposer.divisions),
contiguous_(contiguous) { factor(k, divisions_, kvs_); fill_steps(); }
RegularPartners(const DivisionVector& divs,
const KVSVector& kvs,
bool contiguous = true):
divisions_(divs), kvs_(kvs),
contiguous_(contiguous) { fill_steps(); }
size_t rounds() const { return kvs_.size(); }
int size(int round) const { return kvs_[round].size; }
int dim(int round) const { return kvs_[round].dim; }
int step(int round) const { return steps_[round]; }
const DivisionVector& divisions() const { return divisions_; }
const KVSVector& kvs() const { return kvs_; }
bool contiguous() const { return contiguous_; }
static
inline void factor(int k, const DivisionVector& divisions, KVSVector& kvs);
inline void fill(int round, int gid, std::vector<int>& partners) const;
inline int group_position(int round, int c, int step) const;
private:
inline void fill_steps();
static
inline void factor(int k, int tot_b, std::vector<int>& kvs);
DivisionVector divisions_;
KVSVector kvs_;
bool contiguous_;
std::vector<int> steps_;
};
}
void
diy::RegularPartners::
fill_steps()
{
if (contiguous_)
{
std::vector<int> cur_steps(divisions().size(), 1);
for (size_t r = 0; r < rounds(); ++r)
{
steps_.push_back(cur_steps[kvs_[r].dim]);
cur_steps[kvs_[r].dim] *= kvs_[r].size;
}
} else
{
std::vector<int> cur_steps(divisions().begin(), divisions().end());
for (size_t r = 0; r < rounds(); ++r)
{
cur_steps[kvs_[r].dim] /= kvs_[r].size;
steps_.push_back(cur_steps[kvs_[r].dim]);
}
}
}
void
diy::RegularPartners::
fill(int round, int gid, std::vector<int>& partners) const
{
const DimK& kv = kvs_[round];
partners.reserve(kv.size);
int step = this->step(round); // gids jump by this much in the current round
CoordVector coords;
Decomposer::gid_to_coords(gid, coords, divisions_);
int c = coords[kv.dim];
int pos = group_position(round, c, step);
int partner = c - pos * step;
coords[kv.dim] = partner;
int partner_gid = Decomposer::coords_to_gid(coords, divisions_);
partners.push_back(partner_gid);
for (int k = 1; k < kv.size; ++k)
{
partner += step;
coords[kv.dim] = partner;
partner_gid = Decomposer::coords_to_gid(coords, divisions_);
partners.push_back(partner_gid);
}
}
// Tom's GetGrpPos
int
diy::RegularPartners::
group_position(int round, int c, int step) const
{
// the second term in the following expression does not simplify to
// (gid - start_b) / kv[r]
// because the division gid / (step * kv[r]) is integer and truncates
// this is exactly what we want
int g = c % step + c / (step * kvs_[round].size) * step;
int p = c / step % kvs_[round].size;
static_cast<void>(g); // suppress the unused-variable warning
// g: group number (output)
// p: position number within the group (output)
return p;
}
void
diy::RegularPartners::
factor(int k, const DivisionVector& divisions, KVSVector& kvs)
{
// factor in each dimension
std::vector< std::vector<int> > tmp_kvs(divisions.size());
for (unsigned i = 0; i < divisions.size(); ++i)
factor(k, divisions[i], tmp_kvs[i]);
// interleave the dimensions
std::vector<int> round_per_dim(divisions.size(), 0);
while(true)
{
// TODO: not the most efficient way to do this
bool changed = false;
for (unsigned i = 0; i < divisions.size(); ++i)
{
if (round_per_dim[i] == (int) tmp_kvs[i].size())
continue;
kvs.push_back(DimK(i, tmp_kvs[i][round_per_dim[i]++]));
changed = true;
}
if (!changed)
break;
}
}
// Tom's FactorK
void
diy::RegularPartners::
factor(int k, int tot_b, std::vector<int>& kv)
{
int rem = tot_b; // unfactored remaining portion of tot_b
int j;
while (rem > 1)
{
// remainder is divisible by k
if (rem % k == 0)
{
kv.push_back(k);
rem /= k;
}
// if not, start at k and linearly look for smaller factors down to 2
else
{
for (j = k - 1; j > 1; j--)
{
if (rem % j == 0)
{
kv.push_back(j);
rem /= j;
break;
}
}
if (j == 1)
{
kv.push_back(rem);
rem = 1;
}
} // else
} // while
}
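// Worked example: factor(k = 4, tot_b = 24) produces kv = {4, 3, 2}:
// 24 % 4 == 0, push 4, rem = 6; 6 % 4 != 0 and the largest j <= 3 dividing 6
// is 3, push 3, rem = 2; 2 % 4 != 0 and j = 2 divides 2, push 2, rem = 1.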
#endif

@ -1,60 +0,0 @@
#ifndef VTKMDIY_PARTNERS_MERGE_HPP
#define VTKMDIY_PARTNERS_MERGE_HPP
#include "common.hpp"
namespace diy
{
class Master;
/**
* \ingroup Communication
* \brief Partners for merge-reduce
*
*/
struct RegularMergePartners: public RegularPartners
{
typedef RegularPartners Parent;
// contiguous parameter indicates whether to match partners contiguously or in a round-robin fashion;
// contiguous is useful when data needs to be united;
// round-robin is useful for vector-"halving"
template<class Decomposer>
RegularMergePartners(const Decomposer& decomposer, //!< domain decomposition
int k, //!< target k value
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(decomposer, k, contiguous) {}
RegularMergePartners(const DivisionVector& divs, //!< explicit division vector
const KVSVector& kvs, //!< explicit k vector
bool contiguous = true //!< distance doubling (true) or halving (false)
):
Parent(divs, kvs, contiguous) {}
inline bool active(int round, int gid, const Master&) const;
// incoming is only valid for an active gid; it will only be called with an active gid
inline void incoming(int round, int gid, std::vector<int>& partners, const Master&) const { Parent::fill(round - 1, gid, partners); }
// this is a lazy implementation of outgoing, but it reuses the existing code
inline void outgoing(int round, int gid, std::vector<int>& partners, const Master&) const { std::vector<int> tmp; Parent::fill(round, gid, tmp); partners.push_back(tmp[0]); }
};
} // diy
bool
diy::RegularMergePartners::
active(int round, int gid, const Master&) const
{
CoordVector coords;
Decomposer::gid_to_coords(gid, coords, divisions());
for (int r = 0; r < round; ++r)
if (Parent::group_position(r, coords[kvs()[r].dim], step(r)) != 0)
return false;
return true;
}
#endif

@ -1,43 +0,0 @@
#ifndef VTKMDIY_PARTNERS_SWAP_HPP
#define VTKMDIY_PARTNERS_SWAP_HPP
#include "common.hpp"
namespace diy
{
class Master;
/**
* \ingroup Communication
* \brief Partners for swap-reduce
*
*/
struct RegularSwapPartners: public RegularPartners
{
typedef RegularPartners Parent;
// contiguous parameter indicates whether to match partners contiguously or in a round-robin fashion;
// contiguous is useful when data needs to be united;
// round-robin is useful for vector-"halving"
template<class Decomposer>
RegularSwapPartners(const Decomposer& decomposer, //!< domain decomposition
int k, //!< target k value
bool contiguous = true //!< distance halving (true) or doubling (false)
):
Parent(decomposer, k, contiguous) {}
RegularSwapPartners(const DivisionVector& divs, //!< explicit division vector
const KVSVector& kvs, //!< explicit k vector
bool contiguous = true //!< distance halving (true) or doubling (false)
):
Parent(divs, kvs, contiguous) {}
bool active(int, int, const Master&) const { return true; } // in swap-reduce every block is always active
void incoming(int round, int gid, std::vector<int>& partners, const Master&) const { Parent::fill(round - 1, gid, partners); }
void outgoing(int round, int gid, std::vector<int>& partners, const Master&) const { Parent::fill(round, gid, partners); }
};
} // diy
#endif

@ -1,137 +0,0 @@
#ifndef VTKMDIY_PICK_HPP
#define VTKMDIY_PICK_HPP
#include "link.hpp"
namespace diy
{
template<class Bounds, class Point, class OutIter>
void near(const RegularLink<Bounds>& link, const Point& p, float r, OutIter out,
const Bounds& domain);
template<class Bounds, class Point, class OutIter>
void in(const RegularLink<Bounds>& link, const Point& p, OutIter out, const Bounds& domain);
template<class Point, class Bounds>
float distance(int dim, const Bounds& bounds, const Point& p);
template<class Bounds>
inline
float distance(int dim, const Bounds& bounds1, const Bounds& bounds2);
template<class Bounds>
void wrap_bounds(Bounds& bounds, Direction wrap_dir, const Bounds& domain, int dim);
}
//! Finds the neighbors within radius r of a target point.
template<class Bounds, class Point, class OutIter>
void
diy::
near(const RegularLink<Bounds>& link, //!< neighbors
const Point& p, //!< target point (must be in current block)
float r, //!< target radius (>= 0.0)
OutIter out, //!< insert iterator for output set of neighbors
const Bounds& domain) //!< global domain bounds
{
Bounds neigh_bounds; // neighbor block bounds
// for all neighbors of this block
for (int n = 0; n < link.size(); n++)
{
// wrap neighbor bounds, if necessary, otherwise bounds will be unchanged
neigh_bounds = link.bounds(n);
wrap_bounds(neigh_bounds, link.wrap(n), domain, link.dimension());
if (distance(link.dimension(), neigh_bounds, p) <= r)
*out++ = n;
} // for all neighbors
}
//! Find the distance between point `p` and box `bounds`.
template<class Point, class Bounds>
float
diy::
distance(int dim, const Bounds& bounds, const Point& p)
{
float res = 0;
for (int i = 0; i < dim; ++i)
{
// avoids all the annoying case logic by finding
// diff = max(bounds.min[i] - p[i], 0, p[i] - bounds.max[i])
float diff = 0, d;
d = bounds.min[i] - p[i];
if (d > diff) diff = d;
d = p[i] - bounds.max[i];
if (d > diff) diff = d;
res += diff*diff;
}
return std::sqrt(res);
}
template<class Bounds>
float
diy::
distance(int dim, const Bounds& bounds1, const Bounds& bounds2)
{
float res = 0;
for (int i = 0; i < dim; ++i)
{
float diff = 0, d;
float d1 = bounds1.max[i] - bounds2.min[i];
float d2 = bounds2.max[i] - bounds1.min[i];
if (d1 > 0 && d2 > 0)
diff = 0;
else if (d1 <= 0)
diff = -d1;
else if (d2 <= 0)
diff = -d2;
res += diff*diff;
}
return std::sqrt(res);
}
//! Finds the neighbor(s) containing the target point.
template<class Bounds, class Point, class OutIter>
void
diy::
in(const RegularLink<Bounds>& link, //!< neighbors
const Point& p, //!< target point
OutIter out, //!< insert iterator for output set of neighbors
const Bounds& domain) //!< global domain bounds
{
Bounds neigh_bounds; // neighbor block bounds
// for all neighbors of this block
for (int n = 0; n < link.size(); n++)
{
// wrap neighbor bounds, if necessary, otherwise bounds will be unchanged
neigh_bounds = link.bounds(n);
wrap_bounds(neigh_bounds, link.wrap(n), domain, link.dimension());
if (distance(link.dimension(), neigh_bounds, p) == 0)
*out++ = n;
} // for all neighbors
}
// wraps block bounds
// wrap dir is the wrapping direction from original block to wrapped neighbor block
// overall domain bounds and dimensionality are also needed
template<class Bounds>
void
diy::
wrap_bounds(Bounds& bounds, Direction wrap_dir, const Bounds& domain, int dim)
{
for (int i = 0; i < dim; ++i)
{
bounds.min[i] += wrap_dir[i] * (domain.max[i] - domain.min[i]);
bounds.max[i] += wrap_dir[i] * (domain.max[i] - domain.min[i]);
}
}
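// Usage sketch (hypothetical; `link` and `domain` come from the block's
// RegularLink and the decomposition; requires <iterator> for back_inserter):
//
//   std::vector<int> nbrs;                 // indices into the link
//   float p[3] = { 0.f, 0.f, 0.f };
//   diy::near(*link, p, 1.0f, std::back_inserter(nbrs), domain);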
#endif

@ -1,120 +0,0 @@
#ifndef VTKMDIY_POINT_HPP
#define VTKMDIY_POINT_HPP
#include <iostream>
#include <vector>
#include <string>
#include <sstream>
#include <array>
namespace diy
{
template<class Coordinate_, unsigned D>
class Point: public std::array<Coordinate_, D>
{
public:
typedef Coordinate_ Coordinate;
typedef std::array<Coordinate, D> ArrayParent;
typedef Point<Coordinate, D-1> LPoint;
typedef Point<Coordinate, D+1> UPoint;
template<class U>
struct rebind { typedef Point<U,D> type; };
public:
Point() { for (unsigned i = 0; i < D; ++i) (*this)[i] = 0; }
Point(const ArrayParent& a):
ArrayParent(a) {}
template<class T> Point(const Point<T, D>& p) { for (size_t i = 0; i < D; ++i) (*this)[i] = p[i]; }
template<class T> Point(const T* a) { for (unsigned i = 0; i < D; ++i) (*this)[i] = a[i]; }
template<class T> Point(const std::vector<T>& a) { for (unsigned i = 0; i < D; ++i) (*this)[i] = a[i]; }
Point(std::initializer_list<Coordinate> lst) { unsigned i = 0; for (Coordinate x : lst) (*this)[i++] = x; }
Point(Point&&) =default;
Point(const Point&) =default;
Point& operator=(const Point&) =default;
static constexpr
unsigned dimension() { return D; }
static Point zero() { return Point(); }
static Point one() { Point p; for (unsigned i = 0; i < D; ++i) p[i] = 1; return p; }
LPoint drop(int dim) const { LPoint p; unsigned c = 0; for (unsigned i = 0; i < D; ++i) { if (i == dim) continue; p[c++] = (*this)[i]; } return p; }
UPoint lift(int dim, Coordinate x) const { UPoint p; for (unsigned i = 0; i < D+1; ++i) { if (i < dim) p[i] = (*this)[i]; else if (i == dim) p[i] = x; else if (i > dim) p[i] = (*this)[i-1]; } return p; }
using ArrayParent::operator[];
Point& operator+=(const Point& y) { for (unsigned i = 0; i < D; ++i) (*this)[i] += y[i]; return *this; }
Point& operator-=(const Point& y) { for (unsigned i = 0; i < D; ++i) (*this)[i] -= y[i]; return *this; }
Point& operator*=(Coordinate a) { for (unsigned i = 0; i < D; ++i) (*this)[i] *= a; return *this; }
Point& operator/=(Coordinate a) { for (unsigned i = 0; i < D; ++i) (*this)[i] /= a; return *this; }
Coordinate norm() const { return (*this)*(*this); }
std::ostream& operator<<(std::ostream& out) const { out << (*this)[0]; for (unsigned i = 1; i < D; ++i) out << " " << (*this)[i]; return out; }
std::istream& operator>>(std::istream& in);
friend
Point operator+(Point x, const Point& y) { x += y; return x; }
friend
Point operator-(Point x, const Point& y) { x -= y; return x; }
friend
Point operator/(Point x, Coordinate y) { x /= y; return x; }
friend
Point operator*(Point x, Coordinate y) { x *= y; return x; }
friend
Point operator*(Coordinate y, Point x) { x *= y; return x; }
friend
Coordinate operator*(const Point& x, const Point& y) { Coordinate n = 0; for (size_t i = 0; i < D; ++i) n += x[i] * y[i]; return n; }
template<class T>
friend
Coordinate operator*(const Point<T,D>& x, const Point& y) { Coordinate n = 0; for (size_t i = 0; i < D; ++i) n += x[i] * y[i]; return n; }
};
template<class C, unsigned D>
std::istream&
Point<C,D>::
operator>>(std::istream& in)
{
std::string point_str;
in >> point_str; // read until ' '
std::stringstream ps(point_str);
char x;
for (unsigned i = 0; i < dimension(); ++i)
{
ps >> (*this)[i];
ps >> x;
}
return in;
}
template<class Coordinate, unsigned D>
Coordinate norm2(const Point<Coordinate,D>& p)
{ Coordinate res = 0; for (unsigned i = 0; i < D; ++i) res += p[i]*p[i]; return res; }
template<class C, unsigned D>
std::ostream&
operator<<(std::ostream& out, const Point<C,D>& p)
{ return p.operator<<(out); }
template<class C, unsigned D>
std::istream&
operator>>(std::istream& in, Point<C,D>& p)
{ return p.operator>>(in); }
}
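// Usage sketch (illustrative):
//
//   diy::Point<float,3> a { 1.f, 2.f, 3.f };
//   diy::Point<float,3> b = diy::Point<float,3>::one();
//   float dot = (a + b) * b;   // component-wise sum {2,3,4}, then inner product: 9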
#endif // VTKMDIY_POINT_HPP

@ -1,293 +0,0 @@
#ifndef VTKMDIY_PROXY_HPP
#define VTKMDIY_PROXY_HPP
namespace diy
{
//! Communication proxy, used for enqueueing and dequeueing items for future exchange.
struct Master::Proxy
{
template <class T>
struct EnqueueIterator;
Proxy(Master* master__, int gid__):
gid_(gid__),
master_(master__),
incoming_(&master__->incoming(gid__)),
outgoing_(&master__->outgoing(gid__)),
collectives_(&master__->collectives(gid__)) {}
int gid() const { return gid_; }
//! Enqueue data whose size can be determined automatically, e.g., an STL vector.
template<class T>
void enqueue(const BlockID& to, //!< target block (gid,proc)
const T& x, //!< data (eg. STL vector)
void (*save)(BinaryBuffer&, const T&) = &::diy::save<T> //!< optional serialization function
) const
{ OutgoingQueues& out = *outgoing_; save(out[to], x); }
//! Enqueue data whose size is given explicitly by the user, e.g., an array.
template<class T>
void enqueue(const BlockID& to, //!< target block (gid,proc)
const T* x, //!< pointer to the data (eg. address of start of vector)
size_t n, //!< size in data elements (eg. ints)
void (*save)(BinaryBuffer&, const T&) = &::diy::save<T> //!< optional serialization function
) const;
//! Dequeue data whose size can be determined automatically (e.g., STL vector) and that was
//! previously enqueued so that diy knows its size when it is received.
//! In this case, diy will allocate the receive buffer; the user does not need to do so.
template<class T>
void dequeue(int from, //!< target block gid
T& x, //!< data (eg. STL vector)
void (*load)(BinaryBuffer&, T&) = &::diy::load<T> //!< optional serialization function
) const
{ IncomingQueues& in = *incoming_; load(in[from], x); }
//! Dequeue an array of data whose size is given explicitly by the user.
//! In this case, the user needs to allocate the receive buffer prior to calling dequeue.
template<class T>
void dequeue(int from, //!< target block gid
T* x, //!< pointer to the data (eg. address of start of vector)
size_t n, //!< size in data elements (eg. ints)
void (*load)(BinaryBuffer&, T&) = &::diy::load<T> //!< optional serialization function
) const;
//! Dequeue data whose size can be determined automatically (e.g., STL vector) and that was
//! previously enqueued so that diy knows its size when it is received.
//! In this case, diy will allocate the receive buffer; the user does not need to do so.
template<class T>
void dequeue(const BlockID& from, //!< target block (gid,proc)
T& x, //!< data (eg. STL vector)
void (*load)(BinaryBuffer&, T&) = &::diy::load<T> //!< optional serialization function
) const { dequeue(from.gid, x, load); }
//! Dequeue an array of data whose size is given explicitly by the user.
//! In this case, the user needs to allocate the receive buffer prior to calling dequeue.
template<class T>
void dequeue(const BlockID& from, //!< target block (gid,proc)
T* x, //!< pointer to the data (eg. address of start of vector)
size_t n, //!< size in data elements (eg. ints)
void (*load)(BinaryBuffer&, T&) = &::diy::load<T> //!< optional serialization function
) const { dequeue(from.gid, x, n, load); }
template<class T>
EnqueueIterator<T> enqueuer(const T& x,
void (*save)(BinaryBuffer&, const T&) = &::diy::save<T>) const
{ return EnqueueIterator<T>(this, x, save); }
IncomingQueues* incoming() const { return incoming_; }
MemoryBuffer& incoming(int from) const { return (*incoming_)[from]; }
inline void incoming(std::vector<int>& v) const; // fill v with every gid from which we have a message
OutgoingQueues* outgoing() const { return outgoing_; }
MemoryBuffer& outgoing(const BlockID& to) const { return (*outgoing_)[to]; }
/**
* \ingroup Communication
* \brief Post an all-reduce collective using an existing communication proxy.
* Available operators are:
* maximum<T>, minimum<T>, std::plus<T>, std::multiplies<T>, std::logical_and<T>, and
* std::logical_or<T>.
*/
template<class T, class Op>
inline void all_reduce(const T& in, //!< local value being reduced
Op op //!< operator
) const;
/**
* \ingroup Communication
* \brief Return the result of a proxy collective without popping it off the collectives list (same result would be returned multiple times). The list can be cleared with collectives()->clear().
*/
template<class T>
inline T read() const;
/**
* \ingroup Communication
* \brief Return the result of a proxy collective; result is popped off the collectives list.
*/
template<class T>
inline T get() const;
template<class T>
inline void scratch(const T& in) const;
/**
* \ingroup Communication
* \brief Return the list of proxy collectives (values and operations)
*/
CollectivesList* collectives() const { return collectives_; }
Master* master() const { return master_; }
private:
int gid_;
Master* master_;
IncomingQueues* incoming_;
OutgoingQueues* outgoing_;
CollectivesList* collectives_;
};
template<class T>
struct Master::Proxy::EnqueueIterator:
public std::iterator<std::output_iterator_tag, void, void, void, void>
{
typedef void (*SaveT)(BinaryBuffer&, const T&);
EnqueueIterator(const Proxy* proxy, const T& x,
SaveT save = &::diy::save<T>):
proxy_(proxy), x_(x), save_(save) {}
EnqueueIterator& operator=(const BlockID& to) { proxy_->enqueue(to, x_, save_); return *this; }
EnqueueIterator& operator*() { return *this; }
EnqueueIterator& operator++() { return *this; }
EnqueueIterator& operator++(int) { return *this; }
private:
const Proxy* proxy_;
const T& x_;
SaveT save_;
};
struct Master::ProxyWithLink: public Master::Proxy
{
ProxyWithLink(const Proxy& proxy,
void* block__,
Link* link__,
IExchangeInfo* iexchange__ = 0):
Proxy(proxy),
block_(block__),
link_(link__),
iexchange_(iexchange__) {}
Link* link() const { return link_; }
void* block() const { return block_; }
private:
void* block_;
Link* link_;
IExchangeInfo* iexchange_; // not used for iexchange presently, but later could trigger some special behavior
public:
template<class T>
void enqueue(const BlockID& to,
const T& x,
void (*save)(BinaryBuffer&, const T&) = &::diy::save<T>) const
{
diy::Master::Proxy::enqueue(to, x, save);
if (iexchange_)
master()->icommunicate(iexchange_);
}
template<class T>
void enqueue(const BlockID& to,
const T* x,
size_t n,
void (*save)(BinaryBuffer&, const T&) = &::diy::save<T>) const
{
diy::Master::Proxy::enqueue(to, x, n, save);
if (iexchange_)
master()->icommunicate(iexchange_);
}
template<class T>
void dequeue(int from,
T& x,
void (*load)(BinaryBuffer&, T&) = &::diy::load<T>) const
{
// TODO: uncomment if necessary, try first without icommunicating on dequeue
// if (iexchange_)
// master()->icommunicate(iexchange_);
diy::Master::Proxy::dequeue(from, x, load);
}
template<class T>
void dequeue(int from,
T* x,
size_t n,
void (*load)(BinaryBuffer&, T&) = &::diy::load<T>) const
{
// TODO: uncomment if necessary, try first without icommunicating on dequeue
// if (iexchange_)
// master()->icommunicate(iexchange_);
diy::Master::Proxy::dequeue(from, x, n, load);
}
};
} // diy namespace
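// Usage sketch (hypothetical callbacks for master.foreach(); `Block` and its
// `values` member are user-defined; a master.exchange() separates the rounds):
//
//   void send(Block* b, const diy::Master::ProxyWithLink& cp)
//   {
//     for (int i = 0; i < cp.link()->size(); ++i)
//       cp.enqueue(cp.link()->target(i), b->values);
//   }
//   void recv(Block* b, const diy::Master::ProxyWithLink& cp)
//   {
//     std::vector<int> in;
//     cp.incoming(in);                     // gids with pending messages
//     for (size_t i = 0; i < in.size(); ++i)
//       cp.dequeue(in[i], b->values);
//   }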
void
diy::Master::Proxy::
incoming(std::vector<int>& v) const
{
for (IncomingQueues::const_iterator it = incoming_->begin(); it != incoming_->end(); ++it)
v.push_back(it->first);
}
template<class T, class Op>
void
diy::Master::Proxy::
all_reduce(const T& in, Op op) const
{
collectives_->push_back(Collective(new detail::AllReduceOp<T,Op>(in, op)));
}
template<class T>
T
diy::Master::Proxy::
read() const
{
T res;
collectives_->front().result_out(&res);
return res;
}
template<class T>
T
diy::Master::Proxy::
get() const
{
T res = read<T>();
collectives_->pop_front();
return res;
}
template<class T>
void
diy::Master::Proxy::
scratch(const T& in) const
{
collectives_->push_back(Collective(new detail::Scratch<T>(in)));
}
template<class T>
void
diy::Master::Proxy::
enqueue(const BlockID& to, const T* x, size_t n,
void (*save)(BinaryBuffer&, const T&)) const
{
OutgoingQueues& out = *outgoing_;
BinaryBuffer& bb = out[to];
if (save == (void (*)(BinaryBuffer&, const T&)) &::diy::save<T>)
diy::save(bb, x, n); // optimized for unspecialized types
else
for (size_t i = 0; i < n; ++i)
save(bb, x[i]);
}
template<class T>
void
diy::Master::Proxy::
dequeue(int from, T* x, size_t n,
void (*load)(BinaryBuffer&, T&)) const
{
IncomingQueues& in = *incoming_;
BinaryBuffer& bb = in[from];
if (load == (void (*)(BinaryBuffer&, T&)) &::diy::load<T>)
diy::load(bb, x, n); // optimized for unspecialized types
else
for (size_t i = 0; i < n; ++i)
load(bb, x[i]);
}
#endif

@ -1,33 +0,0 @@
#ifndef VTKMDIY_REDUCE_OPERATIONS_HPP
#define VTKMDIY_REDUCE_OPERATIONS_HPP
#include "reduce.hpp"
#include "partners/swap.hpp"
#include "detail/reduce/all-to-all.hpp"
namespace diy
{
/**
* \ingroup Communication
* \brief all to all reduction
*
*/
template<class Op>
void
all_to_all(Master& master, //!< block owner
const Assigner& assigner, //!< global block locator (maps gid to proc)
const Op& op, //!< user-defined operation called to enqueue and dequeue items
int k = 2 //!< reduction fanout
)
{
auto scoped = master.prof.scoped("all_to_all");
(void)scoped;
RegularDecomposer<DiscreteBounds> decomposer(1, interval(0,assigner.nblocks()-1), assigner.nblocks());
RegularSwapPartners partners(decomposer, k, false);
reduce(master, assigner, partners, detail::AllToAllReduce<Op>(op, assigner), detail::SkipIntermediate(partners.rounds()));
}
}
#endif

@ -1,216 +0,0 @@
#ifndef VTKMDIY_REDUCE_HPP
#define VTKMDIY_REDUCE_HPP
#include <vector>
#include "master.hpp"
#include "assigner.hpp"
#include "detail/block_traits.hpp"
#include "log.hpp"
namespace diy
{
//! Enables communication within a group during a reduction.
//! DIY creates the ReduceProxy for you in diy::reduce()
//! and provides a reference to ReduceProxy each time the user's reduction function is called
struct ReduceProxy: public Master::Proxy
{
typedef std::vector<int> GIDVector;
ReduceProxy(const Master::Proxy& proxy, //!< parent proxy
void* block, //!< diy block
unsigned round, //!< current round
const Assigner& assigner, //!< assigner
const GIDVector& incoming_gids, //!< incoming gids in this group
const GIDVector& outgoing_gids): //!< outgoing gids in this group
Master::Proxy(proxy),
block_(block),
round_(round),
assigner_(assigner)
{
// setup in_link
for (unsigned i = 0; i < incoming_gids.size(); ++i)
{
BlockID nbr;
nbr.gid = incoming_gids[i];
nbr.proc = assigner.rank(nbr.gid);
in_link_.add_neighbor(nbr);
}
// setup out_link
for (unsigned i = 0; i < outgoing_gids.size(); ++i)
{
BlockID nbr;
nbr.gid = outgoing_gids[i];
nbr.proc = assigner.rank(nbr.gid);
out_link_.add_neighbor(nbr);
}
}
ReduceProxy(const Master::Proxy& proxy, //!< parent proxy
void* block, //!< diy block
unsigned round, //!< current round
const Assigner& assigner,
const Link& in_link,
const Link& out_link):
Master::Proxy(proxy),
block_(block),
round_(round),
assigner_(assigner),
in_link_(in_link),
out_link_(out_link)
{}
//! returns pointer to block
void* block() const { return block_; }
//! returns current round number
unsigned round() const { return round_; }
//! returns incoming link
const Link& in_link() const { return in_link_; }
//! returns outgoing link
const Link& out_link() const { return out_link_; }
//! returns total number of blocks
int nblocks() const { return assigner_.nblocks(); }
//! returns the assigner
const Assigner& assigner() const { return assigner_; }
//! advanced: change current round number
void set_round(unsigned r) { round_ = r; }
private:
void* block_;
unsigned round_;
const Assigner& assigner_;
Link in_link_;
Link out_link_;
};
namespace detail
{
template<class Block, class Partners>
struct ReductionFunctor;
template<class Partners, class Skip>
struct SkipInactiveOr;
struct ReduceNeverSkip
{
bool operator()(int, int, const Master&) const { return false; }
};
}
/**
* \ingroup Communication
* \brief Implementation of the reduce communication pattern (includes
* swap-reduce, merge-reduce, and any other global communication).
*
*/
template<class Reduce, class Partners, class Skip>
void reduce(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
const Partners& partners, //!< partners object
const Reduce& reduce, //!< reduction callback function
const Skip& skip) //!< object determining whether a block should be skipped
{
auto log = get_logger();
int original_expected = master.expected();
using Block = typename detail::block_traits<Reduce>::type;
unsigned round;
for (round = 0; round < partners.rounds(); ++round)
{
log->debug("Round {}", round);
master.foreach(detail::ReductionFunctor<Block,Partners>(round, reduce, partners, assigner),
detail::SkipInactiveOr<Partners,Skip>(round, partners, skip));
master.execute();
int expected = 0;
for (unsigned i = 0; i < master.size(); ++i)
{
if (partners.active(round + 1, master.gid(i), master))
{
std::vector<int> incoming_gids;
partners.incoming(round + 1, master.gid(i), incoming_gids, master);
expected += static_cast<int>(incoming_gids.size());
master.incoming(master.gid(i)).clear();
}
}
master.set_expected(expected);
master.flush();
}
// final round
log->debug("Round {}", round);
master.foreach(detail::ReductionFunctor<Block,Partners>(round, reduce, partners, assigner),
detail::SkipInactiveOr<Partners,Skip>(round, partners, skip));
master.set_expected(original_expected);
}
/**
* \ingroup Communication
* \brief Implementation of the reduce communication pattern (includes
* swap-reduce, merge-reduce, and any other global communication).
*
*/
template<class Reduce, class Partners>
void reduce(Master& master, //!< master object
const Assigner& assigner, //!< assigner object
const Partners& partners, //!< partners object
const Reduce& reducer) //!< reduction callback function
{
reduce(master, assigner, partners, reducer, detail::ReduceNeverSkip());
}
namespace detail
{
template<class Block, class Partners>
struct ReductionFunctor
{
using Callback = std::function<void(Block*, const ReduceProxy&, const Partners&)>;
ReductionFunctor(unsigned round_, const Callback& reduce_, const Partners& partners_, const Assigner& assigner_):
round(round_), reduce(reduce_), partners(partners_), assigner(assigner_) {}
void operator()(Block* b, const Master::ProxyWithLink& cp) const
{
if (!partners.active(round, cp.gid(), *cp.master())) return;
std::vector<int> incoming_gids, outgoing_gids;
if (round > 0)
partners.incoming(round, cp.gid(), incoming_gids, *cp.master()); // receive from the previous round
if (round < partners.rounds())
partners.outgoing(round, cp.gid(), outgoing_gids, *cp.master()); // send to the next round
ReduceProxy rp(cp, b, round, assigner, incoming_gids, outgoing_gids);
reduce(b, rp, partners);
// touch the outgoing queues to make sure they exist
Master::OutgoingQueues& outgoing = *cp.outgoing();
if (outgoing.size() < (size_t) rp.out_link().size())
for (int j = 0; j < rp.out_link().size(); ++j)
outgoing[rp.out_link().target(j)]; // touch the outgoing queue, creating it if necessary
}
unsigned round;
Callback reduce;
Partners partners;
const Assigner& assigner;
};
template<class Partners, class Skip>
struct SkipInactiveOr
{
SkipInactiveOr(int round_, const Partners& partners_, const Skip& skip_):
round(round_), partners(partners_), skip(skip_) {}
bool operator()(int i, const Master& master) const { return !partners.active(round, master.gid(i), master) || skip(round, i, master); }
int round;
const Partners& partners;
Skip skip;
};
}
} // diy
#endif // VTKMDIY_REDUCE_HPP

@ -1,72 +0,0 @@
#ifndef VTKMDIY_RESOLVE_HPP
#define VTKMDIY_RESOLVE_HPP
#include "master.hpp"
#include "assigner.hpp"
namespace diy
{
// record master gids in the assigner and then look up the procs for all gids in the links
inline void fix_links(diy::Master& master, diy::DynamicAssigner& assigner);
// auxiliary functions; could stick them into detail namespace, but they might be useful on their own
inline void record_local_gids(const diy::Master& master, diy::DynamicAssigner& assigner);
inline void update_links(diy::Master& master, const diy::DynamicAssigner& assigner);
}
void
diy::
record_local_gids(const diy::Master& master, diy::DynamicAssigner& assigner)
{
// record the (rank, gid) pair for every local block
std::vector<std::tuple<int,int>> local_gids;
for (int i = 0; i < master.size(); ++i)
local_gids.emplace_back(std::make_tuple(master.communicator().rank(), master.gid(i)));
assigner.set_ranks(local_gids);
}
void
diy::
update_links(diy::Master& master, const diy::DynamicAssigner& assigner)
{
// figure out all the gids we need
std::vector<int> nbr_gids;
for (int i = 0; i < master.size(); ++i)
{
auto* link = master.link(i);
for (auto blockid : link->neighbors())
nbr_gids.emplace_back(blockid.gid);
}
// keep only unique gids to avoid unnecessary lookups
std::sort(nbr_gids.begin(), nbr_gids.end());
nbr_gids.resize(std::unique(nbr_gids.begin(), nbr_gids.end()) - nbr_gids.begin());
// get the neighbor ranks
std::vector<int> nbr_procs = assigner.ranks(nbr_gids);
// convert to a map
std::unordered_map<int, int> gid_to_proc;
for (size_t i = 0; i < nbr_gids.size(); ++i)
gid_to_proc[nbr_gids[i]] = nbr_procs[i];
// fix the procs in links
for (int i = 0; i < master.size(); ++i)
{
auto* link = master.link(i);
for (auto& blockid : link->neighbors())
blockid.proc = gid_to_proc[blockid.gid];
}
}
void
diy::
fix_links(diy::Master& master, diy::DynamicAssigner& assigner)
{
record_local_gids(master, assigner);
master.communicator().barrier(); // make sure everyone has set ranks
update_links(master, assigner);
}
#endif

@ -1,509 +0,0 @@
#ifndef VTKMDIY_SERIALIZATION_HPP
#define VTKMDIY_SERIALIZATION_HPP
#include <algorithm> // std::copy_n, used by MemoryBuffer
#include <vector>
#include <valarray>
#include <map>
#include <set>
#include <string>
#include <fstream>
#include <tuple>
#include <unordered_map>
#include <unordered_set>
#include <type_traits> // this is used for a safety check for default serialization
namespace diy
{
//! A serialization buffer. \ingroup Serialization
struct BinaryBuffer
{
virtual ~BinaryBuffer() =default;
virtual void save_binary(const char* x, size_t count) =0; //!< copy `count` bytes from `x` into the buffer
virtual inline void append_binary(const char* x, size_t count) =0; //!< append `count` bytes from `x` to end of buffer
virtual void load_binary(char* x, size_t count) =0; //!< copy `count` bytes into `x` from the buffer
virtual void load_binary_back(char* x, size_t count) =0; //!< copy `count` bytes into `x` from the back of the buffer
};
struct MemoryBuffer: public BinaryBuffer
{
MemoryBuffer(size_t position_ = 0):
position(position_) {}
virtual inline void save_binary(const char* x, size_t count) override; //!< copy `count` bytes from `x` into the buffer
virtual inline void append_binary(const char* x, size_t count) override; //!< append `count` bytes from `x` to end of buffer
virtual inline void load_binary(char* x, size_t count) override; //!< copy `count` bytes into `x` from the buffer
virtual inline void load_binary_back(char* x, size_t count) override; //!< copy `count` bytes into `x` from the back of the buffer
void clear() { buffer.clear(); reset(); }
void wipe() { std::vector<char>().swap(buffer); reset(); }
void reset() { position = 0; }
void skip(size_t s) { position += s; }
void swap(MemoryBuffer& o) { std::swap(position, o.position); buffer.swap(o.buffer); }
bool empty() const { return buffer.empty(); }
size_t size() const { return buffer.size(); }
void reserve(size_t s) { buffer.reserve(s); }
operator bool() const { return position < buffer.size(); }
//! copy a memory buffer from one buffer to another, bypassing making a temporary copy first
inline static void copy(MemoryBuffer& from, MemoryBuffer& to);
//! multiplier used for the geometric growth of the container
static float growth_multiplier() { return 1.5; }
// simple file IO
void write(const std::string& fn) const { std::ofstream out(fn.c_str(), std::ios::binary); out.write(buffer.data(), static_cast<std::streamsize>(size())); }
void read(const std::string& fn)
{
std::ifstream in(fn.c_str(), std::ios::binary | std::ios::ate);
buffer.resize(static_cast<size_t>(in.tellg()));
in.seekg(0);
in.read(&buffer[0], static_cast<std::streamsize>(size()));
position = 0;
}
size_t position;
std::vector<char> buffer;
};
namespace detail
{
struct Default {};
}
//!\addtogroup Serialization
//!@{
/**
* \brief Main interface to serialization, meant to be specialized for the
* types that require special handling. `diy::save()` and `diy::load()` call
* the static member functions of this class.
*
* The default (unspecialized) version copies
* `sizeof(T)` bytes from `&x` to or from `bb` via
* its `diy::BinaryBuffer::save_binary()` and `diy::BinaryBuffer::load_binary()`
* functions. This works out perfectly for plain old data (e.g., simple structs).
* To save a more complicated type, one has to specialize
* `diy::Serialization<T>` for that type. Specializations are already provided for
* `std::vector<T>`, `std::map<K,V>`, and `std::pair<T,U>`.
 * These can serve as models for quickly writing a specialization of one's own.
*
*/
template<class T>
struct Serialization: public detail::Default
{
#if (defined(__clang__) && !defined(__ppc64__)) || (defined(__GNUC__) && __GNUC__ >= 5)
//exempt power-pc clang variants due to: https://gitlab.kitware.com/vtk/vtk-m/issues/201
static_assert(std::is_trivially_copyable<T>::value, "Default serialization works only for trivially copyable types");
#endif
static void save(BinaryBuffer& bb, const T& x) { bb.save_binary((const char*) &x, sizeof(T)); }
static void load(BinaryBuffer& bb, T& x) { bb.load_binary((char*) &x, sizeof(T)); }
static size_t size(const T& x) { return sizeof(T); }
};
//! Saves `x` to `bb` by calling `diy::Serialization<T>::save(bb,x)`.
template<class T>
void save(BinaryBuffer& bb, const T& x) { Serialization<T>::save(bb, x); }
//! Loads `x` from `bb` by calling `diy::Serialization<T>::load(bb,x)`.
template<class T>
void load(BinaryBuffer& bb, T& x) { Serialization<T>::load(bb, x); }
//! Optimization for arrays. If `diy::Serialization` is not specialized for `T`,
//! the array will be copied all at once. Otherwise, it's copied element by element.
template<class T>
void save(BinaryBuffer& bb, const T* x, size_t n);
//! Optimization for arrays. If `diy::Serialization` is not specialized for `T`,
//! the array will be filled all at once. Otherwise, it's filled element by element.
template<class T>
void load(BinaryBuffer& bb, T* x, size_t n);
//! Supports only binary data copying (meant for simple footers).
template<class T>
void load_back(BinaryBuffer& bb, T& x) { bb.load_binary_back((char*) &x, sizeof(T)); }
//!@}
namespace detail
{
template<typename T>
struct is_default
{
typedef char yes;
typedef int no;
static yes test(Default*);
static no test(...);
enum { value = (sizeof(test((T*) 0)) == sizeof(yes)) };
};
}
template<class T>
void save(BinaryBuffer& bb, const T* x, size_t n)
{
if (!detail::is_default< Serialization<T> >::value)
for (size_t i = 0; i < n; ++i)
diy::save(bb, x[i]);
else // if Serialization is not specialized for T, just save the binary data
bb.save_binary((const char*) &x[0], sizeof(T)*n);
}
template<class T>
void load(BinaryBuffer& bb, T* x, size_t n)
{
if (!detail::is_default< Serialization<T> >::value)
for (size_t i = 0; i < n; ++i)
diy::load(bb, x[i]);
else // if Serialization is not specialized for T, just load the binary data
bb.load_binary((char*) &x[0], sizeof(T)*n);
}
// save/load for MemoryBuffer
template<>
struct Serialization< MemoryBuffer >
{
static void save(BinaryBuffer& bb, const MemoryBuffer& x)
{
diy::save(bb, x.position);
diy::save(bb, &x.buffer[0], x.position);
}
static void load(BinaryBuffer& bb, MemoryBuffer& x)
{
diy::load(bb, x.position);
x.buffer.resize(x.position);
diy::load(bb, &x.buffer[0], x.position);
}
static size_t size(const MemoryBuffer& x)
{
return sizeof(x.position) + x.position;
}
};
// save/load for std::vector<U>
template<class U>
struct Serialization< std::vector<U> >
{
typedef std::vector<U> Vector;
static void save(BinaryBuffer& bb, const Vector& v)
{
size_t s = v.size();
diy::save(bb, s);
if (s > 0)
diy::save(bb, &v[0], v.size());
}
static void load(BinaryBuffer& bb, Vector& v)
{
size_t s;
diy::load(bb, s);
v.resize(s);
if (s > 0)
diy::load(bb, &v[0], s);
}
};
template<class U>
struct Serialization< std::valarray<U> >
{
typedef std::valarray<U> ValArray;
static void save(BinaryBuffer& bb, const ValArray& v)
{
size_t s = v.size();
diy::save(bb, s);
if (s > 0)
diy::save(bb, &v[0], v.size());
}
static void load(BinaryBuffer& bb, ValArray& v)
{
size_t s;
diy::load(bb, s);
v.resize(s);
if (s > 0)
diy::load(bb, &v[0], s);
}
};
// save/load for std::string
template<>
struct Serialization< std::string >
{
typedef std::string String;
static void save(BinaryBuffer& bb, const String& s)
{
size_t sz = s.size();
diy::save(bb, sz);
diy::save(bb, s.c_str(), sz);
}
static void load(BinaryBuffer& bb, String& s)
{
size_t sz;
diy::load(bb, sz);
s.resize(sz);
for (size_t i = 0; i < sz; ++i)
{
char c;
diy::load(bb, c);
s[i] = c;
}
}
};
// save/load for std::pair<X,Y>
template<class X, class Y>
struct Serialization< std::pair<X,Y> >
{
typedef std::pair<X,Y> Pair;
static void save(BinaryBuffer& bb, const Pair& p)
{
diy::save(bb, p.first);
diy::save(bb, p.second);
}
static void load(BinaryBuffer& bb, Pair& p)
{
diy::load(bb, p.first);
diy::load(bb, p.second);
}
};
// save/load for std::map<K,V>
template<class K, class V>
struct Serialization< std::map<K,V> >
{
typedef std::map<K,V> Map;
static void save(BinaryBuffer& bb, const Map& m)
{
size_t s = m.size();
diy::save(bb, s);
for (typename std::map<K,V>::const_iterator it = m.begin(); it != m.end(); ++it)
diy::save(bb, *it);
}
static void load(BinaryBuffer& bb, Map& m)
{
size_t s;
diy::load(bb, s);
for (size_t i = 0; i < s; ++i)
{
K k;
diy::load(bb, k);
diy::load(bb, m[k]);
}
}
};
// save/load for std::set<T>
template<class T>
struct Serialization< std::set<T> >
{
typedef std::set<T> Set;
static void save(BinaryBuffer& bb, const Set& m)
{
size_t s = m.size();
diy::save(bb, s);
for (typename std::set<T>::const_iterator it = m.begin(); it != m.end(); ++it)
diy::save(bb, *it);
}
static void load(BinaryBuffer& bb, Set& m)
{
size_t s;
diy::load(bb, s);
for (size_t i = 0; i < s; ++i)
{
T p;
diy::load(bb, p);
m.insert(p);
}
}
};
// save/load for std::unordered_map<K,V,H,E,A>
template<class K, class V, class H, class E, class A>
struct Serialization< std::unordered_map<K,V,H,E,A> >
{
typedef std::unordered_map<K,V,H,E,A> Map;
static void save(BinaryBuffer& bb, const Map& m)
{
size_t s = m.size();
diy::save(bb, s);
for (auto& x : m)
diy::save(bb, x);
}
static void load(BinaryBuffer& bb, Map& m)
{
size_t s;
diy::load(bb, s);
for (size_t i = 0; i < s; ++i)
{
std::pair<K,V> p;
diy::load(bb, p);
m.emplace(std::move(p));
}
}
};
// save/load for std::unordered_set<T,H,E,A>
template<class T, class H, class E, class A>
struct Serialization< std::unordered_set<T,H,E,A> >
{
typedef std::unordered_set<T,H,E,A> Set;
static void save(BinaryBuffer& bb, const Set& m)
{
size_t s = m.size();
diy::save(bb, s);
for (auto& x : m)
diy::save(bb, x);
}
static void load(BinaryBuffer& bb, Set& m)
{
size_t s;
diy::load(bb, s);
for (size_t i = 0; i < s; ++i)
{
T p;
diy::load(bb, p);
m.emplace(std::move(p));
}
}
};
// save/load for std::tuple<...>
// TODO: this ought to be default (copying) serialization
// if all arguments are default
template<class... Args>
struct Serialization< std::tuple<Args...> >
{
typedef std::tuple<Args...> Tuple;
static void save(BinaryBuffer& bb, const Tuple& t) { save<0>(bb, t); }
template<std::size_t I = 0>
static
typename std::enable_if<I == sizeof...(Args), void>::type
save(BinaryBuffer&, const Tuple&) {}
template<std::size_t I = 0>
static
typename std::enable_if<I < sizeof...(Args), void>::type
save(BinaryBuffer& bb, const Tuple& t) { diy::save(bb, std::get<I>(t)); save<I+1>(bb, t); }
static void load(BinaryBuffer& bb, Tuple& t) { load<0>(bb, t); }
template<std::size_t I = 0>
static
typename std::enable_if<I == sizeof...(Args), void>::type
load(BinaryBuffer&, Tuple&) {}
template<std::size_t I = 0>
static
typename std::enable_if<I < sizeof...(Args), void>::type
load(BinaryBuffer& bb, Tuple& t) { diy::load(bb, std::get<I>(t)); load<I+1>(bb, t); }
};
}
void
diy::MemoryBuffer::
save_binary(const char* x, size_t count)
{
if (position + count > buffer.capacity())
{
double newsize = static_cast<double>(position + count) * growth_multiplier(); // if we have to grow, grow geometrically
buffer.reserve(static_cast<size_t>(newsize));
}
if (position + count > buffer.size())
buffer.resize(position + count);
std::copy_n(x, count, &buffer[position]);
position += count;
}
void
diy::MemoryBuffer::
append_binary(const char* x, size_t count)
{
if (buffer.size() + count > buffer.capacity()) // growth/copying will be triggered
{
size_t cur_size = buffer.size() - position;
size_t new_size = cur_size + count;
if (new_size * growth_multiplier() <= buffer.capacity()) // we have enough space in this buffer, copy in place
{
// copy the data to the beginning of the buffer and reduce its size
for (size_t i = 0; i < cur_size; ++i)
buffer[i] = buffer[position++];
buffer.resize(cur_size);
position = 0;
} else
{
std::vector<char> tmp;
tmp.reserve(static_cast<size_t>(static_cast<float>(new_size) * growth_multiplier())); // cast after multiplying, so the growth factor isn't truncated to 1
tmp.resize(cur_size);
for (size_t i = 0; i < tmp.size(); ++i)
tmp[i] = buffer[position++];
buffer.swap(tmp);
position = 0;
}
}
size_t temp_pos = position;
position = size();
save_binary(x, count);
position = temp_pos;
}
void
diy::MemoryBuffer::
load_binary(char* x, size_t count)
{
std::copy_n(&buffer[position], count, x);
position += count;
}
void
diy::MemoryBuffer::
load_binary_back(char* x, size_t count)
{
std::copy_n(&buffer[buffer.size() - count], count, x);
buffer.resize(buffer.size() - count);
}
void
diy::MemoryBuffer::
copy(MemoryBuffer& from, MemoryBuffer& to)
{
size_t sz;
diy::load(from, sz);
from.position -= sizeof(size_t);
size_t total = sizeof(size_t) + sz;
to.buffer.resize(to.position + total);
std::copy_n(&from.buffer[from.position], total, &to.buffer[to.position]);
to.position += total;
from.position += total;
}
#endif

@ -1,119 +0,0 @@
#ifndef VTKMDIY_STATS_HPP
#define VTKMDIY_STATS_HPP
#include <chrono>
#include <string>
#include <vector>
#include "log.hpp" // need this for format
namespace diy
{
namespace stats
{
#if defined(DIY_PROFILE)
struct Profiler
{
using Clock = std::chrono::high_resolution_clock;
using Time = Clock::time_point;
struct Event
{
Event(const std::string& name_, bool begin_):
name(name_),
begin(begin_),
stamp(Clock::now())
{}
std::string name;
bool begin;
Time stamp;
};
using EventsVector = std::vector<Event>;
struct Scoped
{
Scoped(Profiler& prof_, std::string name_):
prof(prof_), name(name_), active(true) { prof << name; }
~Scoped() { if (active) prof >> name; }
Scoped(Scoped&& other):
prof(other.prof),
name(other.name),
active(other.active) { other.active = false; }
Scoped&
operator=(Scoped&& other) = delete;
Scoped(const Scoped&) = delete;
Scoped&
operator=(const Scoped&) = delete;
Profiler& prof;
std::string name;
bool active;
};
Profiler() { reset_time(); }
void reset_time() { start = Clock::now(); }
void operator<<(std::string name) { enter(name); }
void operator>>(std::string name) { exit(name); }
void enter(std::string name) { events.push_back(Event(name, true)); }
void exit(std::string name) { events.push_back(Event(name, false)); }
void output(std::ostream& out, std::string prefix = "")
{
if (!prefix.empty())
prefix += " ";
for (size_t i = 0; i < events.size(); ++i)
{
const Event& e = events[i];
auto time = std::chrono::duration_cast<std::chrono::microseconds>(e.stamp - start).count();
fmt::print(out, "{}{:02d}:{:02d}:{:02d}.{:06d} {}{}\n",
prefix,
time/1000000/60/60,
time/1000000/60 % 60,
time/1000000 % 60,
time % 1000000,
(e.begin ? '<' : '>'),
e.name);
}
}
Scoped scoped(std::string name) { return Scoped(*this, name); }
void clear() { events.clear(); }
private:
Time start;
EventsVector events;
};
#else
struct Profiler
{
struct Scoped {};
void reset_time() {}
void operator<<(std::string) {}
void operator>>(std::string) {}
void enter(const std::string&) {}
void exit(const std::string&) {}
void output(std::ostream&, std::string = "") {}
void clear() {}
Scoped scoped(std::string) { return Scoped(); }
};
#endif
}
}
#endif

@ -1,246 +0,0 @@
#ifndef VTKMDIY_STORAGE_HPP
#define VTKMDIY_STORAGE_HPP
#include <string>
#include <map>
#include <fstream>
#include <cstdlib> // std::rand, used by open_random()
#include <fcntl.h>
#include "serialization.hpp"
#include "thread.hpp"
#include "log.hpp"
#include "io/utils.hpp"
namespace diy
{
namespace detail
{
typedef void (*Save)(const void*, BinaryBuffer& buf);
typedef void (*Load)(void*, BinaryBuffer& buf);
struct FileBuffer: public BinaryBuffer
{
FileBuffer(FILE* file_): file(file_), head(0), tail(0) {}
// TODO: add error checking
virtual inline void save_binary(const char* x, size_t count) override { fwrite(x, 1, count, file); head += count; }
virtual inline void append_binary(const char* x, size_t count) override
{
size_t temp_pos = static_cast<size_t>(ftell(file));
fseek(file, static_cast<long>(tail), SEEK_END);
fwrite(x, 1, count, file);
tail += count;
fseek(file, static_cast<long>(temp_pos), SEEK_SET);
}
virtual inline void load_binary(char* x, size_t count) override { auto n = fread(x, 1, count, file); (void) n;}
virtual inline void load_binary_back(char* x, size_t count) override { fseek(file, static_cast<long>(tail), SEEK_END); auto n = fread(x, 1, count, file); tail += count; fseek(file, static_cast<long>(head), SEEK_SET); (void) n;}
size_t size() const { return head; }
FILE* file;
size_t head, tail; // tail is used to support reading from the back;
// the mechanism is a little awkward and unused, but should work if needed
};
}
class ExternalStorage
{
public:
virtual int put(MemoryBuffer& bb) =0;
virtual int put(const void* x, detail::Save save) =0;
virtual void get(int i, MemoryBuffer& bb, size_t extra = 0) =0;
virtual void get(int i, void* x, detail::Load load) =0;
virtual void destroy(int i) =0;
};
class FileStorage: public ExternalStorage
{
private:
struct FileRecord
{
size_t size;
std::string name;
};
public:
FileStorage(const std::string& filename_template = "/tmp/DIY.XXXXXX"):
filename_templates_(1, filename_template),
count_(0), current_size_(0), max_size_(0) {}
FileStorage(const std::vector<std::string>& filename_templates):
filename_templates_(filename_templates),
count_(0), current_size_(0), max_size_(0) {}
virtual int put(MemoryBuffer& bb) override
{
auto log = get_logger();
std::string filename;
int fh = open_random(filename);
log->debug("FileStorage::put(): {}; buffer size: {}", filename, bb.size());
size_t sz = bb.buffer.size();
#if defined(_WIN32)
using r_type = int;
r_type written = _write(fh, &bb.buffer[0], static_cast<unsigned int>(sz));
#else
using r_type = ssize_t;
r_type written = write(fh, &bb.buffer[0], sz);
#endif
if (written < static_cast<r_type>(sz) || written == r_type(-1))
log->warn("Could not write the full buffer to {}: written = {}; size = {}", filename, written, sz);
io::utils::close(fh);
bb.wipe();
#if 0 // double-check the written file size: only for extreme debugging
FILE* fp = fopen(filename.c_str(), "r");
fseek(fp, 0L, SEEK_END);
int fsz = ftell(fp);
if (static_cast<size_t>(fsz) != sz)
log->warn("file size doesn't match the buffer size, {} vs {}", fsz, sz);
fclose(fp);
#endif
return make_file_record(filename, sz);
}
virtual int put(const void* x, detail::Save save) override
{
std::string filename;
int fh = open_random(filename);
#if defined(_WIN32)
detail::FileBuffer fb(_fdopen(fh, "wb"));
#else
detail::FileBuffer fb(fdopen(fh, "w"));
#endif
save(x, fb);
size_t sz = fb.size();
fclose(fb.file);
io::utils::sync(fh);
return make_file_record(filename, sz);
}
virtual void get(int i, MemoryBuffer& bb, size_t extra) override
{
FileRecord fr = extract_file_record(i);
get_logger()->debug("FileStorage::get(): {}", fr.name);
bb.buffer.reserve(fr.size + extra);
bb.buffer.resize(fr.size);
#if defined(_WIN32)
int fh = -1;
_sopen_s(&fh, fr.name.c_str(), _O_RDONLY | _O_BINARY, _SH_DENYNO, _S_IREAD);
_read(fh, &bb.buffer[0], static_cast<unsigned int>(fr.size));
#else
int fh = open(fr.name.c_str(), O_RDONLY | O_SYNC, 0600);
auto n = read(fh, &bb.buffer[0], fr.size); (void) n;
#endif
io::utils::close(fh);
remove_file(fr);
}
virtual void get(int i, void* x, detail::Load load) override
{
FileRecord fr = extract_file_record(i);
#if defined(_WIN32)
int fh = -1;
_sopen_s(&fh, fr.name.c_str(), _O_RDONLY | _O_BINARY, _SH_DENYNO, _S_IREAD);
detail::FileBuffer fb(_fdopen(fh, "rb"));
#else
int fh = open(fr.name.c_str(), O_RDONLY, 0600);
detail::FileBuffer fb(fdopen(fh, "r"));
#endif
load(x, fb);
fclose(fb.file);
remove_file(fr);
}
virtual void destroy(int i) override
{
FileRecord fr;
{
CriticalMapAccessor accessor = filenames_.access();
fr = (*accessor)[i];
accessor->erase(i);
}
io::utils::remove(fr.name);
(*current_size_.access()) -= fr.size;
}
int count() const { return (*count_.const_access()); }
size_t current_size() const { return (*current_size_.const_access()); }
size_t max_size() const { return (*max_size_.const_access()); }
~FileStorage()
{
for (FileRecordMap::const_iterator it = filenames_.const_access()->begin();
it != filenames_.const_access()->end();
++it)
{
io::utils::remove(it->second.name);
}
}
private:
int open_random(std::string& filename) const
{
if (filename_templates_.size() == 1)
filename = filename_templates_[0];
else
{
// pick a template at random (very basic load balancing mechanism)
filename = filename_templates_[static_cast<size_t>(std::rand()) % filename_templates_.size()];
}
int fh = diy::io::utils::mkstemp(filename);
return fh;
}
int make_file_record(const std::string& filename, size_t sz)
{
int res = (*count_.access())++;
FileRecord fr = { sz, filename };
(*filenames_.access())[res] = fr;
// keep track of sizes
critical_resource<size_t>::accessor cur = current_size_.access();
*cur += sz;
critical_resource<size_t>::accessor max = max_size_.access();
if (*cur > *max)
*max = *cur;
return res;
}
FileRecord extract_file_record(int i)
{
CriticalMapAccessor accessor = filenames_.access();
FileRecord fr = (*accessor)[i];
accessor->erase(i);
return fr;
}
void remove_file(const FileRecord& fr)
{
io::utils::remove(fr.name);
(*current_size_.access()) -= fr.size;
}
private:
typedef std::map<int, FileRecord> FileRecordMap;
typedef critical_resource<FileRecordMap> CriticalMap;
typedef CriticalMap::accessor CriticalMapAccessor;
private:
std::vector<std::string> filename_templates_;
CriticalMap filenames_;
critical_resource<int> count_;
critical_resource<size_t> current_size_, max_size_;
};
}
#endif

@ -1,31 +0,0 @@
#ifndef VTKMDIY_THREAD_H
#define VTKMDIY_THREAD_H
#ifdef VTKM_DIY_NO_THREADS
#include "no-thread.hpp"
#else
#include "thread/fast_mutex.h"
#include <thread>
#include <mutex>
namespace diy
{
using std::thread;
using std::mutex;
using std::recursive_mutex;
namespace this_thread = std::this_thread;
// TODO: replace with our own implementation using std::atomic_flag
using fast_mutex = tthread::fast_mutex;
template<class Mutex>
using lock_guard = std::unique_lock<Mutex>;
}
#endif
#include "critical-resource.hpp"
#endif

@ -1,248 +0,0 @@
/* -*- mode: c++; tab-width: 2; indent-tabs-mode: nil; -*-
Copyright (c) 2010-2012 Marcus Geelnard
This software is provided 'as-is', without any express or implied
warranty. In no event will the authors be held liable for any damages
arising from the use of this software.
Permission is granted to anyone to use this software for any purpose,
including commercial applications, and to alter it and redistribute it
freely, subject to the following restrictions:
1. The origin of this software must not be misrepresented; you must not
claim that you wrote the original software. If you use this software
in a product, an acknowledgment in the product documentation would be
appreciated but is not required.
2. Altered source versions must be plainly marked as such, and must not be
misrepresented as being the original software.
3. This notice may not be removed or altered from any source
distribution.
*/
#ifndef _FAST_MUTEX_H_
#define _FAST_MUTEX_H_
/// @file
// Which platform are we on?
#if !defined(_TTHREAD_PLATFORM_DEFINED_)
#if defined(_WIN32) || defined(__WIN32__) || defined(__WINDOWS__)
#define _TTHREAD_WIN32_
#else
#define _TTHREAD_POSIX_
#endif
#define _TTHREAD_PLATFORM_DEFINED_
#endif
// Check if we can support the assembly language level implementation (otherwise
// revert to the system API)
#if (defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))) || \
(defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))) || \
(defined(__GNUC__) && (defined(__ppc__)))
#define _FAST_MUTEX_ASM_
#else
#define _FAST_MUTEX_SYS_
#endif
#if defined(_TTHREAD_WIN32_)
#ifndef WIN32_LEAN_AND_MEAN
#define WIN32_LEAN_AND_MEAN
#define __UNDEF_LEAN_AND_MEAN
#endif
#include <windows.h>
#ifdef __UNDEF_LEAN_AND_MEAN
#undef WIN32_LEAN_AND_MEAN
#undef __UNDEF_LEAN_AND_MEAN
#endif
#else
#ifdef _FAST_MUTEX_ASM_
#include <sched.h>
#else
#include <pthread.h>
#endif
#endif
namespace tthread {
/// Fast mutex class.
/// This is a mutual exclusion object for synchronizing access to shared
/// memory areas for several threads. It is similar to the tthread::mutex class,
/// but instead of using system level functions, it is implemented as an atomic
/// spin lock with very low CPU overhead.
///
/// The \c fast_mutex class is NOT compatible with the \c condition_variable
/// class (however, it IS compatible with the \c lock_guard class). It should
/// also be noted that the \c fast_mutex class typically does not provide
/// as accurate thread scheduling as the standard \c mutex class does.
///
/// Because of the limitations of the class, it should only be used in
/// situations where the mutex needs to be locked/unlocked very frequently.
///
/// @note The "fast" version of this class relies on inline assembler language,
/// which is currently only supported for 32/64-bit Intel x86/AMD64 and
/// PowerPC architectures on a limited number of compilers (GNU g++ and MS
/// Visual C++).
/// For other architectures/compilers, system functions are used instead.
class fast_mutex {
public:
/// Constructor.
#if defined(_FAST_MUTEX_ASM_)
fast_mutex() : mLock(0) {}
#else
fast_mutex()
{
#if defined(_TTHREAD_WIN32_)
InitializeCriticalSection(&mHandle);
#elif defined(_TTHREAD_POSIX_)
pthread_mutex_init(&mHandle, NULL);
#endif
}
#endif
#if !defined(_FAST_MUTEX_ASM_)
/// Destructor.
~fast_mutex()
{
#if defined(_TTHREAD_WIN32_)
DeleteCriticalSection(&mHandle);
#elif defined(_TTHREAD_POSIX_)
pthread_mutex_destroy(&mHandle);
#endif
}
#endif
/// Lock the mutex.
/// The method will block the calling thread until a lock on the mutex can
/// be obtained. The mutex remains locked until \c unlock() is called.
/// @see lock_guard
inline void lock()
{
#if defined(_FAST_MUTEX_ASM_)
bool gotLock;
do {
gotLock = try_lock();
if(!gotLock)
{
#if defined(_TTHREAD_WIN32_)
Sleep(0);
#elif defined(_TTHREAD_POSIX_)
sched_yield();
#endif
}
} while(!gotLock);
#else
#if defined(_TTHREAD_WIN32_)
EnterCriticalSection(&mHandle);
#elif defined(_TTHREAD_POSIX_)
pthread_mutex_lock(&mHandle);
#endif
#endif
}
/// Try to lock the mutex.
/// The method will try to lock the mutex. If it fails, the function will
/// return immediately (non-blocking).
/// @return \c true if the lock was acquired, or \c false if the lock could
/// not be acquired.
inline bool try_lock()
{
#if defined(_FAST_MUTEX_ASM_)
int oldLock;
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
asm volatile (
"movl $1,%%eax\n\t"
"xchg %%eax,%0\n\t"
"movl %%eax,%1\n\t"
: "=m" (mLock), "=m" (oldLock)
:
: "%eax", "memory"
);
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
int *ptrLock = &mLock;
__asm {
mov eax,1
mov ecx,ptrLock
xchg eax,[ecx]
mov oldLock,eax
}
#elif defined(__GNUC__) && (defined(__ppc__))
int newLock = 1;
asm volatile (
"\n1:\n\t"
"lwarx %0,0,%1\n\t"
"cmpwi 0,%0,0\n\t"
"bne- 2f\n\t"
"stwcx. %2,0,%1\n\t"
"bne- 1b\n\t"
"isync\n"
"2:\n\t"
: "=&r" (oldLock)
: "r" (&mLock), "r" (newLock)
: "cr0", "memory"
);
#endif
return (oldLock == 0);
#else
#if defined(_TTHREAD_WIN32_)
return TryEnterCriticalSection(&mHandle) ? true : false;
#elif defined(_TTHREAD_POSIX_)
return (pthread_mutex_trylock(&mHandle) == 0) ? true : false;
#endif
#endif
}
/// Unlock the mutex.
/// If any threads are waiting for the lock on this mutex, one of them will
/// be unblocked.
inline void unlock()
{
#if defined(_FAST_MUTEX_ASM_)
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
asm volatile (
"movl $0,%%eax\n\t"
"xchg %%eax,%0\n\t"
: "=m" (mLock)
:
: "%eax", "memory"
);
#elif defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
int *ptrLock = &mLock;
__asm {
mov eax,0
mov ecx,ptrLock
xchg eax,[ecx]
}
#elif defined(__GNUC__) && (defined(__ppc__))
asm volatile (
"sync\n\t" // Replace with lwsync where possible?
: : : "memory"
);
mLock = 0;
#endif
#else
#if defined(_TTHREAD_WIN32_)
LeaveCriticalSection(&mHandle);
#elif defined(_TTHREAD_POSIX_)
pthread_mutex_unlock(&mHandle);
#endif
#endif
}
private:
#if defined(_FAST_MUTEX_ASM_)
int mLock;
#else
#if defined(_TTHREAD_WIN32_)
CRITICAL_SECTION mHandle;
#elif defined(_TTHREAD_POSIX_)
pthread_mutex_t mHandle;
#endif
#endif
};
}
#endif // _FAST_MUTEX_H_

@ -1,43 +0,0 @@
#ifndef VTKMDIY_TIME_HPP
#define VTKMDIY_TIME_HPP
#ifndef _WIN32
#include <sys/time.h>
#ifdef __MACH__
#include <mach/clock.h>
#include <mach/mach.h>
#endif // __MACH__
#else
#include <windows.h> // GetSystemTimeAsFileTime
#endif // _WIN32
namespace diy
{
typedef unsigned long time_type;
inline time_type get_time()
{
#ifdef __MACH__ // OS X does not have clock_gettime, use clock_get_time
clock_serv_t cclock;
mach_timespec_t ts;
host_get_clock_service(mach_host_self(), CALENDAR_CLOCK, &cclock);
clock_get_time(cclock, &ts);
mach_port_deallocate(mach_task_self(), cclock);
return ts.tv_sec*1000 + static_cast<unsigned int>(ts.tv_nsec/1000000);
#elif defined(_WIN32)
// SOURCE: http://stackoverflow.com/questions/5404277/porting-clock-gettime-to-windows
__int64 wintime;
GetSystemTimeAsFileTime((FILETIME*)&wintime);
wintime -= 116444736000000000LL; // offset from 1 Jan 1601 to 1 Jan 1970
long tv_sec = static_cast<long>(wintime / 10000000LL); // seconds
long tv_nsec = static_cast<long>(wintime % 10000000LL * 100); // nanoseconds
return static_cast<time_type>(tv_sec*1000 + tv_nsec/1000000);
#else
timespec ts;
clock_gettime(CLOCK_REALTIME, &ts);
return ts.tv_sec*1000 + ts.tv_nsec/1000000;
#endif
}
}
#endif

@ -1,94 +0,0 @@
#ifndef VTKMDIY_TYPES_HPP
#define VTKMDIY_TYPES_HPP
#include <iostream>
#include "constants.h"
#include "point.hpp"
namespace diy
{
struct BlockID
{
int gid, proc;
BlockID() = default;
BlockID(int _gid, int _proc) : gid(_gid), proc(_proc) {}
};
template<class Coordinate_>
struct Bounds
{
using Coordinate = Coordinate_;
using Point = diy::Point<Coordinate, DIY_MAX_DIM>;
Point min, max;
Bounds() = default;
Bounds(const Point& _min, const Point& _max) : min(_min), max(_max) {}
};
using DiscreteBounds = Bounds<int>;
using ContinuousBounds = Bounds<float>;
//! Helper to create a 1-dimensional discrete domain with the specified extents
inline
diy::DiscreteBounds
interval(int from, int to) { DiscreteBounds domain; domain.min[0] = from; domain.max[0] = to; return domain; }
struct Direction: public Point<int,DIY_MAX_DIM>
{
Direction() { for (size_t i = 0; i < DIY_MAX_DIM; ++i) (*this)[i] = 0; }
Direction(std::initializer_list<int> lst):
Direction() { size_t i = 0; for(int x : lst) (*this)[i++] = x; }
Direction(int dir)
{
for (size_t i = 0; i < DIY_MAX_DIM; ++i) (*this)[i] = 0;
if (dir & DIY_X0) (*this)[0] -= 1;
if (dir & DIY_X1) (*this)[0] += 1;
if (dir & DIY_Y0) (*this)[1] -= 1;
if (dir & DIY_Y1) (*this)[1] += 1;
if (dir & DIY_Z0) (*this)[2] -= 1;
if (dir & DIY_Z1) (*this)[2] += 1;
if (dir & DIY_T0) (*this)[3] -= 1;
if (dir & DIY_T1) (*this)[3] += 1;
}
bool
operator==(const diy::Direction& y) const
{
for (size_t i = 0; i < DIY_MAX_DIM; ++i)
if ((*this)[i] != y[i]) return false;
return true;
}
// lexicographic comparison
bool
operator<(const diy::Direction& y) const
{
for (size_t i = 0; i < DIY_MAX_DIM; ++i)
{
if ((*this)[i] < y[i]) return true;
if ((*this)[i] > y[i]) return false;
}
return false;
}
};
// Selector of bounds value type
template<class Bounds_>
struct BoundsValue
{
using type = typename Bounds_::Coordinate;
};
inline
bool
operator<(const diy::BlockID& x, const diy::BlockID& y)
{ return x.gid < y.gid; }
inline
bool
operator==(const diy::BlockID& x, const diy::BlockID& y)
{ return x.gid == y.gid; }
}
#endif

@ -1,8 +0,0 @@
#ifndef VTKMDIY_VERSION_HPP
#define VTKMDIY_VERSION_HPP
#define VTKMDIY_VERSION_MAJOR 3
#define VTKMDIY_VERSION_MINOR 5
#define VTKMDIY_VERSION_PATCH dev1
#endif

@ -1,54 +0,0 @@
#ifndef VTKMDIY_VERTICES_HPP
#define VTKMDIY_VERTICES_HPP
#include <iterator>
namespace diy
{
namespace detail
{
template<class Vertex, size_t I>
struct IsLast
{
static constexpr bool value = (Vertex::dimension() - 1 == I);
};
template<class Vertex, class Callback, size_t I, bool P>
struct ForEach
{
void operator()(Vertex& pos, const Vertex& from, const Vertex& to, const Callback& callback) const
{
for (pos[I] = from[I]; pos[I] <= to[I]; ++pos[I])
ForEach<Vertex, Callback, I+1, IsLast<Vertex,I+1>::value>()(pos, from, to, callback);
}
};
template<class Vertex, class Callback, size_t I>
struct ForEach<Vertex,Callback,I,true>
{
void operator()(Vertex& pos, const Vertex& from, const Vertex& to, const Callback& callback) const
{
for (pos[I] = from[I]; pos[I] <= to[I]; ++pos[I])
callback(pos);
}
};
}
template<class Vertex, class Callback>
void for_each(const Vertex& from, const Vertex& to, const Callback& callback)
{
Vertex pos;
diy::detail::ForEach<Vertex, Callback, 0, detail::IsLast<Vertex,0>::value>()(pos, from, to, callback);
}
template<class Vertex, class Callback>
void for_each(const Vertex& shape, const Callback& callback)
{
// qualify the call with the diy namespace to disambiguate it from std::for_each(...)
diy::for_each(Vertex::zero(), shape - Vertex::one(), callback);
}
}
#endif