Merge topic 'free-atomic-functions'

6cbcb9f5d Fix behavior of Cuda AtomicLoad with SequentiallyConsistent
7573d4ed5 Fix compiler warnings
147dd24d0 Remove ARM intrinsics in MSVC
2229c22f4 Avoid invalid Kokkos atomic calls
3b147878f Always use our implementation of Cuda atomics
9e6fe8fb6 Add memory order semantics to atomic functions
d2ac4b860 Be more careful in casting with Atomic functions
13056b3af Deprecate AtomicInterfaceControl and AtomicInterfaceExecution
...

Acked-by: Kitware Robot <kwrobot@kitware.com>
Merge-request: !2223
Kenneth Moreland 2020-08-22 04:20:46 +00:00 committed by Kitware Robot
commit 0370029f92
29 changed files with 1345 additions and 713 deletions

@ -0,0 +1,14 @@
# Add atomic free functions
Previously, all atomic functions were provided by classes named
`AtomicInterfaceControl` and `AtomicInterfaceExecution`, which required
knowing at compile time which device would invoke the methods. That in
turn meant that anything using an atomic had to be templated on the
device it runs on, which can be a big hassle (and is problematic for
some code structures).

Instead, these methods are moved to free functions in the `vtkm`
namespace. These functions operate like those in `Math.h`: using
compiler directives, an appropriate version of each function is
compiled for the device the compiler is currently targeting.
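
For example, code that previously had to go through a device-templated
`AtomicInterfaceExecution<Device>::Add` call can now use the free function
directly. A minimal sketch (the `IncrementCounter` function and `counter`
pointer are illustrative, not part of this change):

```cpp
#include <vtkm/Atomic.h>
#include <vtkm/Types.h>

// Compiles for whichever device backend is active; no device tag or
// AtomicInterface* class is required.
VTKM_EXEC_CONT inline vtkm::UInt32 IncrementCounter(vtkm::UInt32* counter)
{
  // Returns the value that was in *counter before the increment.
  return vtkm::AtomicAdd(counter, vtkm::UInt32{ 1 });
}
```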

vtkm/Atomic.h (new file, 821 lines)

@ -0,0 +1,821 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_Atomic_h
#define vtk_m_Atomic_h
#include <vtkm/List.h>
#include <vtkm/internal/Windows.h>
#include <atomic>
namespace vtkm
{
/// \brief Specifies memory order semantics for atomic operations.
///
/// The memory order parameter controls how all other memory operations are
/// ordered around a specific atomic instruction.
///
/// Memory access is complicated. Compilers can reorder instructions to optimize
/// scheduling, processors can speculatively read memory, and caches make
/// assumptions about coherency that we may not normally be aware of. Because of
/// this complexity, the order in which multiple updates to shared memory become
/// visible to other threads is not guaranteed, nor is it guaranteed that each
/// thread will see memory updates occur in the same order as any other thread.
/// This can lead to surprising behavior and cause problems when using atomics
/// to communicate between threads.
///
/// These problems are solved by using a standard set of memory orderings which
/// describe common access patterns used for shared memory programming. Their
/// goal is to provide guarantees that changes made in one thread will be visible
/// to another thread at a specific and predictable point in execution, regardless
/// of any hardware or compiler optimizations.
///
/// If unsure, use the `SequentiallyConsistent` memory ordering. It will "do the right
/// thing", but at the cost of increased and possibly unnecessary memory ordering
/// restrictions. The other orderings are optimizations that are only applicable
/// in very specific situations.
///
/// See https://en.cppreference.com/w/cpp/atomic/memory_order for a detailed
/// description of the different orderings and their usage.
///
/// The memory order semantics follow those of other common atomic operations such as
/// the `std::memory_order` identifiers used for `std::atomic`.
///
/// Note that when a memory order is specified, the enforced memory order is guaranteed
/// to be as good or better than that requested.
///
enum class MemoryOrder
{
/// An atomic operation with `Relaxed` memory order enforces no synchronization or ordering
/// constraints on local reads and writes. That is, a read or write to a local, non-atomic
/// variable may be moved to before or after an atomic operation with `Relaxed` memory order.
///
Relaxed,
/// A load operation with `Acquire` memory order will enforce that any local read or write
/// operations listed in the program after the atomic will happen after the atomic.
///
Acquire,
/// A store operation with `Release` memory order will enforce that any local read or write
/// operations listed in the program before the atomic will happen before the atomic.
///
Release,
/// A read-modify-write operation with `AcquireAndRelease` memory order will enforce that any
/// local read or write operations listed in the program before the atomic will happen before the
/// atomic and likewise any read or write operations listed in the program after the atomic will
/// happen after the atomic.
///
AcquireAndRelease,
/// An atomic with `SequentiallyConsistent` memory order will enforce the same semantics as
/// `Acquire`, `Release`, or `AcquireAndRelease`, as appropriate for the operation. Additionally,
/// `SequentiallyConsistent` will enforce a consistent ordering of atomic operations across all
/// threads. That is, all threads observe the modifications in the same order.
///
SequentiallyConsistent
};
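As an editorial illustration (not part of the header) of how these orderings pair up, a
`Release` store can publish a non-atomically written value that a matching `Acquire` load
then safely observes. The `data`/`ready` variables and function names below are hypothetical:

```cpp
// Producer: write the payload, then set the flag with Release so the payload
// write cannot be reordered after the flag update.
VTKM_EXEC_CONT inline void Publish(vtkm::UInt32* data, vtkm::UInt32* ready, vtkm::UInt32 value)
{
  *data = value;
  vtkm::AtomicStore(ready, vtkm::UInt32{ 1 }, vtkm::MemoryOrder::Release);
}

// Consumer: if the Acquire load observes the flag, the payload write is visible too.
VTKM_EXEC_CONT inline bool TryConsume(const vtkm::UInt32* data, vtkm::UInt32* ready, vtkm::UInt32& out)
{
  if (vtkm::AtomicLoad(ready, vtkm::MemoryOrder::Acquire) == 1)
  {
    out = *data;
    return true;
  }
  return false;
}
```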
namespace internal
{
VTKM_EXEC_CONT inline std::memory_order StdAtomicMemOrder(vtkm::MemoryOrder order)
{
switch (order)
{
case vtkm::MemoryOrder::Relaxed:
return std::memory_order_relaxed;
case vtkm::MemoryOrder::Acquire:
return std::memory_order_acquire;
case vtkm::MemoryOrder::Release:
return std::memory_order_release;
case vtkm::MemoryOrder::AcquireAndRelease:
return std::memory_order_acq_rel;
case vtkm::MemoryOrder::SequentiallyConsistent:
return std::memory_order_seq_cst;
}
// Should never reach here, but avoid compiler warnings
return std::memory_order_seq_cst;
}
} // namespace internal
} // namespace vtkm
#if defined(VTKM_CUDA_DEVICE_PASS)
namespace vtkm
{
namespace detail
{
// Fence to ensure that previous non-atomic stores are visible to other threads.
VTKM_EXEC_CONT inline void AtomicStoreFence(vtkm::MemoryOrder order)
{
if ((order == vtkm::MemoryOrder::Release) || (order == vtkm::MemoryOrder::AcquireAndRelease) ||
(order == vtkm::MemoryOrder::SequentiallyConsistent))
{
__threadfence();
}
}
// Fence to ensure that subsequent non-atomic loads are correctly ordered after the atomic load.
VTKM_EXEC_CONT inline void AtomicLoadFence(vtkm::MemoryOrder order)
{
if ((order == vtkm::MemoryOrder::Acquire) || (order == vtkm::MemoryOrder::AcquireAndRelease) ||
(order == vtkm::MemoryOrder::SequentiallyConsistent))
{
__threadfence();
}
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicLoadImpl(const T* addr, vtkm::MemoryOrder order)
{
const volatile T* vaddr = addr; /* volatile to bypass cache*/
if (order == vtkm::MemoryOrder::SequentiallyConsistent)
{
__threadfence();
}
const T value = *vaddr;
/* fence to ensure that dependent reads are correctly ordered */
AtomicLoadFence(order);
return value;
}
template <typename T>
VTKM_EXEC_CONT inline void AtomicStoreImpl(T* addr, T value, vtkm::MemoryOrder order)
{
volatile T* vaddr = addr; /* volatile to bypass cache */
/* fence to ensure that previous non-atomic stores are visible to other threads */
AtomicStoreFence(order);
*vaddr = value;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAddImpl(T* addr, T arg, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
auto result = atomicAdd(addr, arg);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAndImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
auto result = atomicAnd(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicOrImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
auto result = atomicOr(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicXorImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
auto result = atomicXor(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicNotImpl(T* addr, vtkm::MemoryOrder order)
{
return AtomicXorImpl(addr, static_cast<T>(~T{ 0u }), order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicCompareAndSwapImpl(T* addr,
T desired,
T expected,
vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
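// Note: CUDA's atomicCAS has the signature atomicCAS(address, compare, value), so the
// expected (compare) value is passed before the desired (new) value here.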
auto result = atomicCAS(addr, expected, desired);
AtomicLoadFence(order);
return result;
}
}
} // namespace vtkm::detail
#elif defined(VTKM_ENABLE_KOKKOS)
VTKM_THIRDPARTY_PRE_INCLUDE
// Superhack! Kokkos_Macros.hpp defines macros to include modifiers like __device__.
// However, we don't want to actually use those if compiling this with a standard
// C++ compiler (because this particular code does not run on a device). Thus,
// we want to disable that behavior when not using the device compiler. To do that,
// we are going to have to load the KokkosCore_config.h file (which you are not
// supposed to do), then undefine the device enables if necessary, then load
// Kokkos_Macros.hpp to finish the state.
#ifndef KOKKOS_MACROS_HPP
#define KOKKOS_MACROS_HPP
#include <KokkosCore_config.h>
#undef KOKKOS_MACROS_HPP
#define KOKKOS_DONT_INCLUDE_CORE_CONFIG_H
#if defined(KOKKOS_ENABLE_CUDA) && !defined(VTKM_CUDA)
#undef KOKKOS_ENABLE_CUDA
#endif
#endif //KOKKOS_MACROS_HPP not loaded
#include <Kokkos_Core.hpp>
VTKM_THIRDPARTY_POST_INCLUDE
namespace vtkm
{
namespace detail
{
// Fence to ensure that previous non-atomic stores are visible to other threads.
VTKM_EXEC_CONT inline void AtomicStoreFence(vtkm::MemoryOrder order)
{
if ((order == vtkm::MemoryOrder::Release) || (order == vtkm::MemoryOrder::AcquireAndRelease) ||
(order == vtkm::MemoryOrder::SequentiallyConsistent))
{
Kokkos::memory_fence();
}
}
// Fence to ensure that subsequent non-atomic loads are correctly ordered after the atomic load.
VTKM_EXEC_CONT inline void AtomicLoadFence(vtkm::MemoryOrder order)
{
if ((order == vtkm::MemoryOrder::Acquire) || (order == vtkm::MemoryOrder::AcquireAndRelease) ||
(order == vtkm::MemoryOrder::SequentiallyConsistent))
{
Kokkos::memory_fence();
}
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicLoadImpl(const T* addr, vtkm::MemoryOrder order)
{
switch (order)
{
case vtkm::MemoryOrder::Relaxed:
return Kokkos::Impl::atomic_load(addr, Kokkos::Impl::memory_order_relaxed);
case vtkm::MemoryOrder::Acquire:
case vtkm::MemoryOrder::Release: // Release doesn't make sense. Use Acquire.
case vtkm::MemoryOrder::AcquireAndRelease: // Release doesn't make sense. Use Acquire.
return Kokkos::Impl::atomic_load(addr, Kokkos::Impl::memory_order_acquire);
case vtkm::MemoryOrder::SequentiallyConsistent:
return Kokkos::Impl::atomic_load(addr, Kokkos::Impl::memory_order_seq_cst);
}
// Should never reach here, but avoid compiler warnings
return Kokkos::Impl::atomic_load(addr, Kokkos::Impl::memory_order_seq_cst);
}
template <typename T>
VTKM_EXEC_CONT inline void AtomicStoreImpl(T* addr, T value, vtkm::MemoryOrder order)
{
switch (order)
{
case vtkm::MemoryOrder::Relaxed:
Kokkos::Impl::atomic_store(addr, value, Kokkos::Impl::memory_order_relaxed);
break;
case vtkm::MemoryOrder::Acquire: // Acquire doesn't make sense. Use Release.
case vtkm::MemoryOrder::Release:
case vtkm::MemoryOrder::AcquireAndRelease: // Acquire doesn't make sense. Use Release.
Kokkos::Impl::atomic_store(addr, value, Kokkos::Impl::memory_order_release);
break;
case vtkm::MemoryOrder::SequentiallyConsistent:
Kokkos::Impl::atomic_store(addr, value, Kokkos::Impl::memory_order_seq_cst);
break;
}
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAddImpl(T* addr, T arg, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
T result = Kokkos::atomic_fetch_add(addr, arg);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAndImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
T result = Kokkos::atomic_fetch_and(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicOrImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
T result = Kokkos::atomic_fetch_or(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicXorImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
T result = Kokkos::atomic_fetch_xor(addr, mask);
AtomicLoadFence(order);
return result;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicNotImpl(T* addr, vtkm::MemoryOrder order)
{
return AtomicXorImpl(addr, static_cast<T>(~T{ 0u }), order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicCompareAndSwapImpl(T* addr,
T desired,
T expected,
vtkm::MemoryOrder order)
{
AtomicStoreFence(order);
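// Kokkos::atomic_compare_exchange takes (address, comparison_value, new_value) and
// returns the old value, so `expected` precedes `desired` here.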
T result = Kokkos::atomic_compare_exchange(addr, expected, desired);
AtomicLoadFence(order);
return result;
}
}
} // namespace vtkm::detail
#elif defined(VTKM_MSVC)
// Supports vtkm::UInt8, vtkm::UInt16, vtkm::UInt32, vtkm::UInt64
#include <cstdint>
#include <cstring>
#include <intrin.h> // For MSVC atomics
namespace vtkm
{
namespace detail
{
template <typename To, typename From>
VTKM_EXEC_CONT inline To BitCast(const From& src)
{
// The memcpy should be removed by the compiler when possible, but this
// works around a host of issues with bitcasting using reinterpret_cast.
VTKM_STATIC_ASSERT(sizeof(From) == sizeof(To));
To dst;
std::memcpy(&dst, &src, sizeof(From));
return dst;
}
template <typename T>
VTKM_EXEC_CONT inline T BitCast(T&& src)
{
return std::forward<T>(src);
}
// Note about Load and Store implementations:
//
// "Simple reads and writes to properly-aligned 32-bit variables are atomic
// operations"
//
// "Simple reads and writes to properly aligned 64-bit variables are atomic on
// 64-bit Windows. Reads and writes to 64-bit values are not guaranteed to be
// atomic on 32-bit Windows."
//
// "Reads and writes to variables of other sizes [than 32 or 64 bits] are not
// guaranteed to be atomic on any platform."
//
// https://docs.microsoft.com/en-us/windows/desktop/sync/interlocked-variable-access
VTKM_EXEC_CONT inline vtkm::UInt8 AtomicLoadImpl(const vtkm::UInt8* addr, vtkm::MemoryOrder order)
{
// This assumes that the memory interface is smart enough to load a 32-bit
// word atomically and a properly aligned 8-bit word from it.
// We could build address masks and do shifts to perform this manually if
// this assumption is incorrect.
auto result = *static_cast<volatile const vtkm::UInt8*>(addr);
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
return result;
}
VTKM_EXEC_CONT inline vtkm::UInt16 AtomicLoadImpl(const vtkm::UInt16* addr, vtkm::MemoryOrder order)
{
// This assumes that the memory interface is smart enough to load a 32-bit
// word atomically and a properly aligned 16-bit word from it.
// We could build address masks and do shifts to perform this manually if
// this assumption is incorrect.
auto result = *static_cast<volatile const vtkm::UInt16*>(addr);
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
return result;
}
VTKM_EXEC_CONT inline vtkm::UInt32 AtomicLoadImpl(const vtkm::UInt32* addr, vtkm::MemoryOrder order)
{
auto result = *static_cast<volatile const vtkm::UInt32*>(addr);
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
return result;
}
VTKM_EXEC_CONT inline vtkm::UInt64 AtomicLoadImpl(const vtkm::UInt64* addr, vtkm::MemoryOrder order)
{
auto result = *static_cast<volatile const vtkm::UInt64*>(addr);
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
return result;
}
VTKM_EXEC_CONT inline void AtomicStoreImpl(vtkm::UInt8* addr,
vtkm::UInt8 val,
vtkm::MemoryOrder order)
{
// There doesn't seem to be an atomic store instruction in the windows
// API, so just exchange and discard the result.
_InterlockedExchange8(reinterpret_cast<volatile CHAR*>(addr), BitCast<CHAR>(val));
}
VTKM_EXEC_CONT inline void AtomicStoreImpl(vtkm::UInt16* addr,
vtkm::UInt16 val,
vtkm::MemoryOrder order)
{
// There doesn't seem to be an atomic store instruction in the windows
// API, so just exchange and discard the result.
_InterlockedExchange16(reinterpret_cast<volatile SHORT*>(addr), BitCast<SHORT>(val));
}
VTKM_EXEC_CONT inline void AtomicStoreImpl(vtkm::UInt32* addr,
vtkm::UInt32 val,
vtkm::MemoryOrder order)
{
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
*addr = val;
}
VTKM_EXEC_CONT inline void AtomicStoreImpl(vtkm::UInt64* addr,
vtkm::UInt64 val,
vtkm::MemoryOrder order)
{
std::atomic_thread_fence(internal::StdAtomicMemOrder(order));
*addr = val;
}
#define VTKM_ATOMIC_OP(vtkmName, winName, vtkmType, winType, suffix) \
VTKM_EXEC_CONT inline vtkmType vtkmName(vtkmType* addr, vtkmType arg, vtkm::MemoryOrder order) \
{ \
return BitCast<vtkmType>( \
winName##suffix(reinterpret_cast<volatile winType*>(addr), BitCast<winType>(arg))); \
}
#define VTKM_ATOMIC_OPS_FOR_TYPE(vtkmType, winType, suffix) \
VTKM_ATOMIC_OP(AtomicAddImpl, _InterlockedExchangeAdd, vtkmType, winType, suffix) \
VTKM_ATOMIC_OP(AtomicAndImpl, _InterlockedAnd, vtkmType, winType, suffix) \
VTKM_ATOMIC_OP(AtomicOrImpl, _InterlockedOr, vtkmType, winType, suffix) \
VTKM_ATOMIC_OP(AtomicXorImpl, _InterlockedXor, vtkmType, winType, suffix) \
VTKM_EXEC_CONT inline vtkmType AtomicNotImpl(vtkmType* addr, vtkm::MemoryOrder order) \
{ \
return AtomicXorImpl(addr, static_cast<vtkmType>(~vtkmType{ 0u }), order); \
} \
VTKM_EXEC_CONT inline vtkmType AtomicCompareAndSwapImpl( \
vtkmType* addr, vtkmType desired, vtkmType expected, vtkm::MemoryOrder order) \
{ \
return BitCast<vtkmType>( \
_InterlockedCompareExchange##suffix(reinterpret_cast<volatile winType*>(addr), \
BitCast<winType>(desired), \
BitCast<winType>(expected))); \
}
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt8, CHAR, 8)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt16, SHORT, 16)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt32, LONG, )
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt64, LONG64, 64)
#undef VTKM_ATOMIC_OPS_FOR_TYPE
}
} // namespace vtkm::detail
#else // gcc/clang for CPU
// Supports vtkm::UInt8, vtkm::UInt16, vtkm::UInt32, vtkm::UInt64
#include <cstdint>
#include <cstring>
namespace vtkm
{
namespace detail
{
VTKM_EXEC_CONT inline int GccAtomicMemOrder(vtkm::MemoryOrder order)
{
switch (order)
{
case vtkm::MemoryOrder::Relaxed:
return __ATOMIC_RELAXED;
case vtkm::MemoryOrder::Acquire:
return __ATOMIC_ACQUIRE;
case vtkm::MemoryOrder::Release:
return __ATOMIC_RELEASE;
case vtkm::MemoryOrder::AcquireAndRelease:
return __ATOMIC_ACQ_REL;
case vtkm::MemoryOrder::SequentiallyConsistent:
return __ATOMIC_SEQ_CST;
}
// Should never reach here, but avoid compiler warnings
return __ATOMIC_SEQ_CST;
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicLoadImpl(const T* addr, vtkm::MemoryOrder order)
{
return __atomic_load_n(addr, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline void AtomicStoreImpl(T* addr, T value, vtkm::MemoryOrder order)
{
return __atomic_store_n(addr, value, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAddImpl(T* addr, T arg, vtkm::MemoryOrder order)
{
return __atomic_fetch_add(addr, arg, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAndImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
return __atomic_fetch_and(addr, mask, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicOrImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
return __atomic_fetch_or(addr, mask, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicXorImpl(T* addr, T mask, vtkm::MemoryOrder order)
{
return __atomic_fetch_xor(addr, mask, GccAtomicMemOrder(order));
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicNotImpl(T* addr, vtkm::MemoryOrder order)
{
return AtomicXorImpl(addr, static_cast<T>(~T{ 0u }), order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicCompareAndSwapImpl(T* addr,
T desired,
T expected,
vtkm::MemoryOrder order)
{
__atomic_compare_exchange_n(
addr, &expected, desired, false, GccAtomicMemOrder(order), GccAtomicMemOrder(order));
return expected;
}
}
} // namespace vtkm::detail
#endif // gcc/clang
namespace vtkm
{
namespace detail
{
template <typename T>
using OppositeSign = typename std::conditional<std::is_signed<T>::value,
typename std::make_unsigned<T>::type,
typename std::make_signed<T>::type>::type;
} // namespace detail
/// \brief The preferred type to use for atomic operations.
///
using AtomicTypePreferred = vtkm::UInt32;
/// \brief A list of types that can be used with atomic operations.
///
/// TODO: Adjust based on devices being compiled.
///
/// BUG: vtkm::UInt64 is provided in this list even though it is not supported on CUDA
/// before compute capability 3.5.
///
using AtomicTypesSupported = vtkm::List<vtkm::UInt32, vtkm::UInt64>;
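Later in this change, `BitFieldTraits` checks candidate word types against this list with
`vtkm::ListHas`. A minimal sketch of such a compile-time guard (the `RequireAtomicWord`
helper is illustrative):

```cpp
template <typename WordType>
inline void RequireAtomicWord()
{
  // Fails to compile if WordType is not one of vtkm::AtomicTypesSupported.
  static_assert(vtkm::ListHas<vtkm::AtomicTypesSupported, WordType>::value,
                "WordType does not support atomic operations.");
}
```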
/// \brief Atomic function to load a value from a shared memory location.
///
/// Given a pointer, returns the value in that pointer. If other threads are writing to
/// that same location, the returned value will be consistent with what was present before
/// or after that write.
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicLoad(const T* pointer,
vtkm::MemoryOrder order = vtkm::MemoryOrder::Acquire)
{
return detail::AtomicLoadImpl(pointer, order);
}
///@{
/// \brief Atomic function to save a value to a shared memory location.
///
/// Given a pointer and a value, stores that value at the pointer's location. If two
/// threads are simultaneously using `AtomicStore` at the same location, the resulting
/// value will be one of the values or the other (as opposed to a mix of bits).
///
template <typename T>
VTKM_EXEC_CONT inline void AtomicStore(T* pointer,
T value,
vtkm::MemoryOrder order = vtkm::MemoryOrder::Release)
{
detail::AtomicStoreImpl(pointer, value, order);
}
template <typename T>
VTKM_EXEC_CONT inline void AtomicStore(T* pointer,
detail::OppositeSign<T> value,
vtkm::MemoryOrder order = vtkm::MemoryOrder::Release)
{
detail::AtomicStoreImpl(pointer, static_cast<T>(value), order);
}
///@}
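The second overload above takes an operand of the opposite signedness, which lets callers
pass, for example, a plain (signed) integer literal to an unsigned word without an explicit
cast. An illustrative sketch (`ResetCounter` is hypothetical):

```cpp
VTKM_EXEC_CONT inline void ResetCounter(vtkm::UInt32* counter)
{
  // `0` is a signed int literal; the OppositeSign overload casts it to vtkm::UInt32.
  vtkm::AtomicStore(counter, 0);
}
```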
///@{
/// \brief Atomic function to add a value to a shared memory location.
///
/// Given a pointer and an operand, adds the operand to the value at the given memory
/// location. The result of the addition is put into that memory location and the
/// _old_ value that was originally in the memory is returned. For example, if you
/// call `AtomicAdd` on a memory location that holds a 5 with an operand of 3, the
/// value of 8 is stored in the memory location and the value of 5 is returned.
///
/// If multiple threads call `AtomicAdd` simultaneously, they will not interfere with
/// each other. The result will be consistent as if one was called before the other
/// (although it is indeterminate which will be applied first).
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicAdd(
T* pointer,
T operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicAddImpl(pointer, operand, order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAdd(
T* pointer,
detail::OppositeSign<T> operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicAddImpl(pointer, static_cast<T>(operand), order);
}
///@}
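A short sketch of the behavior described above (the local variable only makes the returned
old value concrete; in real use the pointer would refer to shared memory):

```cpp
VTKM_EXEC_CONT inline void AtomicAddExample()
{
  vtkm::UInt32 value = 5;
  vtkm::UInt32 old = vtkm::AtomicAdd(&value, vtkm::UInt32{ 3 });
  // Now old == 5 and value == 8, matching the description above.
  (void)old;
}
```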
///@{
/// \brief Atomic function to AND bits to a shared memory location.
///
/// Given a pointer and an operand, performs a bitwise AND of the operand and the value at the given
/// memory location. The result of the AND is put into that memory location and the _old_ value
/// that was originally in the memory is returned. For example, if you call `AtomicAnd` on a memory
/// location that holds a 0x6 with an operand of 0x3, the value of 0x2 is stored in the memory
/// location and the value of 0x6 is returned.
///
/// If multiple threads call `AtomicAnd` simultaneously, they will not interfere with
/// each other. The result will be consistent as if one was called before the other
/// (although it is indeterminate which will be applied first).
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicAnd(
T* pointer,
T operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicAndImpl(pointer, operand, order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicAnd(
T* pointer,
detail::OppositeSign<T> operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicAndImpl(pointer, static_cast<T>(operand), order);
}
///@}
///@{
/// \brief Atomic function to OR bits to a shared memory location.
///
/// Given a pointer and an operand, performs a bitwise OR of the operand and the value at the given
/// memory location. The result of the OR is put into that memory location and the _old_ value
/// that was originally in the memory is returned. For example, if you call `AtomicOr` on a memory
/// location that holds a 0x6 with an operand of 0x3, the value of 0x7 is stored in the memory
/// location and the value of 0x6 is returned.
///
/// If multiple threads call `AtomicOr` simultaneously, they will not interfere with
/// each other. The result will be consistent as if one was called before the other
/// (although it is indeterminate which will be applied first).
///
template <typename T>
VTKM_EXEC_CONT inline T
AtomicOr(T* pointer, T operand, vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicOrImpl(pointer, operand, order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicOr(
T* pointer,
detail::OppositeSign<T> operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicOrImpl(pointer, static_cast<T>(operand), order);
}
///@}
///@{
/// \brief Atomic function to XOR bits to a shared memory location.
///
/// Given a pointer and an operand, performs a bitwise exclusive-OR of the operand and the value at
/// the given memory location. The result of the XOR is put into that memory location and the _old_
/// value that was originally in the memory is returned. For example, if you call `AtomicXor` on a
/// memory location that holds a 0x6 with an operand of 0x3, the value of 0x5 is stored in the
/// memory location and the value of 0x6 is returned.
///
/// If multiple threads call `AtomicXor` simultaneously, they will not interfere with
/// each other. The result will be consistent as if one was called before the other.
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicXor(
T* pointer,
T operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicXorImpl(pointer, operand, order);
}
template <typename T>
VTKM_EXEC_CONT inline T AtomicXor(
T* pointer,
detail::OppositeSign<T> operand,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicXorImpl(pointer, static_cast<T>(operand), order);
}
///@}
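Taken together, `AtomicAnd`, `AtomicOr`, and `AtomicXor` support lock-free manipulation of
bit flags shared between threads. An illustrative sketch (the flag constants and
`UpdateFlags` function are hypothetical):

```cpp
VTKM_EXEC_CONT inline void UpdateFlags(vtkm::UInt32* flags)
{
  constexpr vtkm::UInt32 READY_BIT = 0x1;
  constexpr vtkm::UInt32 ERROR_BIT = 0x2;

  vtkm::AtomicOr(flags, READY_BIT);   // set the ready bit
  vtkm::AtomicAnd(flags, ~ERROR_BIT); // clear the error bit
  vtkm::AtomicXor(flags, READY_BIT);  // toggle the ready bit
  // Each call returns the previous flag word; it is ignored here.
}
```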
/// \brief Atomic function to NOT bits to a shared memory location.
///
/// Given a pointer, performs a bitwise NOT of the value at the given
/// memory location. The result of the NOT is put into that memory location and the _old_ value
/// that was originally in the memory is returned.
///
/// If multiple threads call `AtomicNot` simultaneously, they will not interfere with
/// each other. The result will be consistent as if one was called before the other.
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicNot(
T* pointer,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicNotImpl(pointer, order);
}
/// \brief Atomic function that replaces a value given a condition.
///
/// Given a pointer, a new desired value, and an expected value, replaces the value at the
/// pointer if it is the same as the expected value with the new desired value. If the original
/// value in the pointer does not equal the expected value, then the memory at the pointer
/// remains unchanged. In either case, the function returns the _old_ original value that
/// was at the pointer.
///
/// If multiple threads call `AtomicCompareAndSwap` simultaneously, the result will be consistent
/// as if one was called before the other (although it is indeterminate which will be applied
/// first).
///
template <typename T>
VTKM_EXEC_CONT inline T AtomicCompareAndSwap(
T* pointer,
T desired,
T expected,
vtkm::MemoryOrder order = vtkm::MemoryOrder::SequentiallyConsistent)
{
return detail::AtomicCompareAndSwapImpl(pointer, desired, expected, order);
}
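A common pattern built on compare-and-swap is a retry loop that implements an operation the
primitives above do not provide directly, such as an atomic maximum. An editorial sketch
(the `AtomicMaxViaCas` name is hypothetical and not part of this header):

```cpp
VTKM_EXEC_CONT inline vtkm::UInt32 AtomicMaxViaCas(vtkm::UInt32* pointer, vtkm::UInt32 operand)
{
  vtkm::UInt32 expected = vtkm::AtomicLoad(pointer);
  while (operand > expected)
  {
    vtkm::UInt32 observed = vtkm::AtomicCompareAndSwap(pointer, operand, expected);
    if (observed == expected)
    {
      break; // Swap succeeded; *pointer now holds operand.
    }
    expected = observed; // Another thread changed the value; retry against it.
  }
  return expected; // The last value observed before our update (or no update).
}
```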
} // namespace vtkm
#endif //vtk_m_Atomic_h

@ -19,6 +19,7 @@ vtkm_install_headers(
set(headers
Algorithms.h
Assert.h
Atomic.h
BinaryPredicates.h
BinaryOperators.h
Bitset.h

@ -65,9 +65,8 @@ struct VTKM_ALWAYS_EXPORT StorageTagBitField
template <>
class Storage<bool, StorageTagBitField>
{
using BitPortalType = vtkm::cont::detail::BitPortal<vtkm::cont::internal::AtomicInterfaceControl>;
using BitPortalConstType =
vtkm::cont::detail::BitPortalConst<vtkm::cont::internal::AtomicInterfaceControl>;
using BitPortalType = vtkm::cont::detail::BitPortal;
using BitPortalConstType = vtkm::cont::detail::BitPortalConst;
public:
using ValueType = bool;
@ -107,10 +106,9 @@ private:
template <typename Device>
class ArrayTransfer<bool, StorageTagBitField, Device>
{
using AtomicInterface = AtomicInterfaceExecution<Device>;
using StorageType = Storage<bool, StorageTagBitField>;
using BitPortalExecution = vtkm::cont::detail::BitPortal<AtomicInterface>;
using BitPortalConstExecution = vtkm::cont::detail::BitPortalConst<AtomicInterface>;
using BitPortalExecution = vtkm::cont::detail::BitPortal;
using BitPortalConstExecution = vtkm::cont::detail::BitPortalConst;
public:
using ValueType = bool;

@ -11,12 +11,10 @@
#ifndef vtk_m_cont_BitField_h
#define vtk_m_cont_BitField_h
#include <vtkm/cont/internal/AtomicInterfaceControl.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/Logging.h>
#include <vtkm/Atomic.h>
#include <vtkm/Deprecated.h>
#include <vtkm/List.h>
#include <vtkm/Types.h>
@ -61,7 +59,7 @@ struct BitFieldTraits
/// Require an unsigned integral type that is <= BlockSize bytes and is
/// supported by the specified AtomicInterface.
template <typename WordType, typename AtomicInterface>
template <typename WordType>
using IsValidWordTypeAtomic =
std::integral_constant<bool,
/* is unsigned */
@ -71,7 +69,7 @@ struct BitFieldTraits
/* BlockSize is a multiple of WordType */
static_cast<size_t>(BlockSize) % sizeof(WordType) == 0 &&
/* Supported by atomic interface */
vtkm::ListHas<typename AtomicInterface::WordTypes, WordType>::value>;
vtkm::ListHas<vtkm::AtomicTypesSupported, WordType>::value>;
};
/// Identifies a bit in a BitField by Word and BitOffset. Note that these
@ -88,7 +86,7 @@ struct BitCoordinate
/// Portal for performing bit or word operations on a BitField.
///
/// This is the implementation used by BitPortal and BitPortalConst.
template <typename AtomicInterface_, bool IsConst>
template <bool IsConst>
class BitPortalBase
{
// Checks if PortalType has a GetIteratorBegin() method that returns a
@ -105,12 +103,8 @@ class BitPortalBase
using BufferType = MaybeConstPointer<void>; // void* or void const*, as appropriate
public:
/// The atomic interface used to carry out atomic operations. See
/// AtomicInterfaceExecution<Device> and AtomicInterfaceControl
using AtomicInterface = AtomicInterface_;
/// The fastest word type for performing bitwise operations through AtomicInterface.
using WordTypePreferred = typename AtomicInterface::WordTypePreferred;
using WordTypePreferred = vtkm::AtomicTypePreferred;
/// MPL check for whether a WordType may be used for non-atomic operations.
template <typename WordType>
@ -118,7 +112,7 @@ public:
/// MPL check for whether a WordType may be used for atomic operations.
template <typename WordType>
using IsValidWordTypeAtomic = BitFieldTraits::IsValidWordTypeAtomic<WordType, AtomicInterface>;
using IsValidWordTypeAtomic = BitFieldTraits::IsValidWordTypeAtomic<WordType>;
VTKM_STATIC_ASSERT_MSG(IsValidWordType<WordTypeDefault>::value,
"Internal error: Default word type is invalid.");
@ -281,7 +275,7 @@ public:
VTKM_STATIC_ASSERT_MSG(IsValidWordTypeAtomic<WordType>::value,
"Requested WordType does not support atomic"
" operations on target execution platform.");
AtomicInterface::Store(this->GetWordAddress<WordType>(wordIdx), word);
vtkm::AtomicStore(this->GetWordAddress<WordType>(wordIdx), word);
}
/// Get the word (of type @a WordType) at @a wordIdx using non-atomic
@ -300,7 +294,7 @@ public:
VTKM_STATIC_ASSERT_MSG(IsValidWordTypeAtomic<WordType>::value,
"Requested WordType does not support atomic"
" operations on target execution platform.");
return AtomicInterface::Load(this->GetWordAddress<WordType>(wordIdx));
return vtkm::AtomicLoad(this->GetWordAddress<WordType>(wordIdx));
}
/// Toggle the bit at @a bitIdx, returning the original value. This method
@ -326,7 +320,7 @@ public:
"Requested WordType does not support atomic"
" operations on target execution platform.");
WordType* addr = this->GetWordAddress<WordType>(wordIdx);
return AtomicInterface::Not(addr);
return vtkm::AtomicNot(addr);
}
/// Perform an "and" operation between the bit at @a bitIdx and @a val,
@ -356,7 +350,7 @@ public:
"Requested WordType does not support atomic"
" operations on target execution platform.");
WordType* addr = this->GetWordAddress<WordType>(wordIdx);
return AtomicInterface::And(addr, wordmask);
return vtkm::AtomicAnd(addr, wordmask);
}
/// Perform an "of" operation between the bit at @a bitIdx and @a val,
@ -386,7 +380,7 @@ public:
"Requested WordType does not support atomic"
" operations on target execution platform.");
WordType* addr = this->GetWordAddress<WordType>(wordIdx);
return AtomicInterface::Or(addr, wordmask);
return vtkm::AtomicOr(addr, wordmask);
}
/// Perform an "xor" operation between the bit at @a bitIdx and @a val,
@ -416,7 +410,7 @@ public:
"Requested WordType does not support atomic"
" operations on target execution platform.");
WordType* addr = this->GetWordAddress<WordType>(wordIdx);
return AtomicInterface::Xor(addr, wordmask);
return vtkm::AtomicXor(addr, wordmask);
}
/// Perform an atomic compare-and-swap operation on the bit at @a bitIdx.
@ -469,7 +463,7 @@ public:
"Requested WordType does not support atomic"
" operations on target execution platform.");
WordType* addr = this->GetWordAddress<WordType>(wordIdx);
return AtomicInterface::CompareAndSwap(addr, newWord, expected);
return vtkm::AtomicCompareAndSwap(addr, newWord, expected);
}
private:
@ -484,11 +478,24 @@ private:
vtkm::Id NumberOfBits{ 0 };
};
template <typename AtomicOps>
using BitPortal = BitPortalBase<AtomicOps, false>;
using BitPortal = BitPortalBase<false>;
template <typename AtomicOps>
using BitPortalConst = BitPortalBase<AtomicOps, true>;
using BitPortalConst = BitPortalBase<true>;
template <typename WordType, typename Device>
struct IsValidWordTypeDeprecated
{
using type VTKM_DEPRECATED(
1.6,
"BitField::IsValidWordTypeAtomic no longer takes a second Device parameter.") =
detail::BitFieldTraits::IsValidWordTypeAtomic<WordType>;
};
template <typename WordType>
struct IsValidWordTypeDeprecated<WordType, void>
{
using type = detail::BitFieldTraits::IsValidWordTypeAtomic<WordType>;
};
} // end namespace detail
@ -501,48 +508,42 @@ public:
using ArrayHandleType = ArrayHandle<WordTypeDefault, StorageTagBasic>;
/// The BitPortal used in the control environment.
using WritePortalType = detail::BitPortal<vtkm::cont::internal::AtomicInterfaceControl>;
using WritePortalType = detail::BitPortal;
/// A read-only BitPortal used in the control environment.
using ReadPortalType = detail::BitPortalConst<vtkm::cont::internal::AtomicInterfaceControl>;
using ReadPortalType = detail::BitPortalConst;
using PortalControl VTKM_DEPRECATED(1.6, "Use BitField::WritePortalType instead.") =
detail::BitPortal<vtkm::cont::internal::AtomicInterfaceControl>;
using PortalControl VTKM_DEPRECATED(1.6,
"Use BitField::WritePortalType instead.") = detail::BitPortal;
using PortalConstControl VTKM_DEPRECATED(1.6, "Use ArrayBitField::ReadPortalType instead.") =
detail::BitPortalConst<vtkm::cont::internal::AtomicInterfaceControl>;
detail::BitPortalConst;
template <typename Device>
struct ExecutionTypes
{
/// The AtomicInterfaceExecution implementation used by the specified device.
using AtomicInterface = vtkm::cont::internal::AtomicInterfaceExecution<Device>;
/// The preferred word type used by the specified device.
using WordTypePreferred = typename AtomicInterface::WordTypePreferred;
using WordTypePreferred = vtkm::AtomicTypePreferred;
/// A BitPortal that is usable on the specified device.
using Portal = detail::BitPortal<AtomicInterface>;
using Portal = detail::BitPortal;
/// A read-only BitPortal that is usable on the specified device.
using PortalConst = detail::BitPortalConst<AtomicInterface>;
using PortalConst = detail::BitPortalConst;
};
/// Check whether a word type is valid for non-atomic operations.
template <typename WordType>
using IsValidWordType = detail::BitFieldTraits::IsValidWordType<WordType>;
/// Check whether a word type is valid for atomic operations on a specific
/// device.
template <typename WordType, typename Device>
using IsValidWordTypeAtomic = detail::BitFieldTraits::
IsValidWordTypeAtomic<WordType, vtkm::cont::internal::AtomicInterfaceExecution<Device>>;
/// Check whether a word type is valid for atomic operations.
template <typename WordType, typename Device = void>
using IsValidWordTypeAtomic = detail::BitFieldTraits::IsValidWordTypeAtomic<WordType>;
/// Check whether a word type is valid for atomic operations from the control
/// environment.
template <typename WordType>
using IsValidWordTypeAtomicControl =
detail::BitFieldTraits::IsValidWordTypeAtomic<WordType,
vtkm::cont::internal::AtomicInterfaceControl>;
using IsValidWordTypeAtomicControl VTKM_DEPRECATED(1.6, "Use IsValidWordTypeAtomic instead.") =
detail::BitFieldTraits::IsValidWordTypeAtomic<WordType>;
VTKM_CONT BitField()
: Internals{ std::make_shared<InternalStruct>() }
@ -652,11 +653,9 @@ public:
VTKM_DEPRECATED(1.6,
"Use BitField::WritePortal() instead. "
"Note that the returned portal will lock the array while it is in scope.")
detail::BitPortal<vtkm::cont::internal::AtomicInterfaceControl> GetPortalControl()
detail::BitPortal GetPortalControl()
{
return detail::BitPortal<vtkm::cont::internal::AtomicInterfaceControl>{
this->Internals->Data.WritePortal(), this->Internals->NumberOfBits
};
return detail::BitPortal{ this->Internals->Data.WritePortal(), this->Internals->NumberOfBits };
}
/// Get a read-only portal to the data that is usable from the control
@ -665,11 +664,10 @@ public:
VTKM_DEPRECATED(1.6,
"Use BitField::ReadPortal() instead. "
"Note that the returned portal will lock the array while it is in scope.")
detail::BitPortalConst<vtkm::cont::internal::AtomicInterfaceControl> GetPortalConstControl() const
detail::BitPortalConst GetPortalConstControl() const
{
return detail::BitPortalConst<vtkm::cont::internal::AtomicInterfaceControl>{
this->Internals->Data.ReadPortal(), this->Internals->NumberOfBits
};
return detail::BitPortalConst{ this->Internals->Data.ReadPortal(),
this->Internals->NumberOfBits };
}
/// Prepares this BitField to be used as an input to an operation in the

@ -722,14 +722,6 @@ public:
#endif
};
/// \brief Class providing a device-specific support for atomic operations.
///
/// AtomicInterfaceControl provides atomic operations for the control
/// environment, and may be subclassed to implement the device interface when
/// appropriate for a CPU-based device.
template <typename DeviceTag>
class AtomicInterfaceExecution;
/// \brief Class providing a device-specific support for selecting the optimal
/// Task type for a given worklet.
///

@ -1,143 +0,0 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_cont_cuda_internal_AtomicInterfaceExecutionCuda_h
#define vtk_m_cont_cuda_internal_AtomicInterfaceExecutionCuda_h
#include <vtkm/cont/cuda/internal/DeviceAdapterTagCuda.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/List.h>
#include <vtkm/Types.h>
namespace vtkm
{
namespace cont
{
namespace internal
{
template <>
class AtomicInterfaceExecution<DeviceAdapterTagCuda>
{
public:
// Note: There are 64-bit atomics available, but not on all devices. Stick
// with 32-bit only until we require compute capability 3.5+
using WordTypes = vtkm::List<vtkm::UInt32>;
using WordTypePreferred = vtkm::UInt32;
#define VTKM_ATOMIC_OPS_FOR_TYPE(type) \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type Load(const type* addr) \
{ \
const volatile type* vaddr = addr; /* volatile to bypass cache*/ \
const type value = *vaddr; \
/* fence to ensure that dependent reads are correctly ordered */ \
__threadfence(); \
return value; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static void Store(type* addr, type value) \
{ \
volatile type* vaddr = addr; /* volatile to bypass cache */ \
/* fence to ensure that previous non-atomic stores are visible to other threads */ \
__threadfence(); \
*vaddr = value; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type Add(type* addr, type arg) \
{ \
__threadfence(); \
auto result = atomicAdd(addr, arg); \
__threadfence(); \
return result; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type Not(type* addr) \
{ \
return AtomicInterfaceExecution::Xor(addr, static_cast<type>(~type{ 0u })); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type And(type* addr, type mask) \
{ \
__threadfence(); \
auto result = atomicAnd(addr, mask); \
__threadfence(); \
return result; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type Or(type* addr, type mask) \
{ \
__threadfence(); \
auto result = atomicOr(addr, mask); \
__threadfence(); \
return result; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type Xor(type* addr, type mask) \
{ \
__threadfence(); \
auto result = atomicXor(addr, mask); \
__threadfence(); \
return result; \
} \
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static type CompareAndSwap( \
type* addr, type newWord, type expected) \
{ \
__threadfence(); \
auto result = atomicCAS(addr, expected, newWord); \
__threadfence(); \
return result; \
}
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt32)
#undef VTKM_ATOMIC_OPS_FOR_TYPE
// We also support Load, Add & CAS for 64-bit unsigned ints in order to
// support AtomicArray usecases. We can't generally support UInt64 without
// bumping our minimum device req to compute capability 3.5 (though we could
// just use CAS for everything if this becomes a need). All of our supported
// devices do support add / CAS on UInt64, just not all the bit stuff.
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static vtkm::UInt64 Load(const vtkm::UInt64* addr)
{
const volatile vtkm::UInt64* vaddr = addr; /* volatile to bypass cache*/
const vtkm::UInt64 value = *vaddr;
/* fence to ensure that dependent reads are correctly ordered */
__threadfence();
return value;
}
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static void Store(vtkm::UInt64* addr, vtkm::UInt64 value)
{
volatile vtkm::UInt64* vaddr = addr; /* volatile to bypass cache */
/* fence to ensure that previous non-atomic stores are visible to other threads */
__threadfence();
*vaddr = value;
}
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static vtkm::UInt64 Add(vtkm::UInt64* addr,
vtkm::UInt64 arg)
{
__threadfence();
auto result = atomicAdd(addr, arg);
__threadfence();
return result;
}
VTKM_SUPPRESS_EXEC_WARNINGS __device__ static vtkm::UInt64 CompareAndSwap(vtkm::UInt64* addr,
vtkm::UInt64 newWord,
vtkm::UInt64 expected)
{
__threadfence();
auto result = atomicCAS(addr, expected, newWord);
__threadfence();
return result;
}
};
}
}
} // end namespace vtkm::cont::internal
#endif // vtk_m_cont_cuda_internal_AtomicInterfaceExecutionCuda_h

@ -10,7 +10,6 @@
set(headers
ArrayManagerExecutionCuda.h
AtomicInterfaceExecutionCuda.h
CudaAllocator.h
DeviceAdapterAlgorithmCuda.h
DeviceAdapterMemoryManagerCuda.h

@ -27,8 +27,6 @@
#include <vtkm/cont/internal/DeviceAdapterAlgorithmGeneral.h>
#include <vtkm/cont/cuda/ErrorCuda.h>
#include <vtkm/cont/cuda/internal/ArrayManagerExecutionCuda.h>
#include <vtkm/cont/cuda/internal/AtomicInterfaceExecutionCuda.h>
#include <vtkm/cont/cuda/internal/DeviceAdapterRuntimeDetectorCuda.h>
#include <vtkm/cont/cuda/internal/DeviceAdapterTagCuda.h>
#include <vtkm/cont/cuda/internal/DeviceAdapterTimerImplementationCuda.h>
@ -247,8 +245,7 @@ private:
//Using typename BitsPortal::WordTypePreferred causes dependent type errors using GCC 4.8.5
//which is the GCC required compiler for CUDA 9.2 on summit/power9
using Word = typename vtkm::cont::internal::AtomicInterfaceExecution<
DeviceAdapterTagCuda>::WordTypePreferred;
using Word = vtkm::AtomicTypePreferred;
VTKM_STATIC_ASSERT(
VTKM_PASS_COMMAS(std::is_same<typename IndicesPortal::ValueType, vtkm::Id>::value));
@ -460,8 +457,7 @@ private:
//Using typename BitsPortal::WordTypePreferred causes dependent type errors using GCC 4.8.5
//which is the GCC required compiler for CUDA 9.2 on summit/power9
using Word = typename vtkm::cont::internal::AtomicInterfaceExecution<
DeviceAdapterTagCuda>::WordTypePreferred;
using Word = vtkm::AtomicTypePreferred;
VTKM_CONT
CountSetBitsFunctor(const BitsPortal& portal, GlobalPopCountType* globalPopCount)

@ -10,19 +10,8 @@
#ifndef vtk_m_cont_internal_AtomicInterfaceControl_h
#define vtk_m_cont_internal_AtomicInterfaceControl_h
#include <vtkm/internal/Configure.h>
#include <vtkm/internal/Windows.h>
#include <vtkm/List.h>
#include <vtkm/Types.h>
#if defined(VTKM_MSVC) && !defined(VTKM_CUDA)
#include <intrin.h> // For MSVC atomics
#endif
#include <atomic>
#include <cstdint>
#include <cstring>
#include <vtkm/Atomic.h>
#include <vtkm/Deprecated.h>
namespace vtkm
{
@ -31,195 +20,58 @@ namespace cont
namespace internal
{
/**
* Implementation of AtomicInterfaceDevice that uses control-side atomics.
*/
class AtomicInterfaceControl
struct VTKM_DEPRECATED(1.6, "Use the functions in vtkm/Atomic.h.") AtomicInterfaceControl
{
public:
using WordTypes = vtkm::List<vtkm::UInt8, vtkm::UInt16, vtkm::UInt32, vtkm::UInt64>;
using WordTypes = vtkm::AtomicTypesSupported;
using WordTypePreferred = vtkm::AtomicTypePreferred;
// TODO These support UInt64, too. This should be benchmarked to see which
// is faster.
using WordTypePreferred = vtkm::UInt32;
#ifdef VTKM_MSVC
private:
template <typename To, typename From>
VTKM_EXEC_CONT static To BitCast(const From& src)
template <typename T>
VTKM_EXEC_CONT static T Load(const T* addr)
{
// The memcpy should be removed by the compiler when possible, but this
// works around a host of issues with bitcasting using reinterpret_cast.
VTKM_STATIC_ASSERT(sizeof(From) == sizeof(To));
To dst;
std::memcpy(&dst, &src, sizeof(From));
return dst;
return vtkm::AtomicLoad(addr);
}
template <typename T>
VTKM_EXEC_CONT static T BitCast(T&& src)
VTKM_EXEC_CONT static void Store(T* addr, T value)
{
return std::forward<T>(src);
vtkm::AtomicStore(addr, value);
}
public:
// Note about Load and Store implementations:
//
// "Simple reads and writes to properly-aligned 32-bit variables are atomic
// operations"
//
// "Simple reads and writes to properly aligned 64-bit variables are atomic on
// 64-bit Windows. Reads and writes to 64-bit values are not guaranteed to be
// atomic on 32-bit Windows."
//
// "Reads and writes to variables of other sizes [than 32 or 64 bits] are not
// guaranteed to be atomic on any platform."
//
// https://docs.microsoft.com/en-us/windows/desktop/sync/interlocked-variable-access
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkm::UInt8 Load(const vtkm::UInt8* addr)
template <typename T>
VTKM_EXEC_CONT static T Add(T* addr, T arg)
{
// This assumes that the memory interface is smart enough to load a 32-bit
// word atomically and a properly aligned 8-bit word from it.
// We could build address masks and do shifts to perform this manually if
// this assumption is incorrect.
auto result = *static_cast<volatile const vtkm::UInt8*>(addr);
std::atomic_thread_fence(std::memory_order_acquire);
return result;
return vtkm::AtomicAdd(addr, arg);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkm::UInt16 Load(const vtkm::UInt16* addr)
template <typename T>
VTKM_EXEC_CONT static T Not(T* addr)
{
// This assumes that the memory interface is smart enough to load a 32-bit
// word atomically and a properly aligned 16-bit word from it.
// We could build address masks and do shifts to perform this manually if
// this assumption is incorrect.
auto result = *static_cast<volatile const vtkm::UInt16*>(addr);
std::atomic_thread_fence(std::memory_order_acquire);
return result;
return vtkm::AtomicNot(addr);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkm::UInt32 Load(const vtkm::UInt32* addr)
template <typename T>
VTKM_EXEC_CONT static T And(T* addr, T mask)
{
auto result = *static_cast<volatile const vtkm::UInt32*>(addr);
std::atomic_thread_fence(std::memory_order_acquire);
return result;
return vtkm::AtomicAnd(addr, mask);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkm::UInt64 Load(const vtkm::UInt64* addr)
template <typename T>
VTKM_EXEC_CONT static T Or(T* addr, T mask)
{
auto result = *static_cast<volatile const vtkm::UInt64*>(addr);
std::atomic_thread_fence(std::memory_order_acquire);
return result;
return vtkm::AtomicOr(addr, mask);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static void Store(vtkm::UInt8* addr, vtkm::UInt8 val)
template <typename T>
VTKM_EXEC_CONT static T Xor(T* addr, T mask)
{
// There doesn't seem to be an atomic store instruction in the windows
// API, so just exchange and discard the result.
_InterlockedExchange8(reinterpret_cast<volatile CHAR*>(addr), BitCast<CHAR>(val));
return vtkm::AtomicXor(addr, mask);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static void Store(vtkm::UInt16* addr, vtkm::UInt16 val)
template <typename T>
VTKM_EXEC_CONT static T CompareAndSwap(T* addr, T newWord, T expected)
{
// There doesn't seem to be an atomic store instruction in the windows
// API, so just exchange and discard the result.
_InterlockedExchange16(reinterpret_cast<volatile SHORT*>(addr), BitCast<SHORT>(val));
return vtkm::AtomicCompareAndSwap(addr, newWord, expected);
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static void Store(vtkm::UInt32* addr, vtkm::UInt32 val)
{
std::atomic_thread_fence(std::memory_order_release);
*addr = val;
}
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static void Store(vtkm::UInt64* addr, vtkm::UInt64 val)
{
std::atomic_thread_fence(std::memory_order_release);
*addr = val;
}
#define VTKM_ATOMIC_OPS_FOR_TYPE(vtkmType, winType, suffix) \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType Add(vtkmType* addr, vtkmType arg) \
{ \
return BitCast<vtkmType>(_InterlockedExchangeAdd##suffix( \
reinterpret_cast<volatile winType*>(addr), BitCast<winType>(arg))); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType Not(vtkmType* addr) \
{ \
return Xor(addr, static_cast<vtkmType>(~vtkmType{ 0u })); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType And(vtkmType* addr, vtkmType mask) \
{ \
return BitCast<vtkmType>( \
_InterlockedAnd##suffix(reinterpret_cast<volatile winType*>(addr), BitCast<winType>(mask))); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType Or(vtkmType* addr, vtkmType mask) \
{ \
return BitCast<vtkmType>( \
_InterlockedOr##suffix(reinterpret_cast<volatile winType*>(addr), BitCast<winType>(mask))); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType Xor(vtkmType* addr, vtkmType mask) \
{ \
return BitCast<vtkmType>( \
_InterlockedXor##suffix(reinterpret_cast<volatile winType*>(addr), BitCast<winType>(mask))); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static vtkmType CompareAndSwap( \
vtkmType* addr, vtkmType newWord, vtkmType expected) \
{ \
return BitCast<vtkmType>( \
_InterlockedCompareExchange##suffix(reinterpret_cast<volatile winType*>(addr), \
BitCast<winType>(newWord), \
BitCast<winType>(expected))); \
}
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt8, CHAR, 8)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt16, SHORT, 16)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt32, LONG, )
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt64, LONG64, 64)
#undef VTKM_ATOMIC_OPS_FOR_TYPE
#else // gcc/clang
#define VTKM_ATOMIC_OPS_FOR_TYPE(type) \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type Load(const type* addr) \
{ \
return __atomic_load_n(addr, __ATOMIC_ACQUIRE); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static void Store(type* addr, type value) \
{ \
return __atomic_store_n(addr, value, __ATOMIC_RELEASE); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type Add(type* addr, type arg) \
{ \
return __atomic_fetch_add(addr, arg, __ATOMIC_SEQ_CST); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type Not(type* addr) \
{ \
return Xor(addr, static_cast<type>(~type{ 0u })); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type And(type* addr, type mask) \
{ \
return __atomic_fetch_and(addr, mask, __ATOMIC_SEQ_CST); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type Or(type* addr, type mask) \
{ \
return __atomic_fetch_or(addr, mask, __ATOMIC_SEQ_CST); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type Xor(type* addr, type mask) \
{ \
return __atomic_fetch_xor(addr, mask, __ATOMIC_SEQ_CST); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC_CONT static type CompareAndSwap( \
type* addr, type newWord, type expected) \
{ \
__atomic_compare_exchange_n( \
addr, &expected, newWord, false, __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST); \
return expected; \
}
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt8)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt16)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt32)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt64)
#undef VTKM_ATOMIC_OPS_FOR_TYPE
#endif
};
}
}

@ -10,7 +10,8 @@
#ifndef vtk_m_cont_internal_AtomicInterfaceExecution_h
#define vtk_m_cont_internal_AtomicInterfaceExecution_h
#include <vtkm/Types.h>
#include <vtkm/Atomic.h>
#include <vtkm/Deprecated.h>
namespace vtkm
{
@ -19,104 +20,60 @@ namespace cont
namespace internal
{
/// Class template that provides a collection of static methods that perform
/// atomic operations on raw addresses. It is the responsibility of the caller
/// to ensure that the addresses are properly aligned.
///
/// The class defines a WordTypePreferred member that is the fastest available
/// for bitwise operations of the given device. At minimum, the interface must
/// support operations on WordTypePreferred and vtkm::WordTypeDefault, which may
/// be the same. A full list of supported word types is advertised in the type
/// list @a WordTypes.
///
/// In addition, each atomic interface must support Add and CompareAndSwap on
/// UInt32 and UInt64, as these are required for the AtomicArray implementation.
///
/// To implement this on devices that share the control environment, subclass
/// vtkm::cont::internal::AtomicInterfaceControl, which may also be used
/// directly from control-side code.
template <typename DeviceTag>
class AtomicInterfaceExecution
#ifdef VTKM_DOXYGEN_ONLY
struct VTKM_DEPRECATED(1.6, "Use the functions in vtkm/Atomic.h.") AtomicInterfaceExecution
{
/// The preferred word type for the target device for bitwise atomic
/// operations.
using WordTypePreferred = FastestWordTypeForDevice;
using WordTypes = vtkm::AtomicTypesSupported;
using WordTypePreferred = vtkm::AtomicTypePreferred;
using WordTypes = vtkm::List<vtkm::WordTypeDefault, WordTypePreferred>;
template <typename T>
VTKM_EXEC_CONT static T Load(const T* addr)
{
return vtkm::AtomicLoad(addr);
}
/// Atomically load a value from memory while enforcing, at minimum, "acquire"
/// memory ordering.
VTKM_EXEC static vtkm::WordTypeDefault Load(vtkm::WordTypeDefault* addr);
VTKM_EXEC static WordTypePreferred Load(WordTypePreferred* addr);
template <typename T>
VTKM_EXEC_CONT static void Store(T* addr, T value)
{
vtkm::AtomicStore(addr, value);
}
/// Atomically write a value to memory while enforcing, at minimum, "release"
/// memory ordering.
VTKM_EXEC static void Store(vtkm::WordTypeDefault* addr, vtkm::WordTypeDefault value);
VTKM_EXEC static void Store(WordTypePreferred* addr, WordTypePreferred value);
template <typename T>
VTKM_EXEC_CONT static T Add(T* addr, T arg)
{
return vtkm::AtomicAdd(addr, arg);
}
/// Perform an atomic integer add operation on the word at @a addr, adding
/// @arg. This operation performs a full memory barrier around the atomic
/// access.
///
/// The value at @a addr prior to the addition is returned.
///
/// @note Overflow behavior is not defined for this operation.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault Add(vtkm::WordTypeDefault* addr,
vtkm::WordTypeDefault arg);
VTKM_EXEC static WordTypePreferred Add(WordTypePreferred* addr, WordTypePreferred arg);
/// @}
template <typename T>
VTKM_EXEC_CONT static T Not(T* addr)
{
return vtkm::AtomicNot(addr);
}
/// Perform a bitwise atomic not operation on the word at @a addr.
/// This operation performs a full memory barrier around the atomic access.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault Not(vtkm::WordTypeDefault* addr);
VTKM_EXEC static WordTypePreferred Not(WordTypePreferred* addr);
/// @}
template <typename T>
VTKM_EXEC_CONT static T And(T* addr, T mask)
{
return vtkm::AtomicAnd(addr, mask);
}
/// Perform a bitwise atomic and operation on the word at @a addr.
/// This operation performs a full memory barrier around the atomic access.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault And(vtkm::WordTypeDefault* addr,
vtkm::WordTypeDefault mask);
VTKM_EXEC static WordTypePreferred And(WordTypePreferred* addr, WordTypePreferred mask);
/// @}
template <typename T>
VTKM_EXEC_CONT static T Or(T* addr, T mask)
{
return vtkm::AtomicOr(addr, mask);
}
/// Perform a bitwise atomic or operation on the word at @a addr.
/// This operation performs a full memory barrier around the atomic access.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault Or(vtkm::WordTypeDefault* addr,
vtkm::WordTypeDefault mask);
VTKM_EXEC static WordTypePreferred Or(WordTypePreferred* addr, WordTypePreferred mask);
/// @}
template <typename T>
VTKM_EXEC_CONT static T Xor(T* addr, T mask)
{
return vtkm::AtomicXor(addr, mask);
}
/// Perform a bitwise atomic xor operation on the word at @a addr.
/// This operation performs a full memory barrier around the atomic access.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault Xor(vtkm::WordTypeDefault* addr,
vtkm::WordTypeDefault mask);
VTKM_EXEC static WordTypePreferred Xor(WordTypePreferred* addr, WordTypePreferred mask);
/// @}
/// Perform an atomic CAS operation on the word at @a addr.
///
/// If the value at @a addr equals @a expected, @a addr will be set to
/// @a newWord and @a expected is returned. Otherwise, the value at @a addr
/// is returned and not modified.
///
/// This operation performs a full memory barrier around the atomic access.
/// @{
VTKM_EXEC static vtkm::WordTypeDefault CompareAndSwap(vtkm::WordTypeDefault* addr,
vtkm::WordTypeDefault newWord,
vtkm::WordTypeDefault expected);
VTKM_EXEC static WordTypePreferred CompareAndSwap(WordTypePreferred* addr,
WordTypePreferred newWord,
WordTypePreferred expected);
/// @}
}
#endif // VTKM_DOXYGEN_ONLY
;
template <typename T>
VTKM_EXEC_CONT static T CompareAndSwap(T* addr, T newWord, T expected)
{
return vtkm::AtomicCompareAndSwap(addr, newWord, expected);
}
};
}
}
} // end namespace vtkm::cont::internal

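The deprecation above implies a small mechanical migration for callers. The sketch below is illustrative only; the worklet context and `NextIndex` helper are hypothetical, and only the `vtkm::AtomicAdd` call comes from this change.

```cpp
// Before (required templating on the device):
//   using Atomic = vtkm::cont::internal::AtomicInterfaceExecution<Device>;
//   vtkm::UInt32 previous = Atomic::Add(counter, 1u);
//
// After (device-independent free function):
#include <vtkm/Atomic.h>

VTKM_EXEC inline vtkm::UInt32 NextIndex(vtkm::UInt32* counter)
{
  // Returns the value of *counter prior to the increment.
  return vtkm::AtomicAdd(counter, 1u);
}
```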
@ -15,7 +15,6 @@
#include <vtkm/TypeTraits.h>
#include <vtkm/UnaryPredicates.h>
#include <vtkm/cont/ArrayPortalToIterators.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/exec/FunctorBase.h>

@ -17,7 +17,6 @@
#if !defined(VTKM_KOKKOS_CUDA) || defined(VTKM_CUDA)
#include <vtkm/cont/kokkos/internal/ArrayManagerExecutionKokkos.h>
#include <vtkm/cont/kokkos/internal/AtomicInterfaceExecutionKokkos.h>
#include <vtkm/cont/kokkos/internal/DeviceAdapterAlgorithmKokkos.h>
#include <vtkm/cont/kokkos/internal/DeviceAdapterMemoryManagerKokkos.h>
#include <vtkm/cont/kokkos/internal/DeviceAdapterRuntimeDetectorKokkos.h>

@ -1,85 +0,0 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_cont_kokkos_internal_AtomicInterfaceExecutionKokkos_h
#define vtk_m_cont_kokkos_internal_AtomicInterfaceExecutionKokkos_h
#include <vtkm/cont/kokkos/internal/DeviceAdapterTagKokkos.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/List.h>
#include <vtkm/Types.h>
VTKM_THIRDPARTY_PRE_INCLUDE
#include <Kokkos_Core.hpp>
VTKM_THIRDPARTY_POST_INCLUDE
namespace vtkm
{
namespace cont
{
namespace internal
{
template <>
class AtomicInterfaceExecution<DeviceAdapterTagKokkos>
{
public:
// Note: There are 64-bit atomics available, but not on all devices. Stick
// with 32-bit only until we require compute capability 3.5+
using WordTypes = vtkm::List<vtkm::UInt32, vtkm::UInt64>;
using WordTypePreferred = vtkm::UInt32;
#define VTKM_ATOMIC_OPS_FOR_TYPE(type) \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type Load(const type* addr) \
{ \
return Kokkos::Impl::atomic_load(addr); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static void Store(type* addr, type value) \
{ \
Kokkos::Impl::atomic_store(addr, value); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type Add(type* addr, type arg) \
{ \
return Kokkos::atomic_fetch_add(addr, arg); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type Not(type* addr) \
{ \
return Kokkos::atomic_fetch_xor(addr, static_cast<type>(~type{ 0u })); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type And(type* addr, type mask) \
{ \
return Kokkos::atomic_fetch_and(addr, mask); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type Or(type* addr, type mask) \
{ \
return Kokkos::atomic_fetch_or(addr, mask); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type Xor(type* addr, type mask) \
{ \
return Kokkos::atomic_fetch_xor(addr, mask); \
} \
VTKM_SUPPRESS_EXEC_WARNINGS VTKM_EXEC static type CompareAndSwap( \
type* addr, type newWord, type expected) \
{ \
return Kokkos::atomic_compare_exchange(addr, expected, newWord); \
}
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt32)
VTKM_ATOMIC_OPS_FOR_TYPE(vtkm::UInt64)
#undef VTKM_ATOMIC_OPS_FOR_TYPE
};
}
}
} // end namespace vtkm::cont::internal
#endif // vtk_m_cont_kokkos_internal_AtomicInterfaceExecutionKokkos_h

@ -10,7 +10,6 @@
set(headers
ArrayManagerExecutionKokkos.h
AtomicInterfaceExecutionKokkos.h
DeviceAdapterAlgorithmKokkos.h
DeviceAdapterMemoryManagerKokkos.h
DeviceAdapterRuntimeDetectorKokkos.h

@ -16,7 +16,6 @@
#ifdef VTKM_ENABLE_OPENMP
#include <vtkm/cont/openmp/internal/ArrayManagerExecutionOpenMP.h>
#include <vtkm/cont/openmp/internal/AtomicInterfaceExecutionOpenMP.h>
#include <vtkm/cont/openmp/internal/DeviceAdapterAlgorithmOpenMP.h>
#include <vtkm/cont/openmp/internal/DeviceAdapterMemoryManagerOpenMP.h>
#include <vtkm/cont/openmp/internal/VirtualObjectTransferOpenMP.h>

@ -1,35 +0,0 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_cont_openmp_internal_AtomicInterfaceExecutionOpenMP_h
#define vtk_m_cont_openmp_internal_AtomicInterfaceExecutionOpenMP_h
#include <vtkm/cont/openmp/internal/DeviceAdapterTagOpenMP.h>
#include <vtkm/cont/internal/AtomicInterfaceControl.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/Types.h>
namespace vtkm
{
namespace cont
{
namespace internal
{
template <>
class AtomicInterfaceExecution<DeviceAdapterTagOpenMP> : public AtomicInterfaceControl
{
};
}
}
} // end namespace vtkm::cont::internal
#endif // vtk_m_cont_openmp_internal_AtomicInterfaceExecutionOpenMP_h

@ -14,7 +14,6 @@ set(headers
DeviceAdapterMemoryManagerOpenMP.h
DeviceAdapterRuntimeDetectorOpenMP.h
DeviceAdapterTagOpenMP.h
AtomicInterfaceExecutionOpenMP.h
FunctorsOpenMP.h
ParallelQuickSortOpenMP.h
ParallelRadixSortOpenMP.h

@ -15,7 +15,6 @@
#include <vtkm/cont/serial/internal/DeviceAdapterTagSerial.h>
#include <vtkm/cont/serial/internal/DeviceAdapterRuntimeDetectorSerial.h>
#include <vtkm/cont/serial/internal/DeviceAdapterMemoryManagerSerial.h>
#include <vtkm/cont/serial/internal/AtomicInterfaceExecutionSerial.h>
#include <vtkm/cont/serial/internal/ArrayManagerExecutionSerial.h>
#include <vtkm/cont/serial/internal/DeviceAdapterAlgorithmSerial.h>
#include <vtkm/cont/serial/internal/VirtualObjectTransferSerial.h>

@ -1,35 +0,0 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_cont_serial_internal_AtomicInterfaceExecutionSerial_h
#define vtk_m_cont_serial_internal_AtomicInterfaceExecutionSerial_h
#include <vtkm/cont/serial/internal/DeviceAdapterTagSerial.h>
#include <vtkm/cont/internal/AtomicInterfaceControl.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/Types.h>
namespace vtkm
{
namespace cont
{
namespace internal
{
template <>
class AtomicInterfaceExecution<DeviceAdapterTagSerial> : public AtomicInterfaceControl
{
};
}
}
} // end namespace vtkm::cont::internal
#endif // vtk_m_cont_serial_internal_AtomicInterfaceExecutionSerial_h

@ -10,7 +10,6 @@
set(headers
ArrayManagerExecutionSerial.h
AtomicInterfaceExecutionSerial.h
DeviceAdapterAlgorithmSerial.h
DeviceAdapterMemoryManagerSerial.h
DeviceAdapterRuntimeDetectorSerial.h

@ -15,7 +15,6 @@
#ifdef VTKM_ENABLE_TBB
#include <vtkm/cont/tbb/internal/ArrayManagerExecutionTBB.h>
#include <vtkm/cont/tbb/internal/AtomicInterfaceExecutionTBB.h>
#include <vtkm/cont/tbb/internal/DeviceAdapterAlgorithmTBB.h>
#include <vtkm/cont/tbb/internal/DeviceAdapterMemoryManagerTBB.h>
#include <vtkm/cont/tbb/internal/VirtualObjectTransferTBB.h>

@ -1,35 +0,0 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#ifndef vtk_m_cont_tbb_internal_AtomicInterfaceExecutionTBB_h
#define vtk_m_cont_tbb_internal_AtomicInterfaceExecutionTBB_h
#include <vtkm/cont/tbb/internal/DeviceAdapterTagTBB.h>
#include <vtkm/cont/internal/AtomicInterfaceControl.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/Types.h>
namespace vtkm
{
namespace cont
{
namespace internal
{
template <>
class AtomicInterfaceExecution<DeviceAdapterTagTBB> : public AtomicInterfaceControl
{
};
}
}
} // end namespace vtkm::cont::internal
#endif // vtk_m_cont_tbb_internal_AtomicInterfaceExecutionTBB_h

@ -10,7 +10,6 @@
set(headers
ArrayManagerExecutionTBB.h
AtomicInterfaceExecutionTBB.h
DeviceAdapterAlgorithmTBB.h
DeviceAdapterMemoryManagerTBB.h
DeviceAdapterRuntimeDetectorTBB.h

@ -100,10 +100,8 @@ template <class DeviceAdapterTag>
struct TestingBitField
{
using Algo = vtkm::cont::DeviceAdapterAlgorithm<DeviceAdapterTag>;
using AtomicInterface = vtkm::cont::internal::AtomicInterfaceExecution<DeviceAdapterTag>;
using Traits = vtkm::cont::detail::BitFieldTraits;
using WordTypes = typename AtomicInterface::WordTypes;
using WordTypesControl = vtkm::cont::internal::AtomicInterfaceControl::WordTypes;
using WordTypes = vtkm::AtomicTypesSupported;
VTKM_EXEC_CONT
static bool RandomBitFromIndex(vtkm::Id idx) noexcept
@ -314,7 +312,7 @@ struct TestingBitField
}
HelpTestWordOpsControl<Portal> test(portal);
vtkm::ListForEach(test, typename Portal::AtomicInterface::WordTypes{});
vtkm::ListForEach(test, vtkm::AtomicTypesSupported{});
}
VTKM_CONT
@ -423,7 +421,7 @@ struct TestingBitField
HelpTestWordOpsExecution<Portal> test(portal);
vtkm::ListForEach(test, typename Portal::AtomicInterface::WordTypes{});
vtkm::ListForEach(test, vtkm::AtomicTypesSupported{});
}
VTKM_CONT

@ -17,8 +17,6 @@
// this one.
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/RuntimeDeviceTracker.h>
#include <vtkm/cont/internal/AtomicInterfaceControl.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <vtkm/cont/internal/DeviceAdapterAlgorithmGeneral.h>
#include <vtkm/cont/internal/VirtualObjectTransferShareWithControl.h>
#include <vtkm/cont/serial/DeviceAdapterSerial.h>
@ -101,11 +99,6 @@ public:
}
};
template <>
class AtomicInterfaceExecution<DeviceAdapterTagTestAlgorithmGeneral> : public AtomicInterfaceControl
{
};
template <typename TargetClass>
struct VirtualObjectTransfer<TargetClass, vtkm::cont::DeviceAdapterTagTestAlgorithmGeneral> final
: public VirtualObjectTransferShareWithControl<TargetClass>

@ -10,10 +10,10 @@
#ifndef vtk_m_exec_AtomicArrayExecutionObject_h
#define vtk_m_exec_AtomicArrayExecutionObject_h
#include <vtkm/Atomic.h>
#include <vtkm/List.h>
#include <vtkm/cont/ArrayHandle.h>
#include <vtkm/cont/DeviceAdapter.h>
#include <vtkm/cont/internal/AtomicInterfaceExecution.h>
#include <type_traits>
@ -25,7 +25,7 @@ namespace exec
namespace detail
{
// Clang-7 as host compiler under nvcc returns types from std::make_unsigned
// that are not compatible with the AtomicInterface API, so we define our own
// that are not compatible with the vtkm::Atomic API, so we define our own
// mapping. This must exist for every entry in vtkm::cont::AtomicArrayTypeList.
template <typename>
struct MakeUnsigned;
@ -54,8 +54,6 @@ struct MakeUnsigned<vtkm::Int64>
template <typename T, typename Device>
class AtomicArrayExecutionObject
{
using AtomicInterface = vtkm::cont::internal::AtomicInterfaceExecution<Device>;
// Checks if PortalType has a GetIteratorBegin() method that returns a
// pointer.
template <typename PortalType,
@ -103,13 +101,12 @@ public:
VTKM_EXEC
ValueType Get(vtkm::Id index) const
{
// We only support 32/64 bit signed/unsigned ints, and AtomicInterface
// We only support 32/64 bit signed/unsigned ints, and vtkm::Atomic
// currently only provides API for unsigned types.
// We'll cast the signed types to unsigned to work around this.
using APIType = typename detail::MakeUnsigned<ValueType>::type;
return static_cast<T>(
AtomicInterface::Load(reinterpret_cast<const APIType*>(this->Data + index)));
return static_cast<T>(vtkm::AtomicLoad(reinterpret_cast<const APIType*>(this->Data + index)));
}
/// \brief Perform an atomic addition with sequentially consistent memory
@ -123,7 +120,7 @@ public:
VTKM_EXEC
ValueType Add(vtkm::Id index, const ValueType& value) const
{
// We only support 32/64 bit signed/unsigned ints, and AtomicInterface
// We only support 32/64 bit signed/unsigned ints, and vtkm::Atomic
// currently only provides API for unsigned types.
// We'll cast the signed types to unsigned to work around this.
// This is safe, since the only difference between signed/unsigned types
@ -131,8 +128,8 @@ public:
// document that overflow is undefined for this operation.
using APIType = typename detail::MakeUnsigned<ValueType>::type;
return static_cast<T>(AtomicInterface::Add(reinterpret_cast<APIType*>(this->Data + index),
static_cast<APIType>(value)));
return static_cast<T>(
vtkm::AtomicAdd(reinterpret_cast<APIType*>(this->Data + index), static_cast<APIType>(value)));
}
/// \brief Perform an atomic store to memory while enforcing, at minimum, "release"
@ -150,7 +147,7 @@ public:
VTKM_EXEC
void Set(vtkm::Id index, const ValueType& value) const
{
// We only support 32/64 bit signed/unsigned ints, and AtomicInterface
// We only support 32/64 bit signed/unsigned ints, and vtkm::Atomic
// currently only provides API for unsigned types.
// We'll cast the signed types to unsigned to work around this.
// This is safe, since the only difference between signed/unsigned types
@ -158,8 +155,7 @@ public:
// document that overflow is undefined for this operation.
using APIType = typename detail::MakeUnsigned<ValueType>::type;
AtomicInterface::Store(reinterpret_cast<APIType*>(this->Data + index),
static_cast<APIType>(value));
vtkm::AtomicStore(reinterpret_cast<APIType*>(this->Data + index), static_cast<APIType>(value));
}
/// \brief Perform an atomic CAS operation with sequentially consistent
@ -204,17 +200,16 @@ public:
const ValueType& newValue,
const ValueType& oldValue) const
{
// We only support 32/64 bit signed/unsigned ints, and AtomicInterface
// We only support 32/64 bit signed/unsigned ints, and vtkm::Atomic
// currently only provides API for unsigned types.
// We'll cast the signed types to unsigned to work around this.
// This is safe, since the only difference between signed/unsigned types
// is how overflow works, and signed overflow is already undefined.
using APIType = typename detail::MakeUnsigned<ValueType>::type;
return static_cast<T>(
AtomicInterface::CompareAndSwap(reinterpret_cast<APIType*>(this->Data + index),
static_cast<APIType>(newValue),
static_cast<APIType>(oldValue)));
return static_cast<T>(vtkm::AtomicCompareAndSwap(reinterpret_cast<APIType*>(this->Data + index),
static_cast<APIType>(newValue),
static_cast<APIType>(oldValue)));
}
private:

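The CAS wrapper above returns the word found at the address, which supports the usual retry-loop pattern. A sketch of such a loop built directly on the free function (the `AtomicMax32` helper is hypothetical, not part of this commit):

```cpp
#include <vtkm/Atomic.h>
#include <vtkm/Types.h>

// Atomically raise *addr to at least `candidate`. Returns the word observed
// before our update (or a value already >= candidate).
VTKM_EXEC inline vtkm::UInt32 AtomicMax32(vtkm::UInt32* addr, vtkm::UInt32 candidate)
{
  vtkm::UInt32 expected = vtkm::AtomicLoad(addr);
  while (expected < candidate)
  {
    // AtomicCompareAndSwap(addr, newWord, expected) returns the value found at addr.
    vtkm::UInt32 found = vtkm::AtomicCompareAndSwap(addr, candidate, expected);
    if (found == expected)
    {
      break; // our candidate was written
    }
    expected = found; // another thread changed the word; retry against its value
  }
  return expected;
}
```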
@ -49,6 +49,7 @@ set(unit_tests
# Unit tests that have device-specific code to be tested
set(unit_tests_device
UnitTestAlgorithms.cxx
UnitTestAtomic.cxx
UnitTestGeometry.cxx
UnitTestMath.cxx
)

@ -0,0 +1,359 @@
//============================================================================
// Copyright (c) Kitware, Inc.
// All rights reserved.
// See LICENSE.txt for details.
//
// This software is distributed WITHOUT ANY WARRANTY; without even
// the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR
// PURPOSE. See the above copyright notice for more information.
//============================================================================
#include <vtkm/Atomic.h>
#include <vtkm/cont/Algorithm.h>
#include <vtkm/cont/ArrayCopy.h>
#include <vtkm/cont/ArrayHandleBasic.h>
#include <vtkm/cont/ArrayHandleConstant.h>
#include <vtkm/cont/ArrayHandleIndex.h>
#include <vtkm/cont/DeviceAdapterTag.h>
#include <vtkm/cont/ExecutionObjectBase.h>
#include <vtkm/cont/Invoker.h>
#include <vtkm/worklet/WorkletMapField.h>
#include <vtkm/cont/testing/Testing.h>
namespace
{
constexpr vtkm::Id ARRAY_SIZE = 100;
template <typename T>
struct AtomicTests
{
vtkm::cont::Invoker Invoke;
static constexpr vtkm::Id OVERLAP = sizeof(T) * CHAR_BIT;
static constexpr vtkm::Id EXTENDED_SIZE = ARRAY_SIZE * OVERLAP;
VTKM_EXEC_CONT static T TestValue(vtkm::Id index) { return ::TestValue(index, T{}); }
struct ArrayToRawPointer : vtkm::cont::ExecutionObjectBase
{
vtkm::cont::ArrayHandleBasic<T> Array;
VTKM_CONT ArrayToRawPointer(const vtkm::cont::ArrayHandleBasic<T>& array)
: Array(array)
{
}
VTKM_CONT T* PrepareForExecution(vtkm::cont::DeviceAdapterId device,
vtkm::cont::Token& token) const
{
return reinterpret_cast<T*>(this->Array.GetBuffers()[0].WritePointerDevice(device, token));
}
};
struct LoadFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
if (!test_equal(vtkm::AtomicLoad(data + index), TestValue(index)))
{
this->RaiseError("Bad AtomicLoad");
}
}
};
VTKM_CONT void TestLoad()
{
std::cout << "AtomicLoad" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
array.Allocate(ARRAY_SIZE);
SetPortal(array.WritePortal());
this->Invoke(LoadFunctor{}, array, ArrayToRawPointer(array));
}
struct StoreFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::AtomicStore(data + (index % ARRAY_SIZE), TestValue(index));
}
};
VTKM_CONT void TestStore()
{
std::cout << "AtomicStore" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
array.Allocate(ARRAY_SIZE);
this->Invoke(
StoreFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
bool foundExpected = false;
T foundValue = portal.Get(arrayIndex);
for (vtkm::Id overlapIndex = 0; overlapIndex < OVERLAP; ++overlapIndex)
{
if (test_equal(foundValue, TestValue(arrayIndex + (overlapIndex * ARRAY_SIZE))))
{
foundExpected = true;
break;
}
}
VTKM_TEST_ASSERT(
foundExpected, "Wrong value (", foundValue, ") stored in index ", arrayIndex);
}
}
struct AddFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::AtomicAdd(data + (index % ARRAY_SIZE), 2);
vtkm::AtomicAdd(data + (index % ARRAY_SIZE), -1);
}
};
VTKM_CONT void TestAdd()
{
std::cout << "AtomicAdd" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(0, ARRAY_SIZE), array);
this->Invoke(
AddFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
T expectedValue = T(OVERLAP);
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, expectedValue), foundValue, " != ", expectedValue);
}
}
struct AndFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::Id arrayIndex = index % ARRAY_SIZE;
vtkm::Id offsetIndex = index / ARRAY_SIZE;
vtkm::AtomicAnd(data + arrayIndex, ~(T(0x1u) << offsetIndex));
}
};
VTKM_CONT void TestAnd()
{
std::cout << "AtomicAnd" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(T(-1), ARRAY_SIZE), array);
this->Invoke(
AndFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, 0), foundValue, " != 0");
}
}
struct OrFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::Id arrayIndex = index % ARRAY_SIZE;
vtkm::Id offsetIndex = index / ARRAY_SIZE;
vtkm::AtomicOr(data + arrayIndex, T(0x1u) << offsetIndex);
}
};
VTKM_CONT void TestOr()
{
std::cout << "AtomicOr" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(0, ARRAY_SIZE), array);
this->Invoke(
OrFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
T expectedValue = T(-1);
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, expectedValue), foundValue, " != ", expectedValue);
}
}
struct XorFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::Id arrayIndex = index % ARRAY_SIZE;
vtkm::Id offsetIndex = index / ARRAY_SIZE;
vtkm::AtomicXor(data + arrayIndex, T(0x3u) << offsetIndex);
}
};
VTKM_CONT void TestXor()
{
std::cout << "AtomicXor" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(0, ARRAY_SIZE), array);
this->Invoke(
XorFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
T expectedValue = T(1);
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, expectedValue), foundValue, " != ", expectedValue);
}
}
struct NotFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::Id arrayIndex = index % ARRAY_SIZE;
vtkm::Id offsetIndex = index / ARRAY_SIZE;
if (offsetIndex < arrayIndex)
{
vtkm::AtomicNot(data + arrayIndex);
}
}
};
VTKM_CONT void TestNot()
{
std::cout << "AtomicNot" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(0xA, ARRAY_SIZE), array);
this->Invoke(
NotFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
T expectedValue = T(0xA);
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, expectedValue), foundValue, " != ", expectedValue);
expectedValue = static_cast<T>(~expectedValue);
}
}
struct CompareAndSwapFunctor : vtkm::worklet::WorkletMapField
{
using ControlSignature = void(FieldIn ignored, ExecObject);
using ExecutionSignature = void(WorkIndex, _2);
VTKM_EXEC void operator()(vtkm::Id index, T* data) const
{
vtkm::Id arrayIndex = index % ARRAY_SIZE;
bool success = false;
for (T overlapIndex = 0; overlapIndex < static_cast<T>(OVERLAP); ++overlapIndex)
{
T oldValue = vtkm::AtomicCompareAndSwap(data + arrayIndex, overlapIndex + 1, overlapIndex);
if (oldValue == overlapIndex)
{
success = true;
break;
}
}
if (!success)
{
this->RaiseError("No compare succeeded");
}
}
};
VTKM_CONT void TestCompareAndSwap()
{
std::cout << "AtomicCompareAndSwap" << std::endl;
vtkm::cont::ArrayHandleBasic<T> array;
vtkm::cont::ArrayCopy(vtkm::cont::make_ArrayHandleConstant<T>(0, ARRAY_SIZE), array);
this->Invoke(
CompareAndSwapFunctor{}, vtkm::cont::ArrayHandleIndex(EXTENDED_SIZE), ArrayToRawPointer(array));
auto portal = array.ReadPortal();
T expectedValue = T(OVERLAP);
for (vtkm::Id arrayIndex = 0; arrayIndex < ARRAY_SIZE; ++arrayIndex)
{
T foundValue = portal.Get(arrayIndex);
VTKM_TEST_ASSERT(test_equal(foundValue, expectedValue), foundValue, " != ", expectedValue);
}
}
VTKM_CONT void TestAll()
{
TestLoad();
TestStore();
TestAdd();
TestAnd();
TestOr();
TestXor();
TestNot();
TestCompareAndSwap();
}
};
struct TestFunctor
{
template <typename T>
VTKM_CONT void operator()(T) const
{
AtomicTests<T>().TestAll();
}
};
void Run()
{
VTKM_TEST_ASSERT(vtkm::ListHas<vtkm::AtomicTypesSupported, vtkm::AtomicTypePreferred>::value);
vtkm::testing::Testing::TryTypes(TestFunctor{}, vtkm::AtomicTypesSupported{});
}
} // anonymous namespace
int UnitTestAtomic(int argc, char* argv[])
{
return vtkm::cont::testing::Testing::Run(Run, argc, argv);
}