Fix issue with Intel compiler removing loop in MatrixTranspose

When optimization was enabled, the Intel compiler removed the loop in some instantiations of the templated MatrixTranspose function, for reasons I could not determine. I inserted an empty assembly statement inside the loop, which prevents the compiler from removing the loop without adding any actual code. That seems to fix the problem.
2015-07-06 14:31:43 -06:00
commit dc91446972
parent 1c2f33926b
1 changed file with 9 additions and 0 deletions
--- a/vtkm/Matrix.h
+++ b/vtkm/Matrix.h
@@ -244,6 +244,15 @@ vtkm::Matrix<T,NumCols,NumRows> MatrixTranspose(
  for (vtkm::IdComponent index = 0; index < NumRows; index++)
  {
    vtkm::MatrixSetColumn(result, index, vtkm::MatrixGetRow(matrix, index));
+#ifdef VTKM_ICC
+    // For reasons I do not really understand, the Intel compiler with
+    // optimization on is sometimes removing this for loop. It appears that the
+    // optimizer sometimes does not recognize that the MatrixSetColumn function
+    // has side effects. I cannot fathom any reason for this other than a bug in
+    // the compiler, but unfortunately I do not know a reliable way to
+    // demonstrate the problem.
+    __asm__("");
+#endif
  }
  return result;
 }
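
The workaround in the diff above can be tried in isolation with a sketch like the following. It is only an illustration under stated assumptions: `SketchMatrix`, `SketchTranspose`, and the `SKETCH_KEEP_LOOP` macro are hypothetical names that do not exist in VTK-m, and the compiler check uses generic predefined macros rather than VTK-m's own `VTKM_ICC`. The sketch relies on the GNU-style `__asm__` extension, where an assembly statement with no output operands is treated as volatile and so is not deleted by the optimizer; whether that is enough to avoid the Intel compiler behavior described in the commit message is not something this sketch can demonstrate.

```cpp
// Hypothetical macro (not part of VTK-m): expands to an empty assembly
// statement on compilers that accept the GNU-style __asm__ extension,
// and to a no-op expression everywhere else.
#if defined(__GNUC__) || defined(__clang__) || defined(__INTEL_COMPILER)
#define SKETCH_KEEP_LOOP() __asm__("")
#else
#define SKETCH_KEEP_LOOP() ((void)0)
#endif

// Simplified stand-in for vtkm::Matrix: a fixed-size, row-major 2D array.
template <typename T, int NumRows, int NumCols>
struct SketchMatrix
{
  T Data[NumRows][NumCols];
};

// Transposes a matrix one row at a time, mirroring the shape of the loop in
// the diff. The empty assembly statement sits inside the loop body.
template <typename T, int NumRows, int NumCols>
SketchMatrix<T, NumCols, NumRows> SketchTranspose(
  const SketchMatrix<T, NumRows, NumCols>& matrix)
{
  SketchMatrix<T, NumCols, NumRows> result;
  for (int row = 0; row < NumRows; row++)
  {
    // Row `row` of the input becomes column `row` of the result.
    for (int col = 0; col < NumCols; col++)
    {
      result.Data[col][row] = matrix.Data[row][col];
    }
    // Emits no instructions, but gives the optimizer a statement in the
    // loop body that it will not delete.
    SKETCH_KEEP_LOOP();
  }
  return result;
}
```

Wrapping the statement in a macro keeps the compiler-specific guard in one place, which is a design choice of this sketch; the actual commit places the `#ifdef` directly inside the loop.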