[commit: ghc] master: adding further documentation and explanation to the prefetch primops (dd2bce5)

Sat Nov 2 20:58:37 UTC 2013

Repository : ssh://git@git.haskell.org/ghc

On branch  : master
Link       : http://ghc.haskell.org/trac/ghc/changeset/dd2bce5ecadf3153c43483aa37c8ff4f42cecc0c/ghc

>---------------------------------------------------------------

commit dd2bce5ecadf3153c43483aa37c8ff4f42cecc0c
Author: Carter Tazio Schonwald <carter.schonwald at gmail.com>
Date:   Mon Oct 28 15:16:18 2013 -0400

    adding further documentation and explanation to the prefetch primops
    
    Signed-off-by: Carter Tazio Schonwald <carter.schonwald at gmail.com>
    Signed-off-by: Austin Seipp <austin at well-typed.com>


>---------------------------------------------------------------

dd2bce5ecadf3153c43483aa37c8ff4f42cecc0c
 compiler/prelude/primops.txt.pp |   27 +++++++++++++++++++++++++--
 1 file changed, 25 insertions(+), 2 deletions(-)

diff --git a/compiler/prelude/primops.txt.pp b/compiler/prelude/primops.txt.pp
index 5bedc31..cf2aa25 100644
--- a/compiler/prelude/primops.txt.pp
+++ b/compiler/prelude/primops.txt.pp
@@ -2604,11 +2604,20 @@ section "Prefetch"
   This suffix number, N, is the "locality level" of the prefetch, following the
   convention in GCC and other compilers.
   Higher locality numbers correspond to the memory being loaded in more
-  levels of the cpu cache, and being retained after initial use.
+  levels of the cpu cache, and being retained after initial use. The naming
+  convention follows that of the prefetch intrinsic found
+  in the GCC and Clang C compilers.
+
+  The prefetch primops are all marked with the can_fail=True attribute, even
+  though they never fail. The attribute is set only so that prefetches are
+  not hoisted or let-floated out. Prefetch is a tool for optimizing the use
+  of system memory bandwidth, and preventing let hoisting makes *when* the
+  prefetch happens more predictable.
+
 
   On the LLVM backend, prefetch*N# uses the LLVM prefetch intrinsic
   with locality level N. The code generated by LLVM is target architecture
-   dependent, but should agree with the GHC NCG on x86 systems.
+  dependent, but should agree with the GHC NCG on x86 systems.
 
   On the Sparc and PPC native backends, prefetch*N is a No-Op.
 
@@ -2619,6 +2628,20 @@ section "Prefetch"
   For streaming workloads, the prefetch*0 operations are recommended.
   For workloads which do many reads or writes to a memory location in a short period of time,
   prefetch*3 operations are recommended.
+
+  For further reading about prefetch and associated systems performance optimization,
+  the instruction set and optimization manuals by Intel and other CPU vendors are
+  an excellent starting place.
+
+
+  The "Intel 64 and IA-32 Architectures Optimization Reference Manual" is
+  an especially helpful read, even if your software is meant for other CPU
+  architectures or vendor hardware.
+  
+  http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html
+
+  
+
    }
 ------------------------------------------------------------------------
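
For readers unfamiliar with the GCC/Clang convention that the locality suffix
follows, here is a minimal C sketch (not part of this commit). It uses
__builtin_prefetch(addr, rw, locality), where locality ranges from 0 to 3 and
is assumed to play the same role as the N suffix in prefetch*N#. The function
name sum_streaming and the constant PREFETCH_DISTANCE are hypothetical,
chosen only for illustration; a real prefetch distance has to be tuned per
workload and CPU.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tuning knob: how many elements ahead of the current
 * index to prefetch. Not taken from the GHC sources. */
enum { PREFETCH_DISTANCE = 64 };

/* Sum an array in a single streaming pass. Each element is read once,
 * so the prefetch uses locality 0 (the data need not stay in cache),
 * matching the recommendation of prefetch*0 for streaming workloads. */
static uint64_t sum_streaming(const uint32_t *xs, size_t n)
{
    uint64_t total = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_DISTANCE < n) {
            /* Second argument 0 = prefetch for reading;
             * third argument 0 = lowest locality level. */
            __builtin_prefetch(&xs[i + PREFETCH_DISTANCE], 0, 0);
        }
        total += xs[i];
    }
    return total;
}
```

A prefetch is only a hint, so the result is the same with or without it; only
the timing of memory traffic changes. For data that is read or written many
times in a short period, passing 3 as the third argument would correspond to
the prefetch*3 recommendation.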