SIMD/SSE support & alignment
Nicolas Trangez
nicolas at incubaid.com
Sun Mar 10 22:52:36 CET 2013
All,
I've been toying with the SSE code generation in GHC 7.7 and Geoffrey
Mainland's work to integrate this into the 'vector' library in order to
generate SIMD code from high-level Haskell code.
While working with this, I wrote some simple code for testing purposes,
then compiled it to LLVM IR and x86_64 assembly in order to figure out
how 'good' the resulting code would be.
First and foremost: I'm really impressed. Whilst there's most certainly
room for improvement (one issue is touched on in this mail, and I also
noticed unnecessary reads of constants inside a tight loop), the
initial results look very promising, especially taking into account how
high-level the source code is. This is pretty amazing!
As an example, here's 'test.hs':
{-# OPTIONS_GHC -fllvm -O3 -optlo-O3 -optlc-O=3 -funbox-strict-fields #-}
module Test (sum) where
import Prelude hiding (sum)
import Data.Int (Int32)
import Data.Vector.Unboxed (Vector)
import qualified Data.Vector.Unboxed as U
sum :: Vector Int32 -> Int32
sum v = U.mfold' (+) (+) 0 v
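(As a reference point: 'mfold'' takes a scalar operator, a 'Multi'
operator and a seed. The semantics I expect from it can be sketched
over a plain list as below. 'mfoldRef' is just my own illustration, not
vector's implementation (the real thing works on 'Multi' values and
fuses with the stream framework), and it uses a single operator for
both roles, as '(+)' does above.)

import Data.Int (Int32)
import Data.List (foldl')

-- Fold the bulk of the input four lanes at a time (the way 'paddd'
-- accumulates four independent sums), then fold the leftover tail and
-- the four lane accumulators with the scalar operator.
mfoldRef :: (Int32 -> Int32 -> Int32) -> Int32 -> [Int32] -> Int32
mfoldRef f z xs = foldl' f (foldl' f z tl) lanes
  where
    n          = (length xs `div` 4) * 4
    (body, tl) = splitAt n xs
    lanes      = foldl' (zipWith f) (replicate 4 z) (chunks body)
    chunks []  = []
    chunks ys  = let (c, rest) = splitAt 4 ys in c : chunks rest

E.g. mfoldRef (+) 0 [1..10] == 55, matching 'sum' on the same input.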
When compiling this into assembly (compiler/library version details at
the end of this message), the 'sum' function yields (among other things)
this code:
.LBB2_3: # %c1C0
# =>This Inner Loop Header: Depth=1
prefetcht0 (%rsi)
movdqu -1536(%rsi), %xmm1
paddd %xmm1, %xmm0
addq $16, %rsi
addq $4, %rcx
cmpq %rdx, %rcx
jl .LBB2_3
The full LLVM IR and assembler output are attached to this message.
Whilst this is a nice and tight loop, I noticed the use of 'movdqu',
the instruction for 128-bit SSE loads from memory that is not known to
be 16-byte aligned. For aligned memory, 'movdqa' can be used instead,
and this can have a major performance impact.
Whilst I understand why this code is currently generated as-is (also
for other sample inputs), I wondered whether there are plans or
approaches to tackle this. In some cases (e.g. in 'sum') this could be
done by running the scalar calculation over the beginning of the vector
up to an aligned boundary, then using aligned accesses for the bulk,
and handling the tail with scalars again (sketched below). Off the top
of my head, though, I assume that's not trivial once multiple 'source'
vectors are used in the calculation, since they need not all have the
same misalignment.
This might become even more complex when using AVX code, which needs
256-bit (32-byte) alignment.
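To make the head/body/tail idea concrete, here's a rough sketch for the
single-vector case, written against Data.Vector.Storable (so the
payload address is observable). 'headLen' and 'sumAligned' are made-up
names for illustration, and all three folds below are scalar; in
generated code, the middle one is the part that could use 'movdqa':

import Data.Int (Int32)
import Foreign.Ptr (ptrToIntPtr)
import System.IO.Unsafe (unsafePerformIO)
import qualified Data.Vector.Storable as S

-- Number of leading Int32s to peel off before the first 16-byte
-- boundary (0 if the payload is already aligned). Reading the address
-- via unsafePerformIO is acceptable here since the pinned payload of a
-- Storable vector doesn't move.
headLen :: S.Vector Int32 -> Int
headLen v = unsafePerformIO $ S.unsafeWith v $ \p ->
    let misalign = fromIntegral (ptrToIntPtr p) `mod` 16
    in return $ if misalign == 0 then 0 else (16 - misalign) `div` 4

-- Scalar head up to the boundary, whole 4xInt32 chunks in the body,
-- scalar tail for the remainder.
sumAligned :: S.Vector Int32 -> Int32
sumAligned v = S.foldl' (+) 0 h + S.foldl' (+) 0 b + S.foldl' (+) 0 t
  where
    (h, rest) = S.splitAt (headLen v `min` S.length v) v
    nBody     = (S.length rest `div` 4) * 4
    (b, t)    = S.splitAt nBody rest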
Whilst I can't propose an out-of-the-box solution, I'd like to point at
the 'vector-simd' code [1] I wrote some months ago, which might offer
some ideas. In that package, I created an unboxed vector-like type
whose alignment is tracked at the type level, and functions which
consume a vector declare the minimal alignment they require. As such,
vectors can be allocated at the minimal alignment required of them
throughout all code using them.
As an example, if I'd use this code (off the top of my head):

sseFoo :: (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A16 o2)
       => Vector o1 a -> Vector o2 a
sseFoo = undefined

avxFoo :: (Storable a, AlignedToAtLeast A32 o1, AlignedToAtLeast A32 o2,
           AlignedToAtLeast A32 o3)
       => Vector o1 a -> Vector o2 a -> Vector o3 a
avxFoo = undefined
the type of

combinedFoo v = avxFoo sv sv
  where
    sv = sseFoo v

would automagically be

combinedFoo :: (Storable a, AlignedToAtLeast A16 o1, AlignedToAtLeast A32 o2)
            => Vector o1 a -> Vector o2 a
and when using this:

v1 = combinedFoo (Vector.fromList [1 :: Int32, 2, 3, 4, 5, 6, 7, 8])

the allocated argument vector (the result of Vector.fromList) will be
16-byte aligned, as expected/required for the SSE function to work with
aligned loads internally (assuming no unaligned slices are supported,
etc.), whilst the intermediate result of 'sseFoo' ('sv') will be
32-byte aligned, as required by 'avxFoo'.
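The core of that trick fits in a few lines. This is a from-memory
reconstruction, so the actual encoding in vector-simd may differ:

{-# LANGUAGE EmptyDataDecls #-}
{-# LANGUAGE MultiParamTypeClasses #-}

-- Phantom types naming alignment guarantees.
data A8
data A16
data A32

-- 'AlignedToAtLeast n o' holds when alignment 'o' implies alignment
-- 'n', so a 32-byte aligned vector can be passed wherever 16-byte
-- alignment is demanded, but not vice versa.
class AlignedToAtLeast n o

instance AlignedToAtLeast A8  A8
instance AlignedToAtLeast A8  A16
instance AlignedToAtLeast A8  A32
instance AlignedToAtLeast A16 A16
instance AlignedToAtLeast A16 A32
instance AlignedToAtLeast A32 A32

-- The vector carries its allocation alignment as a phantom parameter;
-- the payload is elided here (the real type wraps a ForeignPtr
-- allocated at the requested alignment).
data Vector o a = Vector

Since 'sv' flows into 'avxFoo', inference attaches an A32 requirement
to its allocation site, which is how intermediate vectors end up
allocated at the strictest alignment any consumer demands.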
Attached: test.ll and test.s, compilation results of test.hs using

$ ghc-7.7.20130302 -keep-llvm-files \
    -package-db=cabal-dev/packages-7.7.20130302.conf -fforce-recomp -S test.hs
GHC is from HEAD/master, compiled on my Fedora 18 system using the
system LLVM (3.1), with 'primitive' at commit
8aef578fa5e7fb9fac3eac17336b722cbae2f921 from
git://github.com/mainland/primitive.git and 'vector' at commit
e1a6c403bcca07b4c8121753daf120d30dedb1b0 from
git://github.com/mainland/vector.git
Nicolas
[1] https://github.com/NicolasT/vector-simd
-------------- test.ll --------------
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v64:64:64-v128:128:128-a0:0:64-s0:64:64-f80:128:128-n8:16:32:64"
target triple = "x86_64-linux-gnu"
declare ccc i8* @memcpy(i8*, i8*, i64)
declare ccc i8* @memmove(i8*, i8*, i64)
declare ccc i8* @memset(i8*, i64, i64)
declare ccc i64 @newSpark(i8*, i8*)
!0 = metadata !{metadata !"top"}
!1 = metadata !{metadata !"stack",metadata !0}
!2 = metadata !{metadata !"heap",metadata !0}
!3 = metadata !{metadata !"rx",metadata !2}
!4 = metadata !{metadata !"base",metadata !0}
!5 = metadata !{metadata !"other",metadata !0}
%__stginit_Test_struct = type <{}>
@__stginit_Test = global %__stginit_Test_struct<{}>
%Test_zdwa_closure_struct = type <{i64}>
@Test_zdwa_closure = global %Test_zdwa_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_zdwa_info to i64)}>
%Test_sum1_closure_struct = type <{i64}>
@Test_sum1_closure = global %Test_sum1_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_sum1_info to i64)}>
%Test_sum_closure_struct = type <{i64}>
@Test_sum_closure = global %Test_sum_closure_struct<{i64 ptrtoint (void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @Test_sum_info to i64)}>
%S1DM_srt_struct = type <{}>
@S1DM_srt = internal constant %S1DM_srt_struct<{}>
%s1xB_entry_struct = type <{i64, i64, i64}>
@s1xB_info_itable = internal constant %s1xB_entry_struct<{i64 8589934602, i64 8589934593, i64 9}>, section "X98A__STRIP,__me1", align 8
define internal cc 10 void @s1xB_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me2"
{
c1AJ:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 %R2_Arg, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 %R3_Arg, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1xr = alloca i64, i32 1
%ls1xy = alloca i64, i32 1
%ls1xB = alloca i64, i32 1
%ln1EB = load i64* %R3_Var
store i64 %ln1EB, i64* %ls1xr
%ln1EC = load i64* %R2_Var
store i64 %ln1EC, i64* %ls1xy
%ln1ED = load i64* %R1_Var
store i64 %ln1ED, i64* %ls1xB
%ln1EE = load i64* %ls1xr
%ln1EF = load i64* %ls1xB
%ln1EG = add i64 %ln1EF, 14
%ln1EH = inttoptr i64 %ln1EG to i64*
%ln1EI = load i64* %ln1EH, !tbaa !5
%ln1EJ = icmp sge i64 %ln1EE, %ln1EI
br i1 %ln1EJ, label %c1AN, label %c1AM
c1AM:
%ln1EK = load i64* %ls1xr
%ln1EL = add i64 %ln1EK, 1
store i64 %ln1EL, i64* %R3_Var
%ln1EM = load i64* %ls1xy
%ln1EN = load i64* %ls1xB
%ln1EO = add i64 %ln1EN, 6
%ln1EP = inttoptr i64 %ln1EO to i64*
%ln1EQ = load i64* %ln1EP, !tbaa !5
%ln1ER = load i64* %ls1xB
%ln1ES = add i64 %ln1ER, 22
%ln1ET = inttoptr i64 %ln1ES to i64*
%ln1EU = load i64* %ln1ET, !tbaa !5
%ln1EV = load i64* %ls1xr
%ln1EW = add i64 %ln1EU, %ln1EV
%ln1EX = shl i64 %ln1EW, 2
%ln1EY = add i64 %ln1EX, 16
%ln1EZ = add i64 %ln1EQ, %ln1EY
%ln1F0 = inttoptr i64 %ln1EZ to i32*
%ln1F1 = load i32* %ln1F0, !tbaa !5
%ln1F2 = sext i32 %ln1F1 to i64
%ln1F3 = add i64 %ln1EM, %ln1F2
%ln1F4 = trunc i64 %ln1F3 to i32
%ln1F5 = sext i32 %ln1F4 to i64
store i64 %ln1F5, i64* %R2_Var
%ln1F6 = load i64* %ls1xB
store i64 %ln1F6, i64* %R1_Var
%ln1F7 = load i64** %Base_Var
%ln1F8 = load i64** %Sp_Var
%ln1F9 = load i64** %Hp_Var
%ln1Fa = load i64* %R1_Var
%ln1Fb = load i64* %R2_Var
%ln1Fc = load i64* %R3_Var
%ln1Fd = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @s1xB_info( i64* %ln1F7, i64* %ln1F8, i64* %ln1F9, i64 %ln1Fa, i64 %ln1Fb, i64 %ln1Fc, i64 undef, i64 undef, i64 undef, i64 %ln1Fd ) nounwind
ret void
c1AN:
%ln1Fe = load i64* %ls1xy
store i64 %ln1Fe, i64* %R1_Var
%ln1Ff = load i64** %Sp_Var
%ln1Fg = getelementptr inbounds i64* %ln1Ff, i32 0
%ln1Fh = bitcast i64* %ln1Fg to i64*
%ln1Fi = load i64* %ln1Fh, !tbaa !1
%ln1Fj = inttoptr i64 %ln1Fi to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
%ln1Fk = load i64** %Base_Var
%ln1Fl = load i64** %Sp_Var
%ln1Fm = load i64** %Hp_Var
%ln1Fn = load i64* %R1_Var
%ln1Fo = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Fj( i64* %ln1Fk, i64* %ln1Fl, i64* %ln1Fm, i64 %ln1Fn, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Fo ) nounwind
ret void
}
%Test_zdwa_entry_struct = type <{i64, i64, i64}>
@Test_zdwa_info_itable = constant %Test_zdwa_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me3", align 8
define cc 10 void @Test_zdwa_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me4"
{
c1Bf:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 %R2_Arg, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1xj = alloca i64, i32 1
%ln1FV = load i64* %R2_Var
store i64 %ln1FV, i64* %ls1xj
%ln1FW = load i64** %Sp_Var
%ln1FX = getelementptr inbounds i64* %ln1FW, i32 -4
%ln1FY = ptrtoint i64* %ln1FX to i64
%ln1FZ = load i64* %SpLim_Var
%ln1G0 = icmp ult i64 %ln1FY, %ln1FZ
br i1 %ln1G0, label %c1Cf, label %c1Ce
c1Ce:
%ln1G1 = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Bg_info to i64
%ln1G2 = load i64** %Sp_Var
%ln1G3 = getelementptr inbounds i64* %ln1G2, i32 -1
store i64 %ln1G1, i64* %ln1G3, !tbaa !1
%ln1G4 = load i64* %ls1xj
store i64 %ln1G4, i64* %R1_Var
%ln1G5 = load i64** %Sp_Var
%ln1G6 = getelementptr inbounds i64* %ln1G5, i32 -1
%ln1G7 = ptrtoint i64* %ln1G6 to i64
%ln1G8 = inttoptr i64 %ln1G7 to i64*
store i64* %ln1G8, i64** %Sp_Var
%ln1G9 = load i64** %Base_Var
%ln1Ga = load i64** %Sp_Var
%ln1Gb = load i64** %Hp_Var
%ln1Gc = load i64* %R1_Var
%ln1Gd = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_ap_0_fast( i64* %ln1G9, i64* %ln1Ga, i64* %ln1Gb, i64 %ln1Gc, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Gd ) nounwind
ret void
c1Cf:
%ln1Ge = load i64* %ls1xj
store i64 %ln1Ge, i64* %R2_Var
%ln1Gf = ptrtoint %Test_zdwa_closure_struct* @Test_zdwa_closure to i64
store i64 %ln1Gf, i64* %R1_Var
%ln1Gg = load i64** %Base_Var
%ln1Gh = getelementptr inbounds i64* %ln1Gg, i32 -1
%ln1Gi = bitcast i64* %ln1Gh to i64*
%ln1Gj = load i64* %ln1Gi, !tbaa !4
%ln1Gk = inttoptr i64 %ln1Gj to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
%ln1Gl = load i64** %Base_Var
%ln1Gm = load i64** %Sp_Var
%ln1Gn = load i64** %Hp_Var
%ln1Go = load i64* %R1_Var
%ln1Gp = load i64* %R2_Var
%ln1Gq = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Gk( i64* %ln1Gl, i64* %ln1Gm, i64* %ln1Gn, i64 %ln1Go, i64 %ln1Gp, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Gq ) nounwind
ret void
}
declare cc 10 void @stg_ap_0_fast(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8
%c1Bg_entry_struct = type <{i64, i64}>
@c1Bg_info_itable = internal constant %c1Bg_entry_struct<{i64 0, i64 32}>, section "X98A__STRIP,__me5", align 8
define internal cc 10 void @c1Bg_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me6"
{
c1Bg:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 undef, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1yF = alloca i64, i32 1
%ls1xu = alloca i64, i32 1
%ls1xv = alloca i64, i32 1
%ls1xs = alloca i64, i32 1
%lc1Bn = alloca i64, i32 1
%ls1xH = alloca i64, i32 1
%ls1xX = alloca <4 x i32>, i32 1
%ls1xL = alloca i64, i32 1
%ln1I5 = load i64** %Hp_Var
%ln1I6 = getelementptr inbounds i64* %ln1I5, i32 4
%ln1I7 = ptrtoint i64* %ln1I6 to i64
%ln1I8 = inttoptr i64 %ln1I7 to i64*
store i64* %ln1I8, i64** %Hp_Var
%ln1I9 = load i64* %R1_Var
store i64 %ln1I9, i64* %ls1yF
%ln1Ia = load i64** %Hp_Var
%ln1Ib = ptrtoint i64* %ln1Ia to i64
%ln1Ic = load i64** %Base_Var
%ln1Id = getelementptr inbounds i64* %ln1Ic, i32 35
%ln1Ie = bitcast i64* %ln1Id to i64*
%ln1If = load i64* %ln1Ie, !tbaa !4
%ln1Ig = icmp ugt i64 %ln1Ib, %ln1If
br i1 %ln1Ig, label %c1Cb, label %c1BR
c1BR:
%ln1Ih = load i64* %ls1yF
%ln1Ii = add i64 %ln1Ih, 7
%ln1Ij = inttoptr i64 %ln1Ii to i64*
%ln1Ik = load i64* %ln1Ij, !tbaa !5
store i64 %ln1Ik, i64* %ls1xu
%ln1Il = load i64* %ls1yF
%ln1Im = add i64 %ln1Il, 15
%ln1In = inttoptr i64 %ln1Im to i64*
%ln1Io = load i64* %ln1In, !tbaa !5
store i64 %ln1Io, i64* %ls1xv
%ln1Ip = load i64* %ls1yF
%ln1Iq = add i64 %ln1Ip, 23
%ln1Ir = inttoptr i64 %ln1Iq to i64*
%ln1Is = load i64* %ln1Ir, !tbaa !5
store i64 %ln1Is, i64* %ls1xs
%ln1It = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @s1xB_info to i64
%ln1Iu = load i64** %Hp_Var
%ln1Iv = getelementptr inbounds i64* %ln1Iu, i32 -3
store i64 %ln1It, i64* %ln1Iv, !tbaa !2
%ln1Iw = load i64* %ls1xu
%ln1Ix = load i64** %Hp_Var
%ln1Iy = getelementptr inbounds i64* %ln1Ix, i32 -2
store i64 %ln1Iw, i64* %ln1Iy, !tbaa !2
%ln1Iz = load i64* %ls1xs
%ln1IA = load i64** %Hp_Var
%ln1IB = getelementptr inbounds i64* %ln1IA, i32 -1
store i64 %ln1Iz, i64* %ln1IB, !tbaa !2
%ln1IC = load i64* %ls1xv
%ln1ID = load i64** %Hp_Var
%ln1IE = getelementptr inbounds i64* %ln1ID, i32 0
store i64 %ln1IC, i64* %ln1IE, !tbaa !2
%ln1IF = load i64** %Hp_Var
%ln1IG = ptrtoint i64* %ln1IF to i64
%ln1IH = add i64 %ln1IG, -22
store i64 %ln1IH, i64* %lc1Bn
%ln1II = load i64* %ls1xs
%ln1IJ = load i64* %ls1xs
%ln1IK = srem i64 %ln1IJ, 4
%ln1IL = sub i64 %ln1II, %ln1IK
store i64 %ln1IL, i64* %ls1xH
%ln1IM = insertelement <4 x i32> < i32 0, i32 0, i32 0, i32 0 >, i32 0, i32 0
%ln1IN = insertelement <4 x i32> %ln1IM, i32 0, i32 1
%ln1IO = insertelement <4 x i32> %ln1IN, i32 0, i32 2
%ln1IP = insertelement <4 x i32> %ln1IO, i32 0, i32 3
%ln1IQ = bitcast <4 x i32> %ln1IP to <4 x i32>
store <4 x i32> %ln1IQ, <4 x i32>* %ls1xX, align 1
store i64 0, i64* %ls1xL
br label %s1xV
s1xV:
%ln1IR = load i64* %ls1xL
%ln1IS = load i64* %ls1xH
%ln1IT = icmp sge i64 %ln1IR, %ln1IS
br i1 %ln1IT, label %c1C1, label %c1C0
c1C0:
%ln1IU = load i64* %ls1xu
%ln1IV = add i64 %ln1IU, 16
%ln1IW = load i64* %ls1xv
%ln1IX = load i64* %ls1xL
%ln1IY = add i64 %ln1IW, %ln1IX
%ln1IZ = shl i64 %ln1IY, 2
%ln1J0 = add i64 %ln1IZ, 1536
%ln1J1 = add i64 %ln1IV, %ln1J0
%ln1J2 = inttoptr i64 %ln1J1 to i8*
store i64 undef, i64* %R3_Var
store i64 undef, i64* %R4_Var
store i64 undef, i64* %R5_Var
store i64 undef, i64* %R6_Var
store float undef, float* %F1_Var
store double undef, double* %D1_Var
store float undef, float* %F2_Var
store double undef, double* %D2_Var
store float undef, float* %F3_Var
store double undef, double* %D3_Var
store float undef, float* %F4_Var
store double undef, double* %D4_Var
store float undef, float* %F5_Var
store double undef, double* %D5_Var
store float undef, float* %F6_Var
store double undef, double* %D6_Var
call ccc void (i8*,i32,i32,i32)* @llvm.prefetch( i8* %ln1J2, i32 0, i32 3, i32 1 )
%ln1J3 = load <4 x i32>* %ls1xX, align 1
%ln1J4 = load i64* %ls1xu
%ln1J5 = add i64 %ln1J4, 16
%ln1J6 = load i64* %ls1xv
%ln1J7 = load i64* %ls1xL
%ln1J8 = add i64 %ln1J6, %ln1J7
%ln1J9 = shl i64 %ln1J8, 2
%ln1Ja = add i64 %ln1J5, %ln1J9
%ln1Jb = inttoptr i64 %ln1Ja to <4 x i32>*
%ln1Jc = load <4 x i32>* %ln1Jb, align 1, !tbaa !5
%ln1Jd = add <4 x i32> %ln1J3, %ln1Jc
%ln1Je = bitcast <4 x i32> %ln1Jd to <4 x i32>
store <4 x i32> %ln1Je, <4 x i32>* %ls1xX, align 1
%ln1Jf = load i64* %ls1xL
%ln1Jg = add i64 %ln1Jf, 4
store i64 %ln1Jg, i64* %ls1xL
br label %s1xV
c1C1:
%ln1Jh = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Bm_info to i64
%ln1Ji = load i64** %Sp_Var
%ln1Jj = getelementptr inbounds i64* %ln1Ji, i32 -3
store i64 %ln1Jh, i64* %ln1Jj, !tbaa !1
%ln1Jk = load i64* %ls1xL
store i64 %ln1Jk, i64* %R3_Var
store i64 0, i64* %R2_Var
%ln1Jl = load i64* %lc1Bn
store i64 %ln1Jl, i64* %R1_Var
%ln1Jm = load <4 x i32>* %ls1xX, align 1
%ln1Jn = load i64** %Sp_Var
%ln1Jo = getelementptr inbounds i64* %ln1Jn, i32 -2
%ln1Jp = bitcast i64* %ln1Jo to <4 x i32>*
store <4 x i32> %ln1Jm, <4 x i32>* %ln1Jp, align 1, !tbaa !1
%ln1Jq = load i64** %Sp_Var
%ln1Jr = getelementptr inbounds i64* %ln1Jq, i32 -3
%ln1Js = ptrtoint i64* %ln1Jr to i64
%ln1Jt = inttoptr i64 %ln1Js to i64*
store i64* %ln1Jt, i64** %Sp_Var
%ln1Ju = load i64** %Base_Var
%ln1Jv = load i64** %Sp_Var
%ln1Jw = load i64** %Hp_Var
%ln1Jx = load i64* %R1_Var
%ln1Jy = load i64* %R2_Var
%ln1Jz = load i64* %R3_Var
%ln1JA = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @s1xB_info( i64* %ln1Ju, i64* %ln1Jv, i64* %ln1Jw, i64 %ln1Jx, i64 %ln1Jy, i64 %ln1Jz, i64 undef, i64 undef, i64 undef, i64 %ln1JA ) nounwind
ret void
c1Cb:
%ln1JB = load i64** %Base_Var
%ln1JC = getelementptr inbounds i64* %ln1JB, i32 41
store i64 32, i64* %ln1JC, !tbaa !4
%ln1JD = load i64* %ls1yF
store i64 %ln1JD, i64* %R1_Var
%ln1JE = load i64** %Base_Var
%ln1JF = load i64** %Sp_Var
%ln1JG = load i64** %Hp_Var
%ln1JH = load i64* %R1_Var
%ln1JI = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_gc_unpt_r1( i64* %ln1JE, i64* %ln1JF, i64* %ln1JG, i64 %ln1JH, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1JI ) nounwind
ret void
}
declare ccc void @llvm.prefetch(i8*, i32, i32, i32)
declare cc 10 void @stg_gc_unpt_r1(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8
%c1Bm_entry_struct = type <{i64, i64}>
@c1Bm_info_itable = internal constant %c1Bm_entry_struct<{i64 451, i64 32}>, section "X98A__STRIP,__me7", align 8
define internal cc 10 void @c1Bm_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me8"
{
c1Bm:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 undef, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1xX = alloca <4 x i32>, i32 1
%ln1Kr = load i64** %Sp_Var
%ln1Ks = getelementptr inbounds i64* %ln1Kr, i32 1
%ln1Kt = bitcast i64* %ln1Ks to <4 x i32>*
%ln1Ku = load <4 x i32>* %ln1Kt, align 1, !tbaa !1
%ln1Kv = bitcast <4 x i32> %ln1Ku to <4 x i32>
store <4 x i32> %ln1Kv, <4 x i32>* %ls1xX, align 1
%ln1Kw = load i64* %R1_Var
%ln1Kx = load <4 x i32>* %ls1xX, align 1
%ln1Ky = extractelement <4 x i32> %ln1Kx, i32 0
%ln1Kz = sext i32 %ln1Ky to i64
%ln1KA = add i64 %ln1Kw, %ln1Kz
%ln1KB = trunc i64 %ln1KA to i32
%ln1KC = sext i32 %ln1KB to i64
%ln1KD = load <4 x i32>* %ls1xX, align 1
%ln1KE = extractelement <4 x i32> %ln1KD, i32 1
%ln1KF = sext i32 %ln1KE to i64
%ln1KG = add i64 %ln1KC, %ln1KF
%ln1KH = trunc i64 %ln1KG to i32
%ln1KI = sext i32 %ln1KH to i64
%ln1KJ = load <4 x i32>* %ls1xX, align 1
%ln1KK = extractelement <4 x i32> %ln1KJ, i32 2
%ln1KL = sext i32 %ln1KK to i64
%ln1KM = add i64 %ln1KI, %ln1KL
%ln1KN = trunc i64 %ln1KM to i32
%ln1KO = sext i32 %ln1KN to i64
%ln1KP = load <4 x i32>* %ls1xX, align 1
%ln1KQ = extractelement <4 x i32> %ln1KP, i32 3
%ln1KR = sext i32 %ln1KQ to i64
%ln1KS = add i64 %ln1KO, %ln1KR
%ln1KT = trunc i64 %ln1KS to i32
%ln1KU = sext i32 %ln1KT to i64
store i64 %ln1KU, i64* %R1_Var
%ln1KV = load i64** %Sp_Var
%ln1KW = getelementptr inbounds i64* %ln1KV, i32 4
%ln1KX = ptrtoint i64* %ln1KW to i64
%ln1KY = inttoptr i64 %ln1KX to i64*
store i64* %ln1KY, i64** %Sp_Var
%ln1KZ = load i64** %Sp_Var
%ln1L0 = getelementptr inbounds i64* %ln1KZ, i32 0
%ln1L1 = bitcast i64* %ln1L0 to i64*
%ln1L2 = load i64* %ln1L1, !tbaa !1
%ln1L3 = inttoptr i64 %ln1L2 to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
%ln1L4 = load i64** %Base_Var
%ln1L5 = load i64** %Sp_Var
%ln1L6 = load i64** %Hp_Var
%ln1L7 = load i64* %R1_Var
%ln1L8 = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1L3( i64* %ln1L4, i64* %ln1L5, i64* %ln1L6, i64 %ln1L7, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1L8 ) nounwind
ret void
}
%Test_sum1_entry_struct = type <{i64, i64, i64}>
@Test_sum1_info_itable = constant %Test_sum1_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me9", align 8
define cc 10 void @Test_sum1_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me10"
{
c1Dh:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 %R2_Arg, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1yn = alloca i64, i32 1
%ln1LG = load i64* %R2_Var
store i64 %ln1LG, i64* %ls1yn
%ln1LH = load i64** %Sp_Var
%ln1LI = getelementptr inbounds i64* %ln1LH, i32 -1
%ln1LJ = ptrtoint i64* %ln1LI to i64
%ln1LK = load i64* %SpLim_Var
%ln1LL = icmp ult i64 %ln1LJ, %ln1LK
br i1 %ln1LL, label %c1Dw, label %c1Dv
c1Dv:
%ln1LM = ptrtoint void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)* @c1Di_info to i64
%ln1LN = load i64** %Sp_Var
%ln1LO = getelementptr inbounds i64* %ln1LN, i32 -1
store i64 %ln1LM, i64* %ln1LO, !tbaa !1
%ln1LP = load i64* %ls1yn
store i64 %ln1LP, i64* %R2_Var
%ln1LQ = load i64** %Sp_Var
%ln1LR = getelementptr inbounds i64* %ln1LQ, i32 -1
%ln1LS = ptrtoint i64* %ln1LR to i64
%ln1LT = inttoptr i64 %ln1LS to i64*
store i64* %ln1LT, i64** %Sp_Var
%ln1LU = load i64** %Base_Var
%ln1LV = load i64** %Sp_Var
%ln1LW = load i64** %Hp_Var
%ln1LX = load i64* %R1_Var
%ln1LY = load i64* %R2_Var
%ln1LZ = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @Test_zdwa_info( i64* %ln1LU, i64* %ln1LV, i64* %ln1LW, i64 %ln1LX, i64 %ln1LY, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1LZ ) nounwind
ret void
c1Dw:
%ln1M0 = load i64* %ls1yn
store i64 %ln1M0, i64* %R2_Var
%ln1M1 = ptrtoint %Test_sum1_closure_struct* @Test_sum1_closure to i64
store i64 %ln1M1, i64* %R1_Var
%ln1M2 = load i64** %Base_Var
%ln1M3 = getelementptr inbounds i64* %ln1M2, i32 -1
%ln1M4 = bitcast i64* %ln1M3 to i64*
%ln1M5 = load i64* %ln1M4, !tbaa !4
%ln1M6 = inttoptr i64 %ln1M5 to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
%ln1M7 = load i64** %Base_Var
%ln1M8 = load i64** %Sp_Var
%ln1M9 = load i64** %Hp_Var
%ln1Ma = load i64* %R1_Var
%ln1Mb = load i64* %R2_Var
%ln1Mc = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1M6( i64* %ln1M7, i64* %ln1M8, i64* %ln1M9, i64 %ln1Ma, i64 %ln1Mb, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Mc ) nounwind
ret void
}
%c1Di_entry_struct = type <{i64, i64}>
@c1Di_info_itable = internal constant %c1Di_entry_struct<{i64 0, i64 32}>, section "X98A__STRIP,__me11", align 8
define internal cc 10 void @c1Di_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me12"
{
c1Di:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 undef, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ls1yp = alloca i64, i32 1
%ln1MU = load i64** %Hp_Var
%ln1MV = getelementptr inbounds i64* %ln1MU, i32 2
%ln1MW = ptrtoint i64* %ln1MV to i64
%ln1MX = inttoptr i64 %ln1MW to i64*
store i64* %ln1MX, i64** %Hp_Var
%ln1MY = load i64* %R1_Var
store i64 %ln1MY, i64* %ls1yp
%ln1MZ = load i64** %Hp_Var
%ln1N0 = ptrtoint i64* %ln1MZ to i64
%ln1N1 = load i64** %Base_Var
%ln1N2 = getelementptr inbounds i64* %ln1N1, i32 35
%ln1N3 = bitcast i64* %ln1N2 to i64*
%ln1N4 = load i64* %ln1N3, !tbaa !4
%ln1N5 = icmp ugt i64 %ln1N0, %ln1N4
br i1 %ln1N5, label %c1Ds, label %c1Dp
c1Dp:
%ln1N6 = ptrtoint [0 x i64]* @base_GHCziInt_I32zh_con_info to i64
%ln1N7 = load i64** %Hp_Var
%ln1N8 = getelementptr inbounds i64* %ln1N7, i32 -1
store i64 %ln1N6, i64* %ln1N8, !tbaa !2
%ln1N9 = load i64* %ls1yp
%ln1Na = load i64** %Hp_Var
%ln1Nb = getelementptr inbounds i64* %ln1Na, i32 0
store i64 %ln1N9, i64* %ln1Nb, !tbaa !2
%ln1Nc = load i64** %Hp_Var
%ln1Nd = ptrtoint i64* %ln1Nc to i64
%ln1Ne = add i64 %ln1Nd, -7
store i64 %ln1Ne, i64* %R1_Var
%ln1Nf = load i64** %Sp_Var
%ln1Ng = getelementptr inbounds i64* %ln1Nf, i32 1
%ln1Nh = ptrtoint i64* %ln1Ng to i64
%ln1Ni = inttoptr i64 %ln1Nh to i64*
store i64* %ln1Ni, i64** %Sp_Var
%ln1Nj = load i64** %Sp_Var
%ln1Nk = getelementptr inbounds i64* %ln1Nj, i32 0
%ln1Nl = bitcast i64* %ln1Nk to i64*
%ln1Nm = load i64* %ln1Nl, !tbaa !1
%ln1Nn = inttoptr i64 %ln1Nm to void (i64*, i64*, i64*, i64, i64, i64, i64, i64, i64, i64)*
%ln1No = load i64** %Base_Var
%ln1Np = load i64** %Sp_Var
%ln1Nq = load i64** %Hp_Var
%ln1Nr = load i64* %R1_Var
%ln1Ns = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* %ln1Nn( i64* %ln1No, i64* %ln1Np, i64* %ln1Nq, i64 %ln1Nr, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1Ns ) nounwind
ret void
c1Ds:
%ln1Nt = load i64** %Base_Var
%ln1Nu = getelementptr inbounds i64* %ln1Nt, i32 41
store i64 16, i64* %ln1Nu, !tbaa !4
%ln1Nv = load i64* %ls1yp
store i64 %ln1Nv, i64* %R1_Var
%ln1Nw = load i64** %Base_Var
%ln1Nx = load i64** %Sp_Var
%ln1Ny = load i64** %Hp_Var
%ln1Nz = load i64* %R1_Var
%ln1NA = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @stg_gc_unbx_r1( i64* %ln1Nw, i64* %ln1Nx, i64* %ln1Ny, i64 %ln1Nz, i64 undef, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1NA ) nounwind
ret void
}
@base_GHCziInt_I32zh_con_info = external global [0 x i64]
declare cc 10 void @stg_gc_unbx_r1(i64* noalias nocapture, i64* noalias nocapture, i64* noalias nocapture, i64, i64, i64, i64, i64, i64, i64) align 8
%Test_sum_entry_struct = type <{i64, i64, i64}>
@Test_sum_info_itable = constant %Test_sum_entry_struct<{i64 4294967301, i64 0, i64 15}>, section "X98A__STRIP,__me13", align 8
define cc 10 void @Test_sum_info(i64* noalias nocapture %Base_Arg, i64* noalias nocapture %Sp_Arg, i64* noalias nocapture %Hp_Arg, i64 %R1_Arg, i64 %R2_Arg, i64 %R3_Arg, i64 %R4_Arg, i64 %R5_Arg, i64 %R6_Arg, i64 %SpLim_Arg) align 8 nounwind section "X98A__STRIP,__me14"
{
c1DE:
%Base_Var = alloca i64*, i32 1
store i64* %Base_Arg, i64** %Base_Var
%Sp_Var = alloca i64*, i32 1
store i64* %Sp_Arg, i64** %Sp_Var
%Hp_Var = alloca i64*, i32 1
store i64* %Hp_Arg, i64** %Hp_Var
%R1_Var = alloca i64, i32 1
store i64 %R1_Arg, i64* %R1_Var
%R2_Var = alloca i64, i32 1
store i64 %R2_Arg, i64* %R2_Var
%R3_Var = alloca i64, i32 1
store i64 undef, i64* %R3_Var
%R4_Var = alloca i64, i32 1
store i64 undef, i64* %R4_Var
%R5_Var = alloca i64, i32 1
store i64 undef, i64* %R5_Var
%R6_Var = alloca i64, i32 1
store i64 undef, i64* %R6_Var
%SpLim_Var = alloca i64, i32 1
store i64 %SpLim_Arg, i64* %SpLim_Var
%F1_Var = alloca float, i32 1
store float undef, float* %F1_Var
%D1_Var = alloca double, i32 1
store double undef, double* %D1_Var
%XMM1_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM1_Var, align 1
%F2_Var = alloca float, i32 1
store float undef, float* %F2_Var
%D2_Var = alloca double, i32 1
store double undef, double* %D2_Var
%XMM2_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM2_Var, align 1
%F3_Var = alloca float, i32 1
store float undef, float* %F3_Var
%D3_Var = alloca double, i32 1
store double undef, double* %D3_Var
%XMM3_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM3_Var, align 1
%F4_Var = alloca float, i32 1
store float undef, float* %F4_Var
%D4_Var = alloca double, i32 1
store double undef, double* %D4_Var
%XMM4_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM4_Var, align 1
%F5_Var = alloca float, i32 1
store float undef, float* %F5_Var
%D5_Var = alloca double, i32 1
store double undef, double* %D5_Var
%XMM5_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM5_Var, align 1
%F6_Var = alloca float, i32 1
store float undef, float* %F6_Var
%D6_Var = alloca double, i32 1
store double undef, double* %D6_Var
%XMM6_Var = alloca <4 x i32>, i32 1
store <4 x i32> undef, <4 x i32>* %XMM6_Var, align 1
%ln1NI = load i64* %R2_Var
store i64 %ln1NI, i64* %R2_Var
%ln1NJ = load i64** %Base_Var
%ln1NK = load i64** %Sp_Var
%ln1NL = load i64** %Hp_Var
%ln1NM = load i64* %R1_Var
%ln1NN = load i64* %R2_Var
%ln1NO = load i64* %SpLim_Var
tail call cc 10 void (i64*,i64*,i64*,i64,i64,i64,i64,i64,i64,i64)* @Test_sum1_info( i64* %ln1NJ, i64* %ln1NK, i64* %ln1NL, i64 %ln1NM, i64 %ln1NN, i64 undef, i64 undef, i64 undef, i64 undef, i64 %ln1NO ) nounwind
ret void
}
@llvm.used = appending global [4 x i8*] [i8* bitcast (%c1Di_entry_struct* @c1Di_info_itable to i8*), i8* bitcast (%c1Bm_entry_struct* @c1Bm_info_itable to i8*), i8* bitcast (%c1Bg_entry_struct* @c1Bg_info_itable to i8*), i8* bitcast (%s1xB_entry_struct* @s1xB_info_itable to i8*)], section "llvm.metadata"
-------------- test.s --------------
.file "/tmp/ghc19964_0/ghc19964_0.bc"
.data
.type Test_zdwa_closure,@object # @Test_zdwa_closure
.globl Test_zdwa_closure
.align 8
Test_zdwa_closure:
.quad Test_zdwa_info
.size Test_zdwa_closure, 8
.type Test_sum1_closure,@object # @Test_sum1_closure
.globl Test_sum1_closure
.align 8
Test_sum1_closure:
.quad Test_sum1_info
.size Test_sum1_closure, 8
.type Test_sum_closure,@object # @Test_sum_closure
.globl Test_sum_closure
.align 8
Test_sum_closure:
.quad Test_sum_info
.size Test_sum_closure, 8
.section ".note.GNU-stack","", at progbits
.text
.type s1xB_info_itable,@object # @s1xB_info_itable
.align 8
s1xB_info_itable:
.quad 8589934602 # 0x20000000a
.quad 8589934593 # 0x200000001
.quad 9 # 0x9
.size s1xB_info_itable, 24
.text
.align 8, 0x90
.type s1xB_info,@function
s1xB_info: # @s1xB_info
# BB#0: # %c1AJ
movq %r14, %rax
movq 14(%rbx), %rcx
cmpq %rsi, %rcx
jle .LBB0_3
# BB#1: # %c1AM.lr.ph
movq 22(%rbx), %rdx
addq %rsi, %rdx
movq 6(%rbx), %rdi
leaq 16(%rdi,%rdx,4), %rdx
.align 16, 0x90
.LBB0_2: # %c1AM
# =>This Inner Loop Header: Depth=1
addl (%rdx), %eax
movslq %eax, %rax
addq $4, %rdx
incq %rsi
cmpq %rsi, %rcx
jg .LBB0_2
.LBB0_3: # %c1AN
movq (%rbp), %rcx
movq %rax, %rbx
jmpq *%rcx # TAILCALL
.Ltmp0:
.size s1xB_info, .Ltmp0-s1xB_info
.text
.type Test_zdwa_info_itable,@object # @Test_zdwa_info_itable
.globl Test_zdwa_info_itable
.align 8
Test_zdwa_info_itable:
.quad 4294967301 # 0x100000005
.quad 0 # 0x0
.quad 15 # 0xf
.size Test_zdwa_info_itable, 24
.text
.globl Test_zdwa_info
.align 8, 0x90
.type Test_zdwa_info,@function
Test_zdwa_info: # @Test_zdwa_info
# BB#0: # %c1Bf
leaq -32(%rbp), %rax
cmpq %r15, %rax
jae .LBB1_1
# BB#2: # %c1Cf
movq -8(%r13), %rax
movl $Test_zdwa_closure, %ebx
jmpq *%rax # TAILCALL
.LBB1_1: # %c1Ce
movq $c1Bg_info, -8(%rbp)
addq $-8, %rbp
movq %r14, %rbx
jmp stg_ap_0_fast # TAILCALL
.Ltmp1:
.size Test_zdwa_info, .Ltmp1-Test_zdwa_info
.text
.type c1Bg_info_itable,@object # @c1Bg_info_itable
.align 8
c1Bg_info_itable:
.quad 0 # 0x0
.quad 32 # 0x20
.size c1Bg_info_itable, 16
.text
.align 8, 0x90
.type c1Bg_info,@function
c1Bg_info: # @c1Bg_info
# BB#0: # %c1Bg
movq %r12, %rax
leaq 32(%rax), %r12
cmpq 280(%r13), %r12
jbe .LBB2_1
# BB#8: # %c1Cb
movq $32, 328(%r13)
jmp stg_gc_unpt_r1 # TAILCALL
.LBB2_1: # %c1BR
movq 23(%rbx), %rcx
movq 7(%rbx), %rsi
movq 15(%rbx), %rdi
movq $s1xB_info, 8(%rax)
movq %rsi, 16(%rax)
movq %rcx, 24(%rax)
movq %rcx, %rdx
sarq $63, %rdx
shrq $62, %rdx
addq %rcx, %rdx
movq %rdi, (%r12)
andq $-4, %rdx
pxor %xmm0, %xmm0
xorl %eax, %eax
testq %rdx, %rdx
movq %rax, %rcx
jle .LBB2_4
# BB#2: # %c1C0.lr.ph
leaq 1552(%rsi,%rdi,4), %rsi
pxor %xmm0, %xmm0
xorl %ecx, %ecx
.align 16, 0x90
.LBB2_3: # %c1C0
# =>This Inner Loop Header: Depth=1
prefetcht0 (%rsi)
movdqu -1536(%rsi), %xmm1
paddd %xmm1, %xmm0
addq $16, %rsi
addq $4, %rcx
cmpq %rdx, %rcx
jl .LBB2_3
.LBB2_4: # %c1C1
movq $c1Bm_info, -24(%rbp)
movdqu %xmm0, -16(%rbp)
movq -8(%r12), %rdx
cmpq %rcx, %rdx
jle .LBB2_7
# BB#5: # %c1AM.lr.ph.i
subq %rcx, %rdx
addq (%r12), %rcx
movq -16(%r12), %rax
leaq 16(%rax,%rcx,4), %rcx
xorl %eax, %eax
.align 16, 0x90
.LBB2_6: # %c1AM.i
# =>This Inner Loop Header: Depth=1
addl (%rcx), %eax
movslq %eax, %rax
addq $4, %rcx
decq %rdx
jne .LBB2_6
.LBB2_7: # %s1xB_info.exit
pextrd $3, %xmm0, %ecx
addl %eax, %ecx
pextrd $2, %xmm0, %eax
addl %ecx, %eax
pextrd $1, %xmm0, %ecx
addl %eax, %ecx
movd %xmm0, %eax
addl %ecx, %eax
movslq %eax, %rbx
movq 8(%rbp), %rax
addq $8, %rbp
jmpq *%rax # TAILCALL
.Ltmp2:
.size c1Bg_info, .Ltmp2-c1Bg_info
.text
.type c1Bm_info_itable,@object # @c1Bm_info_itable
.align 8
c1Bm_info_itable:
.quad 451 # 0x1c3
.quad 32 # 0x20
.size c1Bm_info_itable, 16
.text
.align 8, 0x90
.type c1Bm_info,@function
c1Bm_info: # @c1Bm_info
# BB#0: # %c1Bm
movdqu 8(%rbp), %xmm0
pextrd $3, %xmm0, %eax
addl %ebx, %eax
pextrd $2, %xmm0, %ecx
addl %eax, %ecx
pextrd $1, %xmm0, %eax
addl %ecx, %eax
movd %xmm0, %ecx
addl %eax, %ecx
movslq %ecx, %rbx
movq 32(%rbp), %rax
addq $32, %rbp
jmpq *%rax # TAILCALL
.Ltmp3:
.size c1Bm_info, .Ltmp3-c1Bm_info
.text
.type Test_sum1_info_itable,@object # @Test_sum1_info_itable
.globl Test_sum1_info_itable
.align 8
Test_sum1_info_itable:
.quad 4294967301 # 0x100000005
.quad 0 # 0x0
.quad 15 # 0xf
.size Test_sum1_info_itable, 24
.text
.globl Test_sum1_info
.align 8, 0x90
.type Test_sum1_info,@function
Test_sum1_info: # @Test_sum1_info
# BB#0: # %c1Dh
leaq -8(%rbp), %rax
cmpq %r15, %rax
jae .LBB4_1
# BB#3: # %c1Dw
movq -8(%r13), %rax
movl $Test_sum1_closure, %ebx
jmpq *%rax # TAILCALL
.LBB4_1: # %c1Dv
movq $c1Di_info, -8(%rbp)
leaq -40(%rbp), %rcx
cmpq %r15, %rcx
jae .LBB4_4
# BB#2: # %c1Cf.i
movq -8(%r13), %rcx
movq %rax, %rbp
movl $Test_zdwa_closure, %ebx
jmpq *%rcx # TAILCALL
.LBB4_4: # %c1Ce.i
movq $c1Bg_info, -16(%rbp)
addq $-16, %rbp
movq %r14, %rbx
jmp stg_ap_0_fast # TAILCALL
.Ltmp4:
.size Test_sum1_info, .Ltmp4-Test_sum1_info
.text
.type c1Di_info_itable,@object # @c1Di_info_itable
.align 8
c1Di_info_itable:
.quad 0 # 0x0
.quad 32 # 0x20
.size c1Di_info_itable, 16
.text
.align 8, 0x90
.type c1Di_info,@function
c1Di_info: # @c1Di_info
# BB#0: # %c1Di
movq %r12, %rax
leaq 16(%rax), %r12
cmpq 280(%r13), %r12
jbe .LBB5_1
# BB#2: # %c1Ds
movq $16, 328(%r13)
jmp stg_gc_unbx_r1 # TAILCALL
.LBB5_1: # %c1Dp
movq $base_GHCziInt_I32zh_con_info, 8(%rax)
movq %rbx, 16(%rax)
movq 8(%rbp), %rax
addq $8, %rbp
leaq -7(%r12), %rbx
jmpq *%rax # TAILCALL
.Ltmp5:
.size c1Di_info, .Ltmp5-c1Di_info
.text
.type Test_sum_info_itable,@object # @Test_sum_info_itable
.globl Test_sum_info_itable
.align 8
Test_sum_info_itable:
.quad 4294967301 # 0x100000005
.quad 0 # 0x0
.quad 15 # 0xf
.size Test_sum_info_itable, 24
.text
.globl Test_sum_info
.align 8, 0x90
.type Test_sum_info,@function
Test_sum_info: # @Test_sum_info
# BB#0: # %c1DE
leaq -8(%rbp), %rax
cmpq %r15, %rax
jae .LBB6_1
# BB#3: # %c1Dw.i
movq -8(%r13), %rax
movl $Test_sum1_closure, %ebx
jmpq *%rax # TAILCALL
.LBB6_1: # %c1Dv.i
movq $c1Di_info, -8(%rbp)
leaq -40(%rbp), %rcx
cmpq %r15, %rcx
jae .LBB6_4
# BB#2: # %c1Cf.i.i
movq -8(%r13), %rcx
movq %rax, %rbp
movl $Test_zdwa_closure, %ebx
jmpq *%rcx # TAILCALL
.LBB6_4: # %c1Ce.i.i
movq $c1Bg_info, -16(%rbp)
addq $-16, %rbp
movq %r14, %rbx
jmp stg_ap_0_fast # TAILCALL
.Ltmp6:
.size Test_sum_info, .Ltmp6-Test_sum_info
.type __stginit_Test,@object # @__stginit_Test
.bss
.globl __stginit_Test
.align 8
__stginit_Test:
.size __stginit_Test, 0