<div dir="ltr">OMG. Yes!<div><br></div><div>I just needed to add the -msse4.2 flag to GHC:<br><div><br></div><div><div>benchmarking Rank/PopCnt1 Broadword - Once</div><div>time 18.45 ms (18.25 ms .. 18.67 ms)</div><div> 0.999 R² (0.997 R² .. 0.999 R²)</div><div>mean 18.19 ms (17.99 ms .. 18.38 ms)</div><div>std dev 508.6 μs (398.6 μs .. 623.8 μs)</div><div><br></div><div>benchmarking Rank/PopCnt1 GHC - Once</div><div>time 11.82 ms (11.65 ms .. 11.97 ms)</div><div> 0.999 R² (0.998 R² .. 0.999 R²)</div><div>mean 11.85 ms (11.74 ms .. 11.96 ms)</div><div>std dev 283.5 μs (229.6 μs .. 362.8 μs)</div></div><div><br></div><div>Thanks so much for your help!</div></div><div><br></div><div>Cheers,</div><div><br></div><div>-John</div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, 24 Mar 2016 at 08:36 John Ky <<a href="mailto:newhoggy@gmail.com">newhoggy@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Rahul,<div><br></div><div>The reason I thought that maybe GHC didn't call the CPU instruction was due to my benchmarking, which showed that the built-in popCount was slower than the pure Haskell Broadword implementation:<br><br><div>main = defaultMain</div><div> [ env setupEnv (\bv -> bgroup "Rank"</div><div> [ bench "PopCnt1 Broadword - Once" (nf (map (\n -> getCount (PC1BW.popCount1 (DVS.take n bv)))) [0, 1000..100000])</div><div> , bench "PopCnt1 GHC - Once" (nf (map (\n -> getCount (PC1GHC.popCount1 (DVS.take n bv)))) [0, 1000..100000])</div></div><div> ] )</div><div> ]</div><div><br></div><div>Benchmarking results follow:</div><div><br></div><div><div>benchmarking Rank/PopCnt1 Broadword - Once</div><div>time 18.49 ms (17.89 ms .. 19.08 ms)</div><div> 0.994 R² (0.989 R² .. 0.998 R²)</div><div>mean 19.62 ms (19.22 ms .. 20.04 ms)</div><div>std dev 1.026 ms (846.2 μs .. 1.315 ms)</div><div>variance introduced by outliers: 21% (moderately inflated)</div><div><br></div><div>benchmarking Rank/PopCnt1 GHC - Once</div><div>time 36.80 ms (36.25 ms .. 37.57 ms)</div><div> 0.998 R² (0.996 R² .. 1.000 R²)</div><div>mean 37.14 ms (36.72 ms .. 37.97 ms)</div><div>std dev 1.100 ms (650.6 μs .. 1.680 ms)</div></div><div><br></div><div>This makes the built-in nearly two times slower than the pure Haskell. That's how I got into the ghc-prim code.</div><div><br></div><div>This is the broadword implementation I was using:</div><div><br></div><div><div> popCount1 x0 = ((x3 * 0x0101010101010101) .>. 56)</div><div> where</div><div> x1 = x0 - ((x0 .&. 0xaaaaaaaaaaaaaaaa) .>. 1)</div><div> x2 = (x1 .&. 0x3333333333333333) + ((x1 .>. 2) .&. 0x3333333333333333)</div><div> x3 = (x2 + (x2 .>. 4)) .&. 0x0f0f0f0f0f0f0f0f</div></div><div><br></div><div>Maybe all I need to do is compile with some flags or switch the backend or something? Any ideas?</div><div><br></div><div>Cheers,</div><div><br></div><div>-John</div><div><br></div></div><br><div class="gmail_quote"><div dir="ltr">On Thu, 24 Mar 2016 at 08:28 John Ky <<a href="mailto:newhoggy@gmail.com" target="_blank">newhoggy@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div dir="ltr">Hi Rahul,<div><br></div><div>I see mention of the popcount instruction in nativeGen/X86/CodeGen.hs. In particular it looks like it only activates if sse4_2 is activated? Maybe all I need to do is activate this somehow?</div><div><br></div><div>Cheers,</div><div><br></div><div>-John</div></div><div dir="ltr"><div><br><div class="gmail_quote"><div dir="ltr">On Wed, 23 Mar 2016 at 12:07 <<a href="mailto:rahulmutt@gmail.com" target="_blank">rahulmutt@gmail.com</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div lang="en-GB" style="background-color:rgb(255,255,255);line-height:initial"> <div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">Hi John,</div><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)"><br></div><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">ghc-prim is just a stub package generated for the purpose of documentation. All primops are defined in a low level language called Cmm in GHC. If you want to make it even faster, you'll need to learn Cmm and update the definition in GHC. If you want to a specialized implementation for x86 systems, you may need to modify the NCG (Native Code Generator) which requires a knowledge of assembly language.</div><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)"><br></div><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">Hope that helps!</div><div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">Rahul Muttineni</div> <div style="width:100%;font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)"><br style="display:initial"></div> <div style="font-size:initial;font-family:Calibri,'Slate Pro',sans-serif,sans-serif;color:rgb(31,73,125);text-align:initial;background-color:rgb(255,255,255)">Sent from my BlackBerry 10 smartphone.</div> <table width="100%" style="background-color:white;border-spacing:0px"> <tbody><tr><td colspan="2" style="font-size:initial;text-align:initial;background-color:rgb(255,255,255)"> <div style="border-style:solid none none;border-top-color:rgb(181,196,223);border-top-width:1pt;padding:3pt 0in 0in;font-family:Tahoma,'BB Alpha Sans','Slate Pro';font-size:10pt"> <div><b>From: </b>John Ky</div><div><b>Sent: </b>Wednesday 23 March 2016 4:40 AM</div><div><b>To: </b>The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell</div><div><b>Reply To: </b>The Haskell-Beginners Mailing List - Discussion of primarily beginner-level topics related to Haskell</div><div><b>Subject: </b>[Haskell-beginners] let x = x in x (GHC.Prim)</div></div></td></tr></tbody></table></div><div lang="en-GB" style="background-color:rgb(255,255,255);line-height:initial"><div style="border-style:solid none none;border-top-color:rgb(186,188,209);border-top-width:1pt;font-size:initial;text-align:initial;background-color:rgb(255,255,255)"></div><br><div><div dir="ltr">Hello Haskellers,<div><br></div><div>I'm trying to write a faster popCount function for x86 systems.<br></div><div><br></div><div>I tried cloning the ghc-prim package and repurposing it for my own needs, but it isn't working as hoped.</div><div><br></div><div>In particular, popCnt64# was implemented in GHC.Prim as:</div><div><br></div><div>popCnt64# = let x = x in x<br></div><div><br></div><div>Which shouldn't terminate. Yet when I call it, it magically finds the C implementation in hs_popcnt64 and returns the correct value.</div><div><br></div><div>My cloned project doesn't behave that way. Instead it doesn't terminate as I would expect.</div><div><br></div><div>Anyone know what's happening here, if there is a way to make this work or tell me if I'm going about this completely the wrong way?</div><div><br></div><div>Cheers,</div><div><br></div><div>-John</div><div><br></div></div>
<br></div></div><div lang="en-GB" style="background-color:rgb(255,255,255);line-height:initial"></div>
_______________________________________________<br>
Beginners mailing list<br>
<a href="mailto:Beginners@haskell.org" target="_blank">Beginners@haskell.org</a><br>
<a href="http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners" rel="noreferrer" target="_blank">http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners</a><br>
</blockquote></div></div></div></blockquote></div></blockquote></div>