[Haskell-beginners] let x = x in x (GHC.Prim)

John Ky newhoggy at gmail.com
Wed Mar 23 21:41:06 UTC 2016


OMG. Yes!

I just needed to add the  -msse4.2 flag to GHC:

benchmarking Rank/PopCnt1 Broadword - Once
time                 18.45 ms   (18.25 ms .. 18.67 ms)
                     0.999 R²   (0.997 R² .. 0.999 R²)
mean                 18.19 ms   (17.99 ms .. 18.38 ms)
std dev              508.6 μs   (398.6 μs .. 623.8 μs)

benchmarking Rank/PopCnt1 GHC       - Once
time                 11.82 ms   (11.65 ms .. 11.97 ms)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 11.85 ms   (11.74 ms .. 11.96 ms)
std dev              283.5 μs   (229.6 μs .. 362.8 μs)

Thanks so much for your help!

Cheers,

-John

On Thu, 24 Mar 2016 at 08:36 John Ky <newhoggy at gmail.com> wrote:

> Hi Rahul,
>
> The reason I thought that maybe GHC didn't call the CPU instruction was
> due to my benchmarking, which showed that the built-in popCount was slower
> than the pure Haskell Broadword implementation:
>
> main = defaultMain
>   [ env setupEnv (\bv -> bgroup "Rank"
>     [ bench "PopCnt1 Broadword - Once" (nf   (map (\n -> getCount
> (PC1BW.popCount1  (DVS.take n bv)))) [0, 1000..100000])
>     , bench "PopCnt1 GHC       - Once" (nf   (map (\n -> getCount
> (PC1GHC.popCount1 (DVS.take n bv)))) [0, 1000..100000])
>     ] )
>   ]
>
> Benchmarking results follow:
>
> benchmarking Rank/PopCnt1 Broadword - Once
> time                 18.49 ms   (17.89 ms .. 19.08 ms)
>                      0.994 R²   (0.989 R² .. 0.998 R²)
> mean                 19.62 ms   (19.22 ms .. 20.04 ms)
> std dev              1.026 ms   (846.2 μs .. 1.315 ms)
> variance introduced by outliers: 21% (moderately inflated)
>
> benchmarking Rank/PopCnt1 GHC       - Once
> time                 36.80 ms   (36.25 ms .. 37.57 ms)
>                      0.998 R²   (0.996 R² .. 1.000 R²)
> mean                 37.14 ms   (36.72 ms .. 37.97 ms)
> std dev              1.100 ms   (650.6 μs .. 1.680 ms)
>
> This makes the built-in nearly two times slower than the pure Haskell.
> That's how I got into the ghc-prim code.
>
> This is the broadword implementation I was using:
>
>   popCount1 x0 = ((x3 * 0x0101010101010101) .>. 56)
>     where
>       x1 = x0 - ((x0 .&. 0xaaaaaaaaaaaaaaaa) .>. 1)
>       x2 = (x1 .&. 0x3333333333333333) + ((x1 .>. 2) .&.
> 0x3333333333333333)
>       x3 = (x2 + (x2 .>. 4)) .&. 0x0f0f0f0f0f0f0f0f
>
> Maybe all I need to do is compile with some flags or switch the backend or
> something?  Any ideas?
>
> Cheers,
>
> -John
>
>
> On Thu, 24 Mar 2016 at 08:28 John Ky <newhoggy at gmail.com> wrote:
>
>> Hi Rahul,
>>
>> I see mention of the popcount instruction in nativeGen/X86/CodeGen.hs.
>> In particular it looks like it only activates if sse4_2 is activated?
>> Maybe all I need to do is activate this somehow?
>>
>> Cheers,
>>
>> -John
>>
>> On Wed, 23 Mar 2016 at 12:07 <rahulmutt at gmail.com> wrote:
>>
>>> Hi John,
>>>
>>> ghc-prim is just a stub package generated for the purpose of
>>> documentation. All primops are defined in a low level language called Cmm
>>> in GHC. If you want to make it even faster, you'll need to learn Cmm and
>>> update the definition in GHC. If you want to a specialized implementation
>>> for x86 systems, you may need to modify the NCG (Native Code Generator)
>>> which requires a knowledge of assembly language.
>>>
>>> Hope that helps!
>>> Rahul Muttineni
>>>
>>> Sent from my BlackBerry 10 smartphone.
>>> *From: *John Ky
>>> *Sent: *Wednesday 23 March 2016 4:40 AM
>>> *To: *The Haskell-Beginners Mailing List - Discussion of primarily
>>> beginner-level topics related to Haskell
>>> *Reply To: *The Haskell-Beginners Mailing List - Discussion of
>>> primarily beginner-level topics related to Haskell
>>> *Subject: *[Haskell-beginners] let x = x in x (GHC.Prim)
>>>
>>> Hello Haskellers,
>>>
>>> I'm trying to write a faster popCount function for x86 systems.
>>>
>>> I tried cloning the ghc-prim package and repurposing it for my own
>>> needs, but it isn't working as hoped.
>>>
>>> In particular, popCnt64# was implemented in GHC.Prim as:
>>>
>>> popCnt64# = let x = x in x
>>>
>>> Which shouldn't terminate.  Yet when I call it, it magically finds the C
>>> implementation in hs_popcnt64 and returns the correct value.
>>>
>>> My cloned project doesn't behave that way.  Instead it doesn't terminate
>>> as I would expect.
>>>
>>> Anyone know what's happening here, if there is a way to make this work
>>> or tell me if I'm going about this completely the wrong way?
>>>
>>> Cheers,
>>>
>>> -John
>>>
>>>
>>> _______________________________________________
>>> Beginners mailing list
>>> Beginners at haskell.org
>>> http://mail.haskell.org/cgi-bin/mailman/listinfo/beginners
>>>
>>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20160323/5ea9bcec/attachment-0001.html>


More information about the Beginners mailing list