[web-devel] Data.Word8 (word8 library)

Michael Snoyman michael at snoyman.com
Thu Sep 20 16:17:28 CEST 2012


On Thu, Sep 20, 2012 at 2:10 PM, Michael Snoyman <michael at snoyman.com> wrote:
> On Thu, Sep 20, 2012 at 11:41 AM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
>> Hello,
>>
>> ByteString is an array of Word8, but it seems to me that people tend to
>> use the Char interface of Data.ByteString.Char8 instead of the Word8
>> interface of Data.ByteString. Since the functions defined in
>> Data.ByteString.Char8 convert Word8 to Char and Char to Word8, they
>> carry unnecessary overhead. Yes, the overhead is negligible in many
>> cases, but I would like to remove it for high-performance servers.
>>
>> Why do people use Data.ByteString.Char8? I guess that there are two
>> reasons:
>>
>> - There are no standard utility functions for Word8, such as "isUpper"
>> - Numeric literals (e.g. 72 for 'H') are not readable
>>
>> To fix these problems, I implemented the Data.Word8 module and
>> uploaded the word8 library to Hackage:
>>
>>         http://hackage.haskell.org/packages/archive/word8/0.0.0/doc/html/Data-Word8.html
>>
>> If Michael and Bas like this, I would like to modify warp and
>> case-insensitive to use the word8 library. What do people think of this?
>>
>> My concern is that character names start with "_". Some people would
>> dislike this convention. But I do not have a better idea at this moment.
>> Suggestions are welcome.
>>
>> --Kazu
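
To make the two points above concrete, here is a minimal sketch of what
byte-level helpers and named byte constants can look like. Everything in
it is a hypothetical local stand-in written against base only: the names
isUpperAscii and toLowerAscii are mine, the constant _H is only assumed
from the "72 for 'H'" example and the underscore convention, and the
helpers are ASCII-only, so the real Data.Word8 functions may behave
differently outside that range.

```haskell
import Data.Word (Word8)

-- Hypothetical named byte constants in the underscore style discussed above.
-- They replace opaque literals such as 72 with something readable.
_H, _A, _Z :: Word8
_H = 0x48  -- 'H' (72 in decimal)
_A = 0x41  -- 'A'
_Z = 0x5a  -- 'Z'

-- ASCII-only upper-case test that works directly on a byte,
-- so no Word8 -> Char conversion is needed.
isUpperAscii :: Word8 -> Bool
isUpperAscii w = _A <= w && w <= _Z

-- ASCII-only lower-casing: 'A'..'Z' are shifted by 0x20,
-- every other byte is returned unchanged.
toLowerAscii :: Word8 -> Word8
toLowerAscii w
    | isUpperAscii w = w + 0x20
    | otherwise      = w
```

With helpers like these, S.map toLowerAscii stays entirely within Word8,
which is the usage pattern the proposal is aiming for.
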
>
> Sounds good to me. I put together a simple benchmark to compare the
> performance of toLower, and the results are encouraging:
>
> benchmarking Char8
> mean: 38.04527 us, lb 37.94080 us, ub 38.12774 us, ci 0.950
> std dev: 470.9770 ns, lb 364.8254 ns, ub 748.3015 ns, ci 0.950
>
> benchmarking Word8
> mean: 4.807265 us, lb 4.798199 us, ub 4.816563 us, ci 0.950
> std dev: 47.20958 ns, lb 41.51181 ns, ub 55.07049 ns, ci 0.950
>
> I want to try throwing one more idea into the mix; I'll post
> updates when I have them.
>
> So to answer your question: I'd be happy to include word8 in warp :).
>
> Michael
>
>
> {-# LANGUAGE OverloadedStrings #-}
> import Criterion.Main
> import qualified Data.ByteString as S
> import qualified Data.ByteString.Char8 as S8
> import qualified Data.Char
> import qualified Data.Word8
>
> main :: IO ()
> main = do
>     input <- S.readFile "bench.hs"
>     defaultMain
>         [ bench "Char8" $ whnf (S.length . S8.map Data.Char.toLower) input
>         , bench "Word8" $ whnf (S.length . S.map Data.Word8.toLower) input
>         ]

I tried a lower-level implementation to avoid boxing the Word8 values.
The results improved a bit, but not significantly (bsToLower averages
about 31.5 us versus about 35.8 us for Word8):


benchmarking Char8
mean: 318.2341 us, lb 314.5367 us, ub 320.4834 us, ci 0.950
std dev: 14.48230 us, lb 10.00946 us, ub 21.22126 us, ci 0.950
found 9 outliers among 100 samples (9.0%)
  8 (8.0%) low severe
variance introduced by outliers: 43.472%
variance is moderately inflated by outliers

benchmarking Word8
mean: 35.79037 us, lb 35.66547 us, ub 35.92601 us, ci 0.950
std dev: 665.5299 ns, lb 599.3413 ns, ub 741.6474 ns, ci 0.950
variance introduced by outliers: 11.349%
variance is moderately inflated by outliers

benchmarking bsToLower
mean: 31.49299 us, lb 31.32314 us, ub 31.65027 us, ci 0.950
std dev: 835.2251 ns, lb 744.4337 ns, ub 946.1789 ns, ci 0.950
variance introduced by outliers: 20.925%
variance is moderately inflated by outliers

Perhaps someone with more experience with this level of optimization
would be able to improve the algorithm:

https://gist.github.com/3756212
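
For readers who do not want to open the link, the following is a rough
sketch of what a low-level approach of this kind can look like. It is
not the code from the gist: the function name lowerAsciiBytes is made up
for illustration, it handles ASCII only, and whether it really avoids
boxing depends on how GHC optimizes the loop, so it should be
benchmarked rather than trusted. It allocates the output buffer once
with unsafeCreate and copies bytes across with peekByteOff and
pokeByteOff.

```haskell
import qualified Data.ByteString as S
import Data.ByteString.Internal (unsafeCreate)
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Storable (peekByteOff, pokeByteOff)

-- Hypothetical helper: lower-case the ASCII letters of a ByteString by
-- writing straight into a freshly allocated buffer of the same length.
lowerAsciiBytes :: S.ByteString -> S.ByteString
lowerAsciiBytes bs =
    unsafeCreate (S.length bs) $ \dst ->
        unsafeUseAsCStringLen bs $ \(src, len) ->
            let loop i
                    | i >= len  = return ()
                    | otherwise = do
                        -- Read one source byte, convert it, store it at the
                        -- same offset in the destination buffer.
                        w <- peekByteOff src i :: IO Word8
                        pokeByteOff dst i (lowerByte w)
                        loop (i + 1)
            in loop 0
  where
    -- 'A'..'Z' (0x41..0x5a) are shifted by 0x20; other bytes pass through.
    lowerByte :: Word8 -> Word8
    lowerByte w
        | w >= 0x41 && w <= 0x5a = w + 0x20
        | otherwise              = w
```

It can be added to the benchmark above in the same way as the other
cases, for example: bench "lowerAsciiBytes" $ whnf (S.length . lowerAsciiBytes) input
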

Michael


