[web-devel] Data.Word8 (word8 library)

Gregory Collins greg at gregorycollins.net
Thu Sep 20 16:47:47 CEST 2012


This is, of course, not an apples-to-apples test:

Prelude Data.Char> toUpper 'χ'
'\935'
Prelude Data.Char> putStrLn ('\935':[])
Χ


...which I suppose is the point. I wonder whether a version of
toUpper/toLower on Char restricted to ASCII values would have the same
performance here.
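
As a rough sketch of what such an ASCII-restricted version might look
like (the names asciiToLowerChar and asciiToLowerWord8 are made up for
illustration and are not from base, bytestring, or word8), one could
compare a Char-based and a Word8-based mapping directly:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.Word (Word8)

-- Hypothetical helper: lowercases only 'A'..'Z', leaves every other Char alone.
asciiToLowerChar :: Char -> Char
asciiToLowerChar c
    | c >= 'A' && c <= 'Z' = toEnum (fromEnum c + 32)
    | otherwise            = c

-- Hypothetical helper: the same mapping directly on bytes (65..90 is 'A'..'Z').
asciiToLowerWord8 :: Word8 -> Word8
asciiToLowerWord8 w
    | w >= 65 && w <= 90 = w + 32
    | otherwise          = w

main :: IO ()
main = do
    let input = S8.pack "Content-Type: TEXT/HTML"
    -- Char interface: each byte is converted to Char and back.
    S8.putStrLn (S8.map asciiToLowerChar input)
    -- Word8 interface: bytes are mapped without the Char round trip.
    S8.putStrLn (S.map asciiToLowerWord8 input)
```

Both variants should produce the same bytes for any input; whether they
perform the same is exactly the open question, and would need to be
measured with the same criterion setup as in the quoted benchmark.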

We only call toLower explicitly in one place in snap-server. Where this
would be nice to fix is HTTP headers, where I think we are all using the
case-insensitive package (which just calls "map toLower"). Probably we
should send Bas a patch to optimize the FoldCase instance for ByteString.
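
To make the header case concrete, here is a minimal sketch of comparing
header names by folding bytes directly. It assumes header names are
ASCII, and it is not the actual FoldCase instance from case-insensitive;
foldByte, foldCaseAscii and headerEq are hypothetical names:

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Char8 as S8
import Data.Word (Word8)

-- Hypothetical helper: fold one ASCII byte to lower case.
foldByte :: Word8 -> Word8
foldByte w
    | w >= 65 && w <= 90 = w + 32   -- 'A'..'Z' to 'a'..'z'
    | otherwise          = w

-- Fold a whole ByteString without going through Char.
foldCaseAscii :: S.ByteString -> S.ByteString
foldCaseAscii = S.map foldByte

-- Case-insensitive equality on header names (ASCII only).
headerEq :: S.ByteString -> S.ByteString -> Bool
headerEq a b = foldCaseAscii a == foldCaseAscii b

main :: IO ()
main = do
    print (headerEq (S8.pack "Content-Length") (S8.pack "content-length"))
    print (headerEq (S8.pack "Content-Length") (S8.pack "Content-Type"))
```

A real patch would have to decide what to do with bytes above 127,
which this sketch simply leaves unchanged.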

Personally I would prefer not to have yet another tiny package here, as the
package zoo has enough creatures in it as it is. Do we think we have a real
problem here beyond the toUpper/toLower case? I suspect that for most other
uses of Data.ByteString.Char8 the conversion is a no-op.

G

On Thu, Sep 20, 2012 at 4:17 PM, Michael Snoyman <michael at snoyman.com> wrote:

> On Thu, Sep 20, 2012 at 2:10 PM, Michael Snoyman <michael at snoyman.com>
> wrote:
> > On Thu, Sep 20, 2012 at 11:41 AM, Kazu Yamamoto <kazu at iij.ad.jp> wrote:
> >> Hello,
> >>
> >> ByteString is an array of Word8 but it seems to me that people tend to
> >> use the Char interface with Data.ByteString.Char8 instead of Word8
> >> interface with Data.ByteString. Since the functions defined in
> >> Data.ByteString.Char8 convert Word8 to Char and Char to Word8, they
> >> have unnecessary overhead. Yes, the overhead is negligible in many
> >> cases, but I would like to remove it for high-performance servers.
> >>
> >> Why do people use Data.ByteString.Char8? I guess that there are two
> >> reasons:
> >>
> >> - There are no standard utility functions for Word8 such as "isUpper"
> >> - Numeric literals (e.g. 72 for 'H') are not readable
> >>
> >> To fix these problems, I implemented the Data.Word8 module and
> >> uploaded the word8 library to Hackage:
> >>
> >>
> >> http://hackage.haskell.org/packages/archive/word8/0.0.0/doc/html/Data-Word8.html
> >>
> >> If Michael and Bas like this, I would like to modify warp and
> >> case-insensitive to use the word8 library. What do people think of
> >> this?
> >>
> >> My concern is that character names start with "_". Some people would
> >> dislike this convention, but I do not have a better idea at the
> >> moment. Suggestions are welcome.
> >>
> >> --Kazu
> >>
> >> _______________________________________________
> >> web-devel mailing list
> >> web-devel at haskell.org
> >> http://www.haskell.org/mailman/listinfo/web-devel
> >
> > Sounds good to me. I put together a simple benchmark to compare the
> > performance of toLower, and the results are encouraging:
> >
> > benchmarking Char8
> > mean: 38.04527 us, lb 37.94080 us, ub 38.12774 us, ci 0.950
> > std dev: 470.9770 ns, lb 364.8254 ns, ub 748.3015 ns, ci 0.950
> >
> > benchmarking Word8
> > mean: 4.807265 us, lb 4.798199 us, ub 4.816563 us, ci 0.950
> > std dev: 47.20958 ns, lb 41.51181 ns, ub 55.07049 ns, ci 0.950
> >
> > I want to try throwing one more idea into the mix; I'll post
> > updates when I have them.
> >
> > So to answer your question: I'd be happy to include word8 in warp :).
> >
> > Michael
> >
> >
> > {-# LANGUAGE OverloadedStrings #-}
> > import Criterion.Main
> > import qualified Data.ByteString as S
> > import qualified Data.ByteString.Char8 as S8
> > import qualified Data.Char
> > import qualified Data.Word8
> >
> > main :: IO ()
> > main = do
> >     input <- S.readFile "bench.hs"
> >     defaultMain
> >         [ bench "Char8" $ whnf (S.length . S8.map Data.Char.toLower) input
> >         , bench "Word8" $ whnf (S.length . S.map Data.Word8.toLower) input
> >         ]
>
> I tried implementing a more low-level approach to try and avoid the
> Word8 boxing. The results improved a bit, but not significantly:
>
>
> benchmarking Char8
> mean: 318.2341 us, lb 314.5367 us, ub 320.4834 us, ci 0.950
> std dev: 14.48230 us, lb 10.00946 us, ub 21.22126 us, ci 0.950
> found 9 outliers among 100 samples (9.0%)
>   8 (8.0%) low severe
> variance introduced by outliers: 43.472%
> variance is moderately inflated by outliers
>
> benchmarking Word8
> mean: 35.79037 us, lb 35.66547 us, ub 35.92601 us, ci 0.950
> std dev: 665.5299 ns, lb 599.3413 ns, ub 741.6474 ns, ci 0.950
> variance introduced by outliers: 11.349%
> variance is moderately inflated by outliers
>
> benchmarking bsToLower
> mean: 31.49299 us, lb 31.32314 us, ub 31.65027 us, ci 0.950
> std dev: 835.2251 ns, lb 744.4337 ns, ub 946.1789 ns, ci 0.950
> variance introduced by outliers: 20.925%
> variance is moderately inflated by outliers
>
> Perhaps someone with more experience with this level of optimization
> would be able to improve the algorithm:
>
> https://gist.github.com/3756212
>
> Michael
>



-- 
Gregory Collins <greg at gregorycollins.net>