[Haskell-cafe] Re: A suggestion for the next high profile Haskell project

Lennart Augustsson lennart at augustsson.net
Mon Dec 18 19:22:42 EST 2006


I'm allergic to hex constants.  Surely, they are not necessary.

	-- Lennart

On Dec 18, 2006, at 18:14 , Chris Kuklewicz wrote:

> Hi all (and Don!),
>
>   I have some rewritten versions of readInt below...
>
> Bulat Ziganshin wrote:
>> Hello Donald,
>>
>> Monday, December 18, 2006, 3:51:48 AM, you wrote:
>>>>> Haskell can't provide fast execution speed unless very low-level
>>>>> programming style is used (which is much harder to do in  
>>>>> Haskell than in C,
>>>>> see one of my last messages for example) AND jhc compiler is used
>>
>>> I have to dispute this Bulat's characterisation here. We can  
>>> solve lots
>>> of nice problems and have high performance *right now*. Particularly
>>> concurrency problems, and ones involving streams of bytestrings.
>>> No need to leave the safety of GHC either, nor resort to low  
>>> level evil
>>> code.
>>
>> let's go further in this long-term discussion. i've read Shootout  
>> problems
>> and concluded that there are only 2 tasks which speed is dependent on
>> code-generation abilities of compiler, all other tasks are  
>> dependent on
>> speed of used libraries. just for example - in one test TCL was  
>> fastest
>> language. why? because this test contained almost nothing but 1000  
>> calls to
>> the regex engine with very large strings and TCL regex engine was  
>> fastest
>>
>> the same applies to the two above-mentioned areas - GHC wins in  
>> concurrency
>> tests just because only built-in libraries considered, and GHC has a
>> lightweight threads library built-in while C compilers don't
>>
>> with ByteString library, i know at least one example, where you  
>> have added
>> function - readInt - to the library only to win in Shootout test
>>
>>> This obsession with mutable-variable, imperative code is  
>>> unhealthy, Bulat ;)
>>
>> so why your readInt routine is written in imperative way? ;)
>>
>>
>>>> ajb:
>>>> The PGP format is heavily character stream-based.  We know how  
>>>> horrible
>>>> the performance of character streams are in Haskell.  On one  
>>>> hand, this
>>>> would be an excellent test case.  On the other hand, performance  
>>>> would
>>>> indeed suck now.
>>
>>> Unless you used a stream of lazy bytestrings!
>>> As Duncan did for his pure gzip and bzip2 bindings:
>>
>> these are binding to existing C libs. if you try to convince us  
>> that to get
>> fast FPS routines one need to write them in C and then provide  
>> Haskell
>> binding, i will 100% agree
>>
>> otherwise, please explain me why your own function, readInt, don't  
>> use
>> these fascinating stream fusion capabilities? ;)
>>
>>> P.S.  The comments on this thread makes me think that the state  
>>> of the
>>> art high perf programming in Haskell isn't widely known. Bulat-style
>>> imperative Haskell is rarely (ever?) needed in the GHC 6.6  
>>> Haskell code
>>> I'm writing in these days. Solving large data problems is feasible
>>> *right now* using bytestrings.
>>
>> may be it's me who wrote FPS library? :)  let's agree that high- 
>> performance
>> code can be written in C and then called from Haskell. and when  
>> *you* want
>> to have real performance, you write something very far from your  
>> optimistic
>> words:
>>
>> readInt :: ByteString -> Maybe (Int, ByteString)
>> readInt as
>>     | null as   = Nothing
>>     | otherwise =
>>         case unsafeHead as of
>>             '-' -> loop True  0 0 (unsafeTail as)
>>             '+' -> loop False 0 0 (unsafeTail as)
>>             _   -> loop False 0 0 as
>>
>>     where loop :: Bool -> Int -> Int -> ByteString -> Maybe (Int,  
>> ByteString)
>>           STRICT4(loop)
>>           loop neg i n ps
>>               | null ps   = end neg i n ps
>>               | otherwise =
>>                   case B.unsafeHead ps of
>>                     w | w >= 0x30
>>                      && w <= 0x39 -> loop neg (i+1)
>>                                           (n * 10 + (fromIntegral  
>> w - 0x30))
>>                                           (unsafeTail ps)
>>                       | otherwise -> end neg i n ps
>>
>>           end _    0 _ _  = Nothing
>>           end True _ n ps = Just (negate n, ps)
>>           end _    _ n ps = Just (n, ps)
>>
>>
>>> Trying to write code using an imperative style is likely to do  
>>> more harm
>>> than good, and certainly not something to suggest to beginners on  
>>> the
>>> mailing list ;)
>>
>> of course, if you want to fool beginners, you can continue to sing  
>> these
>> songs :D
>>
>
> That code for readInt has no mutable state or IO, and it may be  
> imperative
> style, it is not much like c-code.
>
> I liked my readInt1 version below, which actually benchmarks faster  
> than the GHC
> 6.6 Data.ByteString.Char8.readInt (on the little "test r" below).   
> What a
> pleasant surprise.
>
> The readInt2 version below is even more functional, but takes about  
> 1.7 times as
> long to run "test r".  (gaged from running "time ./ReadInt.2" on  
> Mac OS X on a
> G4 cpu).  The latest fusion in darcs fps might be better...
>
>> module Main(main) where
>>
>> import Control.Arrow((***))
>> import Data.Char(chr,ord)
>> import Data.ByteString(ByteString)
>> import qualified Data.ByteString as B(null,foldl',span)
>> import qualified Data.ByteString.Char8 as C (readInt,pack)
>> import qualified Data.ByteString.Base as B(unsafeHead,unsafeTail)
>> import Data.Ix(inRange)
>> import Data.Maybe(fromJust)
>> import Data.Word(Word8)
>>
>> default ()
>>
>> {-# INLINE chr8 #-}
>> chr8 :: Word8 -> Char
>> chr8 = chr . fromEnum
>> {-# INLINE ord8 #-}
>> ord8 :: Char -> Word8
>> ord8 = toEnum . ord
>>
>> {-# INLINE decompose #-}
>> decompose :: a -> (Word8 -> ByteString -> a) -> ByteString -> a
>> decompose whenNull withHeadTail bs =
>>   if B.null bs then whenNull else (withHeadTail $! (B.unsafeHead  
>> bs)) $! (B.unsafeTail bs)
>>
>> -- This does not do any bound checking if the Int overflows
>> readInt1 :: ByteString -> Maybe (Int, ByteString)
>> readInt1 bs = decompose Nothing (\h t ->
>>     case h of
>>       0x2d -> fmap (negate *** id) $ first t -- '-'
>>       0x2b -> first t                        -- '+'
>>       _    -> first bs) bs
>>   where first :: ByteString -> Maybe (Int, ByteString)
>>         first = decompose Nothing (\h t ->
>>             if inRange (0x30,0x39) h
>>               then Just (loop t $! (fromIntegral h-0x30))
>>               else Nothing)
>>         loop :: ByteString -> Int -> (Int, ByteString)
>>         loop bs n = decompose (n,bs) (\h t ->
>>             if inRange (0x30,0x39) h
>>               then loop t $! (n*10+(fromIntegral h-0x30))
>>               else (n,bs)) bs
>>
>>
>> readInt2 :: ByteString -> Maybe (Int,ByteString)
>> readInt2 bs =  decompose Nothing (\h t ->
>>     case h of
>>       0x2d -> fmap (negate *** id) $ toInt t -- '-'
>>       0x2b -> toInt t                        -- '+'
>>       _    -> toInt bs) bs
>>   where toInt :: ByteString -> Maybe (Int,ByteString)
>>         toInt bs = let (digits,rest) = B.span (inRange  
>> (0x30,0x39)) bs
>>                    in if B.null digits then Nothing else Just  
>> (convert digits,rest)
>>         convert :: ByteString -> Int
>>         convert = B.foldl' (\n h -> n*10 + (fromIntegral h - 0x30)) 0
>>
>> test r = let a = take 1000000 $ cycle [C.pack "13247897",C.pack  
>> "-13247896"]
>>          in (sum . map (fst . fromJust . r) $ a
>>             , take 4 a
>>             , map r $ take 4 a)
>>
>> main = print $ test $ (readInt2)
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe



More information about the Haskell-Cafe mailing list