[Haskell-cafe] Re: A suggestion for the next high profile Haskell
project
Lennart Augustsson
lennart at augustsson.net
Mon Dec 18 19:22:42 EST 2006
I'm allergic to hex constants. Surely, they are not necessary.
-- Lennart
On Dec 18, 2006, at 18:14 , Chris Kuklewicz wrote:
> Hi all (and Don!),
>
> I have some rewritten versions of readInt below...
>
> Bulat Ziganshin wrote:
>> Hello Donald,
>>
>> Monday, December 18, 2006, 3:51:48 AM, you wrote:
>>>>> Haskell can't provide fast execution speed unless very low-level
>>>>> programming style is used (which is much harder to do in
>>>>> Haskell than in C,
>>>>> see one of my last messages for example) AND jhc compiler is used
>>
>>> I have to dispute this Bulat's characterisation here. We can
>>> solve lots
>>> of nice problems and have high performance *right now*. Particularly
>>> concurrency problems, and ones involving streams of bytestrings.
>>> No need to leave the safety of GHC either, nor resort to low
>>> level evil
>>> code.
>>
>> let's go further in this long-term discussion. i've read Shootout
>> problems
>> and concluded that there are only 2 tasks which speed is dependent on
>> code-generation abilities of compiler, all other tasks are
>> dependent on
>> speed of used libraries. just for example - in one test TCL was
>> fastest
>> language. why? because this test contained almost nothing but 1000
>> calls to
>> the regex engine with very large strings and TCL regex engine was
>> fastest
>>
>> the same applies to the two above-mentioned areas - GHC wins in
>> concurrency
>> tests just because only built-in libraries considered, and GHC has a
>> lightweight threads library built-in while C compilers don't
>>
>> with ByteString library, i know at least one example, where you
>> have added
>> function - readInt - to the library only to win in Shootout test
>>
>>> This obsession with mutable-variable, imperative code is
>>> unhealthy, Bulat ;)
>>
>> so why your readInt routine is written in imperative way? ;)
>>
>>
>>>> ajb:
>>>> The PGP format is heavily character stream-based. We know how
>>>> horrible
>>>> the performance of character streams are in Haskell. On one
>>>> hand, this
>>>> would be an excellent test case. On the other hand, performance
>>>> would
>>>> indeed suck now.
>>
>>> Unless you used a stream of lazy bytestrings!
>>> As Duncan did for his pure gzip and bzip2 bindings:
>>
>> these are binding to existing C libs. if you try to convince us
>> that to get
>> fast FPS routines one need to write them in C and then provide
>> Haskell
>> binding, i will 100% agree
>>
>> otherwise, please explain me why your own function, readInt, don't
>> use
>> these fascinating stream fusion capabilities? ;)
>>
>>> P.S. The comments on this thread makes me think that the state
>>> of the
>>> art high perf programming in Haskell isn't widely known. Bulat-style
>>> imperative Haskell is rarely (ever?) needed in the GHC 6.6
>>> Haskell code
>>> I'm writing in these days. Solving large data problems is feasible
>>> *right now* using bytestrings.
>>
>> may be it's me who wrote FPS library? :) let's agree that high-
>> performance
>> code can be written in C and then called from Haskell. and when
>> *you* want
>> to have real performance, you write something very far from your
>> optimistic
>> words:
>>
>> readInt :: ByteString -> Maybe (Int, ByteString)
>> readInt as
>> | null as = Nothing
>> | otherwise =
>> case unsafeHead as of
>> '-' -> loop True 0 0 (unsafeTail as)
>> '+' -> loop False 0 0 (unsafeTail as)
>> _ -> loop False 0 0 as
>>
>> where loop :: Bool -> Int -> Int -> ByteString -> Maybe (Int,
>> ByteString)
>> STRICT4(loop)
>> loop neg i n ps
>> | null ps = end neg i n ps
>> | otherwise =
>> case B.unsafeHead ps of
>> w | w >= 0x30
>> && w <= 0x39 -> loop neg (i+1)
>> (n * 10 + (fromIntegral
>> w - 0x30))
>> (unsafeTail ps)
>> | otherwise -> end neg i n ps
>>
>> end _ 0 _ _ = Nothing
>> end True _ n ps = Just (negate n, ps)
>> end _ _ n ps = Just (n, ps)
>>
>>
>>> Trying to write code using an imperative style is likely to do
>>> more harm
>>> than good, and certainly not something to suggest to beginners on
>>> the
>>> mailing list ;)
>>
>> of course, if you want to fool beginners, you can continue to sing
>> these
>> songs :D
>>
>
> That code for readInt has no mutable state or IO, and it may be
> imperative
> style, it is not much like c-code.
>
> I liked my readInt1 version below, which actually benchmarks faster
> than the GHC
> 6.6 Data.ByteString.Char8.readInt (on the little "test r" below).
> What a
> pleasant surprise.
>
> The readInt2 version below is even more functional, but takes about
> 1.7 times as
> long to run "test r". (gaged from running "time ./ReadInt.2" on
> Mac OS X on a
> G4 cpu). The latest fusion in darcs fps might be better...
>
>> module Main(main) where
>>
>> import Control.Arrow((***))
>> import Data.Char(chr,ord)
>> import Data.ByteString(ByteString)
>> import qualified Data.ByteString as B(null,foldl',span)
>> import qualified Data.ByteString.Char8 as C (readInt,pack)
>> import qualified Data.ByteString.Base as B(unsafeHead,unsafeTail)
>> import Data.Ix(inRange)
>> import Data.Maybe(fromJust)
>> import Data.Word(Word8)
>>
>> default ()
>>
>> {-# INLINE chr8 #-}
>> chr8 :: Word8 -> Char
>> chr8 = chr . fromEnum
>> {-# INLINE ord8 #-}
>> ord8 :: Char -> Word8
>> ord8 = toEnum . ord
>>
>> {-# INLINE decompose #-}
>> decompose :: a -> (Word8 -> ByteString -> a) -> ByteString -> a
>> decompose whenNull withHeadTail bs =
>> if B.null bs then whenNull else (withHeadTail $! (B.unsafeHead
>> bs)) $! (B.unsafeTail bs)
>>
>> -- This does not do any bound checking if the Int overflows
>> readInt1 :: ByteString -> Maybe (Int, ByteString)
>> readInt1 bs = decompose Nothing (\h t ->
>> case h of
>> 0x2d -> fmap (negate *** id) $ first t -- '-'
>> 0x2b -> first t -- '+'
>> _ -> first bs) bs
>> where first :: ByteString -> Maybe (Int, ByteString)
>> first = decompose Nothing (\h t ->
>> if inRange (0x30,0x39) h
>> then Just (loop t $! (fromIntegral h-0x30))
>> else Nothing)
>> loop :: ByteString -> Int -> (Int, ByteString)
>> loop bs n = decompose (n,bs) (\h t ->
>> if inRange (0x30,0x39) h
>> then loop t $! (n*10+(fromIntegral h-0x30))
>> else (n,bs)) bs
>>
>>
>> readInt2 :: ByteString -> Maybe (Int,ByteString)
>> readInt2 bs = decompose Nothing (\h t ->
>> case h of
>> 0x2d -> fmap (negate *** id) $ toInt t -- '-'
>> 0x2b -> toInt t -- '+'
>> _ -> toInt bs) bs
>> where toInt :: ByteString -> Maybe (Int,ByteString)
>> toInt bs = let (digits,rest) = B.span (inRange
>> (0x30,0x39)) bs
>> in if B.null digits then Nothing else Just
>> (convert digits,rest)
>> convert :: ByteString -> Int
>> convert = B.foldl' (\n h -> n*10 + (fromIntegral h - 0x30)) 0
>>
>> test r = let a = take 1000000 $ cycle [C.pack "13247897",C.pack
>> "-13247896"]
>> in (sum . map (fst . fromJust . r) $ a
>> , take 4 a
>> , map r $ take 4 a)
>>
>> main = print $ test $ (readInt2)
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
More information about the Haskell-Cafe
mailing list