[Haskell-cafe] Why doesn't GHC use the Hugs definition of splitAt to avoid traversing the first part of the list twice?

Sat Apr 26 09:02:30 EDT 2008

Duncan Coutts wrote:
> On Fri, 2008-04-25 at 17:30 +0100, Richard Kelsall wrote:
>> I've just been investigating a performance oddity in using splitAt
>> on a long stream of random numbers. I don't understand why GHC
>> appears to want to traverse the first part of the list twice.
>>
>> GHC seems to implement the splitAt function something like
>>
>> splitAt n xs = (take n xs, drop n xs)
>>
>> whereas Hugs is something like
>>
>> splitAt n (x : xs) = (x : xs', xs'')
>>                  where (xs', xs'') = splitAt (n-1) xs
>>
>> which seems much more sensible to me. Wouldn't it be better to change
>> GHC to the Hugs method? Have I misunderstood something?
> 
> Actually GHC uses this definition, in GHC.List:
> 
> #ifdef USE_REPORT_PRELUDE
> 
> splitAt n xs           =  (take n xs, drop n xs)
> 
> #else /* hack away */
> 
> splitAt (I# n#) ls
>   | n# <# 0#    = ([], ls)
>   | otherwise   = splitAt# n# ls
>     where
>         splitAt# :: Int# -> [a] -> ([a], [a])
>         splitAt# 0# xs     = ([], xs)
>         splitAt# _  xs@[]  = (xs, xs)
>         splitAt# m# (x:xs) = (x:xs', xs'')
>           where
>             (xs', xs'') = splitAt# (m# -# 1#) xs
> 
> #endif /* USE_REPORT_PRELUDE */
> 
> So ghc's version should be of equivalent strictness to the hugs version.
> 
> What's interesting here is that the H98 specification of splitAt is
> silly. It got 'simplified' from a previous version of the Haskell spec
> and is so doing it was made less strict.
> 
> With this definition:
> splitAt n xs           =  (take n xs, drop n xs)
> 
> splitAt _|_ _|_ = (_|_, _|_)
> 
> but with the sensible definition it'd return _|_
> 
> and that's really the only point of having splitAt, so that you can walk
> down the list once rather than twice. If someone needs the very lazy
> version there's always take and drop.
> 
> Duncan
> 

That looks good, I didn't see this 'hack away' version when I found
splitAt on the web.

I'm now wondering why my splitAtRK function in the following code
makes it run in 11 seconds given a parameter of 2500000 but it takes
14 seconds when I change it to splitAt. Am I accidentally invoking
the (take, drop) version of splitAt? Why is mine so much faster than
the built-in version? (Using GHC 6.8.2, W2K, Intel Core 2 Duo 2.33GHz)
Maybe mine isn't working properly somehow.

(I hadn't intended to post this code just yet because I wanted to
do a bit more testing etc then ask for suggestions for simplifying
and improving it. I actually want to get rid of the line containing
splitAt because it's ugly. All suggestions for improvement gratefully
received. The time function is just temporary. This code is about three
or four times slower that the current fastest Haskell entry for the
Fasta shootout benchmark. I'll elaborate it for speed when I've got
down to the simplest version possible.)

Richard.

{-# OPTIONS -O2 -fexcess-precision #-}
--
-- The Computer Language Shootout : Fasta
-- http://shootout.alioth.debian.org/
--
-- Simple solution by Richard Kelsall.
-- http://www.millstream.com/
-- 

import System

import Text.Printf
import System.CPUTime

time :: IO t -> IO t
time a = do
     start <- getCPUTime
     v <- a
     end   <- getCPUTime
     let diff = (fromIntegral (end - start)) / (10 ^12)
     printf "Calc time %0.3f \n" (diff :: Double)
     return v

main = do
     time $ comp

comp :: IO ()
comp = do
     n <- getArgs >>= readIO . head

     title "ONE" "Homo sapiens alu"
     writeLined (cycle alu) (n * 2)

     title "TWO" "IUB ambiguity codes"
     let (r1, r2) = splitAtRK (fromIntegral (n * 3)) (rand 42)
     writeLined (map (look iubs) r1) (n * 3)

     title "THREE" "Homo sapiens frequency"
     writeLined (map (look homs) r2) (n * 5)

splitAtRK n xs | n <= 0 = ([], xs)
splitAtRK _ []          = ([], [])
splitAtRK n (x : xs) = (x : xs', xs'')
                        where (xs', xs'') = splitAtRK (n - 1) xs

title :: String -> String -> IO ()
title a b = putStrLn $ ">" ++ a ++ " " ++ b

look :: [(Char, Float)] -> Float -> Char
look [(c, _)] _ = c
look ((c, f) : cfs) r = if r < f
                            then c
                            else look cfs (r - f)

lineWidth = 60

writeLined :: [Char] -> Integer -> IO ()
writeLined cs 0 = return ()
writeLined cs n = do
                 let w = min n lineWidth
                     (cs1, cs2) = splitAt (fromInteger w) cs
                 putStrLn cs1
                 writeLined cs2 (n - w)

rand :: Int -> [Float]
rand seed = newran : (rand newseed)
     where
         im = 139968
         ia = 3877
         ic = 29573
         newseed = (seed * ia + ic) `rem` im
         newran = fromIntegral newseed / fromIntegral im

alu = "GGCCGGGCGCGGTGGCTCACGCCTGTAATCCCAGCACTTTGGGAGGCCGAGGCGGGCGGA\
       \TCACCTGAGGTCAGGAGTTCGAGACCAGCCTGGCCAACATGGTGAAACCCCGTCTCTACT\
       \AAAAATACAAAAATTAGCCGGGCGTGGTGGCGCGCGCCTGTAATCCCAGCTACTCGGGAG\
       \GCTGAGGCAGGAGAATCGCTTGAACCCGGGAGGCGGAGGTTGCAGTGAGCCGAGATCGCG\
       \CCACTGCACTCCAGCCTGGGCGACAGAGCGAGACTCCGTCTCAAAAA"

iubs = [('a', 0.27), ('c', 0.12), ('g', 0.12), ('t', 0.27), ('B', 0.02),
         ('D', 0.02), ('H', 0.02), ('K', 0.02), ('M', 0.02), ('N', 0.02),
         ('R', 0.02), ('S', 0.02), ('V', 0.02), ('W', 0.02), ('Y', 0.02)]

homs = [('a', 0.3029549426680), ('c', 0.1979883004921),
         ('g', 0.1975473066391), ('t', 0.3015094502008)]