Data.PackedString.hGetPS
Ian Lynagh
igloo at earth.li
Sun Feb 29 04:13:54 EST 2004
Hi all,
If I have
\begin{code}
import Data.PackedString (hGetPS)
import System.IO (openBinaryFile, hClose, IOMode(ReadMode))
main :: IO ()
main = do h <- openBinaryFile "1000000x" ReadMode
p <- hGetPS h 1000000
hClose h
return ()
\end{code}
(1000000x is a file containing 1000000 'x' characters) then:
and compile with ghc -O2 (either 6.2 or reasonably recent CVS) then I
get:
$ ./foo
Stack space overflow: current size 1048576 bytes.
Use `+RTS -Ksize' to increase it.
$
The definition is:
-- | Read a 'PackedString' directly from the specified 'Handle'.
-- This is far more efficient than reading the characters into a 'String'
-- and then using 'packString'.
-- NOTE: as with 'hPutPS', the string representation in the file is
-- assumed to be ISO-8859-1.
hGetPS :: Handle -> Int -> IO PackedString
hGetPS h i = do
arr <- newArray_ (0, i-1)
l <- hGetArray h arr i
chars <- mapM (\i -> readArray arr i >>= return.chr.fromIntegral) [0..l-1]
return (packString chars)
(is the efficiency comment accurate? This definition is calling
packString on a String it makes, so it's not intuitively obvious to me).
I think the last line should be replaced with
return (packNChars l chars)
and that the packNChars definition:
packNChars len str = PS (array (0,len-1) (zip [0..] str))
is equivalent to:
packNChars len str = PS (listArray (0,len-1) str)
which I assume is more efficient (due to not con/destructing tuples if
nothing else).
None of this solves the stack overflow, though.
While I'm here, David Roundy has a PackedString which looks to have been
forked from GHC or its libraries, now based around
data PackedString = PS !(ForeignPtr Word8) !Int !Int
He said it can't serve as a replacement as it doesn't support large
unicode characters, though (would just changing to a Word32 or Char fix
that?). If you're interested it's in the darcs
(http://abridgegame.org/darcs/) darcs repo - selecting
"FastPackedString.hs" on http://abridgegame.org/cgi-bin/darcs?darcs*
should show you the latest version. It doesn't have this problem.
Thanks
Ian
More information about the Libraries
mailing list