[Haskell-cafe] memory issues
Daniel Fischer
daniel.is.fischer at web.de
Fri Feb 27 18:21:31 EST 2009
Am Freitag, 27. Februar 2009 23:18 schrieb Rogan Creswick:
>
> \begin{code}
> -- Compiled with:
> -- $ ghc --make offsetSorter.hs
> --
> -- (ghc v. 6.8.2)
> --
> -- Run with:
> -- $ time ./offsetSorter data/byteOffsets.txt > haskOffsets.txt
> -- offsetSorter: out of memory (requested 1048576 bytes)
> --
> -- real 4m12.130s
> -- user 3m4.812s
> -- sys 0m5.660s
> --(OOM happened after consuming just over 3000mb of Virt, 2.6gb Res,
> according to top.)
> --
>
> import System (getArgs)
> import Data.Maybe
> import Monad
> import Text.Printf (printf)
> import Data.Function (on)
> import Data.List (sort)
> import Data.ByteString (ByteString)
> import qualified Data.ByteString.Char8 as C8
> import qualified Data.ByteString as B
>
>
> -- get the lines
> -- parse each line to get the offset.
> -- scan the list of offsets
>
> -- | The full file size:
> maxSize :: Integer
> maxSize = 2807444044080
>
> -- | Block is a contiguous chunk of data.
> -- The first entry is the offset, the second is the length.
> data Block = Block {
> offset::Integer
> , size::Integer
> } deriving (Eq)
>
> -- | Ordering of Blocks is based entirely on the block size.
> instance Ord Block where
> compare = compare `on` size
>
> instance Show Block where
> show (Block o s) = (show o) ++ " " ++ (show s)
>
> -- turn the file into a list of offsets:
> getOffsets :: ByteString -> [Integer]
> getOffsets = catMaybes . map parseOffset . C8.lines
>
> -- | Pull out the offsets frome a line of the file.
> parseOffset :: ByteString -> Maybe Integer
> parseOffset s = do
> (i, _) <- C8.readInteger (C8.filter (/=':') s)
Why the C8.filter (/= ':')?
That just costs and doesn't help anything (in fact, if your file contains
lines like 1234:5678, it gives wrong results).
If you know that your file contains only lines of the form offset: <page>,
you can have
getOffsets = map (fst . fromJust . C8.readInteger) . C8.lines
which seems to do a little good.
> Just i
>
> -- | Get the offsets between entries in a list
> getSizes :: [Integer] -> [Integer]
> getSizes (x:y:[]) = [y - x]
> getSizes (x:y:ys) = (y - x):(getSizes (y:ys))
>
> -- | creates and returns a list of Blocks, given a file's content.
> blocks :: ByteString -> [Block]
> blocks s = zipWith (Block) offsets sizes
> where offsets = getOffsets s
> sizes = getSizes (offsets ++ [maxSize])
>
> main :: IO ()
> main = do
> args <- getArgs
> content <- B.readFile (args!!0)
> printf "%s" $ unlines $ map (show) (sort $! blocks content)
Bad!
Use
mapM_ print $ sort $ blocks content
> \end{code}
More information about the Haskell-Cafe
mailing list