[Haskell-cafe] Why so slow?
Donald Bruce Stewart
dons at cse.unsw.edu.au
Tue Dec 12 23:16:25 EST 2006
lists:
> The code below is using way more RAM than it should. It seems to only
> take so long when I build the 'programs' list - the actual
> reading/parsing is fast. For a 5MB input file, it's using 50MB of RAM!
> Any idea how to combat this?
>
> Thanks,
> Lyle
>
> {-# OPTIONS_GHC -fglasgow-exts #-}
>
> -- linear_importer.hs
>
> import Control.Monad (unless)
> import Data.ByteString.Char8 (ByteString)
> import qualified Data.ByteString.Char8 as BS
> import Text.Regex as RE
> import Data.Time.Calendar
> import Data.Time.LocalTime
> import System.Environment
>
> -- Consider adding strictness as necessary for performance.
>
> -- STB time channel
> data ChannelChange = ChannelChange Int LocalTime Int
> deriving Show
>
> -- channel program_no start
> end network_no
> data ScheduleProgram = ScheduleProgram Int Int LocalTime
> LocalTime (Maybe Int)
> deriving Show
>
> main = do
> fileNames <- getArgs
> ensure (length fileNames == 2) usageMessage
> let [eventFileName,programFileName] = fileNames
> putStrLn ("Reading program schedule file from '" ++ programFileName
> ++ "'...")
> text <- BS.readFile programFileName
> programs <- sequence $ map parseScheduleProgram (BS.lines text)
> print (take 20 programs)
> return ()
>
> usageMessage = "Usage: linear_importer <channel change file> <schedule
> file>"
>
> parseScheduleProgram :: ByteString -> IO ScheduleProgram
> parseScheduleProgram s = do
> let fields = BS.split '|' s
> ensure (length fields == 7) ("Wrong number of fields in schedule
> program: " ++ BS.unpack s)
> let
> [_,channelNoText,programNoText,_,startTimeText,endTimeText,networkNoText]
> = fields
> let channelNo = read $ BS.unpack channelNoText
> programNo = read $ BS.unpack programNoText
> startTime <- parseProgramTime startTimeText
> endTime <- parseProgramTime endTimeText
> let networkNo = if BS.null networkNoText then Nothing else Just
> (read (BS.unpack networkNoText))
> return $ ScheduleProgram channelNo programNo startTime endTime networkNo
>
> parseProgramTime :: ByteString -> IO LocalTime
> parseProgramTime s = do
> let parts = BS.split 'T' s
> ensure (length parts == 2)
> ("Expected exactly one T in eventChannelChange time: " ++
> BS.unpack s)
> let [datePart,timePart] = parts
> ensure (BS.length datePart == 8)
> ("Expected 8 digits in date part of eventChannelChange time: "
> ++ BS.unpack s)
> let (yearPart, monthDayPart) = BS.splitAt 4 datePart
> (monthPart, dayPart) = BS.splitAt 2 monthDayPart
> year = read $ BS.unpack yearPart
> month = read $ BS.unpack monthPart
> day = read $ BS.unpack dayPart
> let date = fromGregorian year month day
> ensure (BS.length timePart == 6)
> ("Expected 6 digits in time part of eventChannelChange time: "
> ++ BS.unpack s)
> let (hoursPart,minutesSecondsPart) = BS.splitAt 2 timePart
> (minutesPart,secondsPart) = BS.splitAt 2 minutesSecondsPart
> hours = read $ BS.unpack hoursPart
> minutes = read $ BS.unpack minutesPart
> seconds = read $ BS.unpack secondsPart
> let time = TimeOfDay hours minutes (fromInteger seconds)
> return (LocalTime date time)
Argh, all those: read .unpacks are going to be painful.
Consider using Data.ByteString.Char8.readInt/Integer
-- Don
More information about the Haskell-Cafe
mailing list