[Haskell-beginners] too lazy parsing?

Patrick Mylund Nielsen haskell at patrickmylund.com
Mon Feb 4 14:03:48 CET 2013


conduit and pipes are two examples:

http://hackage.haskell.org/package/conduit
http://hackage.haskell.org/package/pipes


On Mon, Feb 4, 2013 at 2:00 PM, Kyle Murphy <orclev at gmail.com> wrote:

> I can't say 100% for sure, but I'd guess it's because parsec is pure, and
> the file operations are using lazy bytestrings. Since no IO operations are
> applied to cont until after you close the handle, nothing can be read
> (since at that time the handle is closed). If you want to keep the program
> structured the same I believe there are functions that can convert a lazy
> bytestring into a strict one, and then you can perform the parsing on that.
> Alternatively you could rewrite things to close the file handle after you
> write it's contents to the output file.
>
> The default file operations in Haskell are known to be a source of
> difficulty in terms of laziness, and there has been some debate as to
> whether they're poorly designed or not. I might suggest you look into some
> of the alternatives, particular those based on stream fusion principles,
> that allow you to kill two birds with one stone by iteratively dealing with
> input thereby forcing evaluation and also improving memory usage and making
> it harder to trigger space leaks. I don't have the names available at the
> moment or I'd provide them, but I'm pretty sure at least one of them is
> named something like enumeratee, although I believe there's at least one
> other that might debatably be considered better.
> On Feb 4, 2013 5:51 AM, "Kees Bleijenberg" <k.bleijenberg at lijbrandt.nl>
> wrote:
>
>> module Main where ****
>>
>> ** **
>>
>> import Text.ParserCombinators.Parsec (many,many1,string, Parser, parse)**
>> **
>>
>> import System.IO (IOMode(..),hClose,openFile,hGetContents,hPutStrLn)****
>>
>>
>> ****
>>
>> parseFile hOut fn = do****
>>
>>                         handle <- openFile fn ReadMode****
>>
>>                         cont <- hGetContents
>> handle                                       ****
>>
>>                         print cont****
>>
>>                         let res = parse (many (string "blah")) "" cont***
>> *
>>
>>                         hClose handle                    ****
>>
>>                         case res of****
>>
>>                             (Left err) -> hPutStrLn hOut $ "Error: " ++
>> (show err)****
>>
>>                             (Right goodRes) -> mapM_ (hPutStrLn hOut)
>> goodRes                         ****
>>
>>                  ****
>>
>> main = do   ****
>>
>>             hOut <- openFile "outp.txt" WriteMode****
>>
>>             mapM (parseFile hOut) ["inp.txt"]****
>>
>>             hClose hOut****
>>
>> ** **
>>
>> I’am writing a program that parses a lot of files. Above is the simplest
>> program I can think of that demonstrates my problem.****
>>
>> The program above parses inp.txt.  Inp.txt has only the word blah in it.
>> The output is saved in outp.txt. This file contains the word blah after
>> running the program. if I comment out the line ‘print cont’ nothing is
>> saved in outp.txt.  ****
>>
>> If I comment out ‘print cont’ and replace many with many1 in the
>> following line, it works again?****
>>
>> Can someone explain to me what is going  on?****
>>
>> ** **
>>
>> Kees****
>>
>> _______________________________________________
>> Beginners mailing list
>> Beginners at haskell.org
>> http://www.haskell.org/mailman/listinfo/beginners
>>
>>
> _______________________________________________
> Beginners mailing list
> Beginners at haskell.org
> http://www.haskell.org/mailman/listinfo/beginners
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://www.haskell.org/pipermail/beginners/attachments/20130204/20b95a82/attachment.htm>


More information about the Beginners mailing list