[Haskell-beginners] too lazy parsing?

Kyle Murphy orclev at gmail.com
Mon Feb 4 16:38:02 CET 2013


So this thread got me thinking about what's wrong with file IO in Haskell.
My conclusion is that file handles are problematic because they're
essentially pointers, and as such are subject to the same sorts of problems,
such as use after free (as in this example, where the handle is closed
before the contents are read). Lazy IO being the default complicates this
further. The same sorts of solutions used for the Ptr type can be applied to
file handles (and in fact have been). For example, alloca wraps the
operations on a pointer so that it's allocated, passed into a function, and
then freed on exit from the function. In a similar fashion, System.IO's
withFile takes a file name, an IOMode, and a function that performs some
work on the handle, and closes the handle when that function returns. The
ResourceT transformer from the resourcet package (which in turn started as
part of the previously mentioned conduit package) generalizes this pattern.
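
A minimal sketch of that allocate/use/free pattern for handles, assuming only
the base library. The names withFileSketch and firstLine are hypothetical and
only for illustration; System.IO already exports a real withFile that behaves
along these lines.

```haskell
import Control.Exception (bracket)
import System.IO (Handle, IOMode (..), hClose, hGetLine, openFile)

-- | Hypothetical re-implementation of the pattern: open the file, pass the
-- handle to the action, and close it on exit (even if the action throws).
withFileSketch :: FilePath -> IOMode -> (Handle -> IO a) -> IO a
withFileSketch path mode = bracket (openFile path mode) hClose

-- | Read the first line of a file. hGetLine is strict, so the data has
-- already been read by the time the handle is closed.
firstLine :: FilePath -> IO String
firstLine path = withFileSketch path ReadMode hGetLine
```

The pattern only helps if the action finishes its reading before it returns.
Returning the result of a lazy hGetContents from the action would bring back
the same use-after-close problem.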

In general I'd recommend avoiding all the standard file functions, as
they're very un-Haskellish and inherently unsafe (in the sense that the
types don't stop you from using a closed handle, not in the security
sense). I personally think those functions should be phased out in future
Haskell releases in favor of better functional abstractions, but I'm
essentially nobody in the Haskell world and my opinion carries very little
weight, so take this with whatever sized grain of salt you feel is
appropriate.
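
If you'd rather stay with the standard functions, here is a rough sketch of
the parseFile from the quoted message with the input forced before the handle
is closed. It assumes the parsec package is available, uses withFile in place
of the explicit openFile/hClose pair, and uses evaluate (length s) as one
common way to force a lazy String. None of these choices come from the
original program.

```haskell
import Control.Exception (evaluate)
import System.IO (Handle, IOMode (..), hGetContents, hPutStrLn, withFile)
import Text.ParserCombinators.Parsec (many, parse, string)

-- | Variant of the quoted parseFile. The whole input is read while the
-- handle is still open, so the 'print cont' workaround is not needed.
parseFile :: Handle -> FilePath -> IO ()
parseFile hOut fn = do
  cont <- withFile fn ReadMode $ \h -> do
    s <- hGetContents h
    -- Walking the whole list forces the lazy read to reach end of file
    -- before withFile closes the handle.
    _ <- evaluate (length s)
    return s
  case parse (many (string "blah")) fn cont of
    Left err      -> hPutStrLn hOut ("Error: " ++ show err)
    Right goodRes -> mapM_ (hPutStrLn hOut) goodRes
```

The evaluate line is the only part that changes when the file is read. The
parser itself stays pure and runs on a String that is already fully in memory.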

On Feb 4, 2013 8:05 AM, "Patrick Mylund Nielsen" <haskell at patrickmylund.com>
wrote:

> conduit and pipes are two examples:
>
> http://hackage.haskell.org/package/conduit
> http://hackage.haskell.org/package/pipes
>
>
> On Mon, Feb 4, 2013 at 2:00 PM, Kyle Murphy <orclev at gmail.com> wrote:
>
>> I can't say 100% for sure, but I'd guess it's because parsec is pure and
>> hGetContents reads the file lazily. Since nothing forces cont until after
>> you close the handle, nothing can be read (by that time the handle is
>> closed). If you want to keep the program structured the same, you can force
>> the whole contents to be read (or read the file into a strict type
>> instead) before closing the handle, and then perform the parsing on that.
>> Alternatively, you could rewrite things to close the file handle after you
>> write its contents to the output file.
>>
>> The default file operations in Haskell are known to be a source of
>> difficulty in terms of laziness, and there has been some debate as to
>> whether they're poorly designed or not. I might suggest you look into some
>> of the alternatives, particularly those based on incremental
>> (iteratee-style) streaming, which kill two birds with one stone: dealing
>> with input iteratively forces evaluation, and it also improves memory usage
>> and makes it harder to trigger space leaks. I don't have the names
>> available at the moment or I'd provide them, but I'm pretty sure at least
>> one of them is named something like enumeratee, although I believe there's
>> at least one other that might debatably be considered better.
>>
>> On Feb 4, 2013 5:51 AM, "Kees Bleijenberg" <k.bleijenberg at lijbrandt.nl>
>> wrote:
>>
>>> module Main where
>>>
>>> import Text.ParserCombinators.Parsec (many, many1, string, Parser, parse)
>>> import System.IO (IOMode(..), hClose, openFile, hGetContents, hPutStrLn)
>>>
>>> parseFile hOut fn = do
>>>     handle <- openFile fn ReadMode
>>>     cont <- hGetContents handle
>>>     print cont
>>>     let res = parse (many (string "blah")) "" cont
>>>     hClose handle
>>>     case res of
>>>         (Left err) -> hPutStrLn hOut $ "Error: " ++ (show err)
>>>         (Right goodRes) -> mapM_ (hPutStrLn hOut) goodRes
>>>
>>> main = do
>>>     hOut <- openFile "outp.txt" WriteMode
>>>     mapM (parseFile hOut) ["inp.txt"]
>>>     hClose hOut
>>>
>>> I am writing a program that parses a lot of files. Above is the simplest
>>> program I can think of that demonstrates my problem.
>>>
>>> The program above parses inp.txt, which contains only the word blah. The
>>> output is saved in outp.txt, which contains the word blah after running
>>> the program. If I comment out the line 'print cont', nothing is saved in
>>> outp.txt.
>>>
>>> If I comment out 'print cont' and replace many with many1 in the
>>> following line, it works again.
>>>
>>> Can someone explain to me what is going on?
>>>
>>> Kees
>>>
>>> _______________________________________________
>>> Beginners mailing list
>>> Beginners at haskell.org
>>> http://www.haskell.org/mailman/listinfo/beginners
>>>
>>>