[Haskell-beginners] lazy IO in readFile

Andrew Sackville-West andrew at swclan.homelinux.org
Fri May 7 23:59:02 EDT 2010


On Sat, May 08, 2010 at 03:41:43PM +1200, Stephen Blackheath [to Haskell-Beginners] wrote:
> Andrew,
> 
> In Haskell, lazy I/O is a form of cheating, because Haskell functions
> are supposed to have no side effects, and lazy I/O is a side effect.  At
> first, cheating seems attractive, but it takes a bit of experience to
> really understand why cheating is not a good idea, and that Haskell is
> powerful enough that you lose nothing by not cheating.

So, are you saying that using something like readFile is cheating? Or
just that lazy IO itself is cheating?

>  That has certainly been my experience, and I had to find out the hard
> way.  It sounds like you're starting to see some of the problems with
> cheating.

Indeed. This whole exercise (of which the below is just a piece) has
been enlightening. I'm reminded of the cat who's stuck in the IO
monad. I've certainly gotten better at moving values into and out of IO
(and at moving functions themselves into and out of IO).

> 
> Here's someone's philosophizing on the subject:
> 
> http://lukepalmer.wordpress.com/2009/06/04/it-is-never-safe-to-cheat/

cool thanks

> 
> So the short answer is, no - there is no way to force the file returned
> by readFile to close.

I figured as much. I'm not completely unhappy with my solution since
it irks me to write out an empty list anyway. And it's really a simple
little project for my personal use...

> 
> I'd recommend using withFile and hGetLine, like this:
> 
> withFile "testfile" ReadMode $ \h -> do
>     ...
>     l <- hGetLine h

and using this to read through the entire file and then closing it?
(Don't answer that, I'll do the reading.) Hmm... a little thought
suggests that laziness will still get me unless I put some strictness
in somewhere. Otherwise I'm still left with a case where the history
list is never completely evaluated, so the read never reaches EOF and
the handle never closes. I will apply some thought to it and see what
happens.
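
Actually, sketching it out here helps. Something along these lines is
what I have in mind -- a rough, untested sketch (getOldData' and
readLoop are just names I'm inventing for illustration) that walks the
file with hGetLine, so the whole list is built before withFile closes
the handle:

import System.IO

-- Read every line before withFile closes the handle.  hGetLine hands
-- back each line fully evaluated, and readLoop only returns once it
-- hits EOF, so nothing is left lazily pending on a closed handle.
getOldData' :: IO [String]
getOldData' = withFile "testfile" ReadMode readLoop
  where
    readLoop h = do
      eof <- hIsEOF h
      if eof
        then return []
        else do
          l    <- hGetLine h
          rest <- readLoop h
          return (l : rest)

I'd still want the catch wrapper from my original getOldData around it
for the missing-file case, but at least nothing would be left hanging
on a half-closed handle by the time I append to the file.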

thanks

A


> 
> If you want more speed, take a look at the stuff in Data.ByteString.  If
> you want proper text encoding and speed, take a look at the 'text'
> package.
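
Good tip. Just so I have it written down, here's a rough, untested
sketch (getOldData' is again just a name I'm making up) of what a
strict ByteString read of the same "testfile" might look like -- as I
understand it, the strict readFile consumes the whole file and closes
the handle before returning, so the lingering-handle problem shouldn't
arise at all:

import qualified Control.Exception as E
import qualified Data.ByteString.Char8 as B

-- Strict read: B.readFile pulls in the entire file and closes the
-- handle before returning, so nothing is left lazily pending on it.
getOldData' :: IO [String]
getOldData' = readHistory `E.catch` missing
  where
    readHistory = fmap (map B.unpack . B.lines) (B.readFile "testfile")
    missing :: E.IOException -> IO [String]
    missing _ = return []

If the text encoding ever matters I'd presumably reach for the 'text'
package's strict readFile instead, as you suggest.
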
> 
> 
> Steve
> 
> On 08/05/10 14:47, Andrew Sackville-West wrote:
> > I'm trying to suss out the best (heh, maybe most idiomatic?) way to
> > handle laziness in a particular file operation. I'm reading a file
> > which contains a list of rss feed items that have been seen
> > previously. I use this list to eliminate feed items I've seen before
> > and thus filter a list of new items. (it's a program to email me feed
> > items from a couple of low frequency feeds).
> > 
> > So, the way I do this is to open the history file with readFile, read
> > it into a list and then use that as a parameter to a filter
> > function. Instead of getting confusing, here is some simple code that
> > gets at the nut of the problem:
> > 
> > import Control.Monad
> > 
> > isNewItem :: [String] -> String -> Bool
> > isNewItem [] = \_ -> True
> > isNewItem ts = \x -> not (any (== x) ts)
> > 
> > filterItems :: [String] -> [String] -> [String]
> > filterItems old is = filter (isNewItem old) is
> > 
> > getOldData :: IO [String]
> > getOldData = catch (liftM lines $ readFile "testfile") (\_ -> return [])
> > 
> > main = do
> >   let testData = ["a", "b", "c", "d"] :: [String]
> >   currItems <- getOldData 
> >   let newItems = filterItems currItems $ testData
> > 
> >   print newItems -- this is important, it mimics another IO action I'm
> >                  -- doing in the real code...
> > 
> >   appendFile "testfile" . unlines $ newItems
> > 
> > 
> > 
> > Please ignore, for the moment, whatever *other* problems (idiomatic or
> > just idiotic) may exist above, and focus on the IO problem.
> > 
> > This code works fine *if* the file "testfile" contains only some (but
> > not all) of the testData items. If it has the complete set, it fails
> > with a "resource busy" exception.
> > 
> > Okay, I can more or less understand what's going on here. Each letter
> > in the testData list gets compared to the contents of the file, but
> > because they are *all* found, the readFile call never has to read all
> > the way to the end of the file. Thus the file handle is kept open,
> > lazily waiting around, never having reached EOF. Fair enough.
> > 
> > But what is the best solution? One obvious one, and the one I'm using
> > now, is to move the appendFile call into a function with guards to
> > prevent trying to append an empty list to the end of the file. This
> > solves the problem not by forcing the read through to EOF, but by not
> > bothering to open the file for appending at all:
> > 
> > writeHistory [] = return ()
> > writeHistory ni = appendFile "testfile" . unlines $ ni
> > 
> > And this makes some sense. It's silly to try to write nothing to a
> > file.
> > 
> > But it also rubs me the wrong way. It's not solving the problem
> > directly -- closing that file handle. So there's my question, how can
> > I close that thing? Is there some way to force it? Do I need to rework
> > the reading to read one line ahead of whatever I'm testing against
> > (thereby forcing the read of EOF and closing the file)? 
> > 
> > thanks 
> > 
> > A
> > 
> > 
> > 
> > 
