[Haskell-cafe] Is it safe to use unsafePerformIO here?

Svein Ove Aas svein.ove at aas.no
Tue Sep 15 15:16:22 EDT 2009


I have a number of suggestions, some of which conflict with each
other, so I'll just throw them out here. Let's see..

First off, the IO monad does indeed enforce sequencing; that's its
primary purpose. However, you can ask for it to run I/O out of order,
specifically when the value the out-of-order action returns is
actually forced (used); that's lazy I/O, and is implemented using
unsafeInterleaveIO.

You would not usually use unsafeInterleaveIO directly, though.
Instead, you'd use an existing wrapper, such as hGetContents. (for
Strings, or lazy bytestrings; the strict bytestring variant reasonably
has a strict semantics)

One thing to keep in mind about lazy I/O is that the I/O in question
can run at any arbitrary time, or not at all; not more than once,
though. You must make sure this is safe. For file input, that
basically means the file should not change during the program's
lifetime.

hGetLine is not lazy in this way, but the hGetContents you use is. I'm
not sure whether this means your program should work as-is, and I'm
not going to examine it closely enough to tell - as you mentioned it's
a mockup anyway. Besides..

Strings are also *slow*. What you want for I/O is, when reasonably
possible, bytestrings. You'd then use parsec-bytestring, or if
possible Data.Binary, to parse said bytestring; the latter is faster
(..probably), if more limited in function.

You could use the lazy bytestring hGetContents for this. However...

There is also a bytestring-mmap package on hackage, which outsources
the decision of what blocks to load into memory to the OS, and has the
best performance overall. Use this.


Oh. And unsafePerformIO is a trap that will kill you. See
http://www.girlgeniusonline.com/comic.php?date=20070725 for details.

On Tue, Sep 15, 2009 at 8:36 PM, Cristiano Paris
<cristiano.paris at gmail.com> wrote:
> Hi Cafè,
>
> I've the following problem: I have a (possibly very long) list of
> files on disk. Each file contains some metadata at the beginning and
> continues with a (possibly very large) chunk of data.
>
> Now, the program I'm writing can be run in two modes: either read a
> specific file from the disk and show the whole chunk of data on
> screen, or read all the files' metadata, sort the file list based on
> the metadata, and display a summary of those without reading the chunk
> of data from each file. I've factored out the file access machinery in
> a single module so as to use it indifferently under the two scenarios.
>
> At first, I wrote a piece of code which, in spirit, works like the
> following reduced case:
>
> ------
> module Main where
>
> import System.IO
> import Control.Applicative
> import Data.List
> import Data.Ord
>
> import Debug.Trace
>
> data Bit = Bit { index :: Integer, body :: String }
>
> readBit fn = withFile fn ReadMode $ \h -> Bit <$> (hGetLine h >>=
> return . read) <*> readBody
>             where readBody = withFile fn ReadMode $ \h -> do b <-
> hGetContents h
>                                                              seq b $
> trace ("Read body from: " ++ fn) $ return b
>
> main = do bl <- mapM readBit ["file1.txt","file2.txt"]
>          mapM_ (putStrLn . show . index) $ sortBy (comparing index) bl
> ----
>
> which is very expressive as it's written in applicative style.
>
> Each file is like the following:
>
> ---- file1.txt ----
> 1
> foo
> ----
>
> I've created a separate IO action for reading the body in the hope
> that it wouldn't get executed when the file list is sorted. But, to my
> surprise, it didn't work as the trace message gets written for each
> file before the summary is displayed.
>
> Thinking about this, I came to the conclusion that the IO Monad is
> enforcing proper IO ordering so that the IO action for file1's body
> must be executed right before IO action for file2's index one.
>
> If this is true, the only solution that came into my mind was to wrap
> the IO action for reading the body in an unsafePerformIO call. I
> actually ran the program with this modification and it works properly.
>
> So, as using unsafePerformIO is a Great Evil (TM), I'm wondering if
> there's a different way to do this which doesn't rely on retyping body
> as an IO action returning a String, which would break my pure code
> manipulating the files.
>
> My opinion is that using unsafePerformIO here is like ensuring the
> compiler that there're no observable side effects in running the IO
> action for reading the body and that no other side effects would
> impact this IO action.
>
> Thank you for any thoughts.
>
> Cristiano
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>



-- 
Svein Ove Aas


More information about the Haskell-Cafe mailing list