[Haskell-cafe] Re: iteratee enumHandle (and parsing)

John Lato jwlato at gmail.com
Tue Jul 14 21:13:20 EDT 2009


Hello,

This explanation is entirely correct.  I'm not happy with this
situation, because I think it's confusing for people just starting
with iteratee.  However, I do want something that will be unicode (and
other possible encodings) aware.

Currently, I'm waiting to see how unicode support in the base
libraries works out.  What seems the best possibility at this time is
writing codecs that could be used with mapStream or convStream to
handle this appropriately, but I'd prefer to re-use core unicode
support if possible.  I'm open to suggestions on this issue.  At the
very least, I should probably include the ASCII encoding in the
iteratee package.

I'd like to make the common case of reading ASCII text easier than it
currently is.  Until then, one approach is using mapStream as
suggested below.  Another is using a wrapped ByteString instead of the
List chunk.  The ByteString chunk forces the element to be an 8-bit
ASCII character instead of using the Char storable instance.  If the
full polymorphism of the List chunk is required, the conversion can be
done with either mapStream or convStream.  convStream is probably more
efficient for this at the moment.

Regarding using Parsec with iteratees: unfortunately I don't really do
much in the way of parsing myself so I'm not much help here.  I think
that Oleg intended for parsers to be written directly using the
iteratee primitives, in particular the "heads" function would be
useful (at least he found it so).  I don't see an efficient way to
generate a continuation on a partial parse with Parsec, using
iteratees or not.  I think you could just load data into a parser with
setInput, but if you don't have enough data for a complete parse
you'll just need to hold onto all the data until you've loaded enough
to run it.

If you tell me what functions you find lacking from iteratee, I'll
look into implementing them.  I'd rather not re-implement all of
Parsec, but the text parsers should be simple enough, as well as some
of the combinators.

Cheers,
John Lato

> From: Echo Nolan <hellish at comcast.net>
>
> Hi Paolino.
>
> What's happening is reading [Char] uses the Storable instance for Char
> which is 32-bit. Thus, you get gibberish. The below does what you want,
> by reading Word8s and converting them.
>
> import Control.Exception
> import Data.Char
> import Data.Iteratee.IO
> import Data.Iteratee.Base
> import Data.Word
> import System.IO
>
> main :: IO ()
> main = do
>        h <- openFile "mamma23" ReadWriteMode
>        hPutStr h "ciao"
>        hSeek h AbsoluteSeek 0
>        l <- enumHandle h readString >>= run
>        print $ assert (l == "ciao") ()
>
> -- This is declared on its own so I can give a type signature without making
> -- any of the above lines unmanageably long.
> readString :: IterateeG [] Word8 IO String
> readString = joinI $ mapStream (chr . fromIntegral) stream2list
>
> This only works for ASCII, of course. Someone should write some
> enumerators for the other encodings.
>
> Regards,
> Echo Nolan


More information about the Haskell-Cafe mailing list