[Haskell-cafe] Parsec and network data
Thomas Schilling
nominolo at googlemail.com
Sat Aug 30 06:51:38 EDT 2008
There's a whole bunch of other problems with lazy network IO. The big
problem is that you cannot detect when your stream ends since that
will happen inside unsafeInterleaveIO which is invisible from inside
pure code. You also have no guarantee that the lazy code actually
consumes enough of the input. Finalisers don't help, either, since there is
in fact no guarantee they are actually run, never mind on time.
The proposed solution by Oleg & co. is to use enumerations/left folds
[1]. The basic idea is to use a callback which gets handed a chunk of
the input from the network. When the last chunk is handed out, the
connection is closed automatically. Using continuations, you can turn
this into a stream again [2] which is needed for many input processing
tasks, like parsing.
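To make the callback idea concrete, here is a minimal sketch of such a left fold over a Handle. It is not Oleg's code: the names `Step` and `enumHandle` are made up for illustration, and I'm assuming strict ByteString chunks read with `hGetSome`. The callback can stop early by returning `Left`, and the handle is closed when the fold finishes, whether it ran to the end of the stream, stopped early, or threw.

```haskell
import Control.Exception (finally)
import qualified Data.ByteString.Char8 as B
import System.IO

-- | Callback handed one chunk at a time (hypothetical name).
-- 'Right' means "keep going with this accumulator",
-- 'Left' means "stop now with this result".
type Step a = a -> B.ByteString -> IO (Either a a)

-- | Left fold over the chunks of a handle (hypothetical name).
-- Reads at most 'chunkSize' bytes per step and closes the handle
-- when the fold ends for any reason.
enumHandle :: Int -> Handle -> Step a -> a -> IO a
enumHandle chunkSize h step seed0 = loop seed0 `finally` hClose h
  where
    loop seed = do
      -- hGetSome blocks until some data is available and returns
      -- an empty ByteString only at end of stream.
      chunk <- B.hGetSome h chunkSize
      if B.null chunk
        then return seed
        else do
          r <- step seed chunk
          case r of
            Left done  -> return done
            Right next -> loop next
```

For example, `enumHandle 4096 h (\n c -> return (Right (n + B.length c))) 0` would count the bytes on the connection and leave `h` closed afterwards. The consumer never sees the handle itself, so it cannot forget to close it or read past the end.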
I remember Johan Tibell (CC'd) working on an extended variant of
Parsec that can deal with this chunked processing. The idea is to
teach Parsec about a partial input and have it return a function to
process the rest (a continuation) if it encounters the end of a chunk
(but not the end of a file). Maybe Johan can tell you more about
this, or point you to his implementation.
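To illustrate what "return a continuation at the end of a chunk" means, here is a tiny hand-rolled sketch. It is neither Parsec nor Johan's implementation; `Result`, `line`, `breakCRLF` and `feed` are hypothetical names, and the "parser" only recognises a single CRLF-terminated line, which is enough for an NNTP status line.

```haskell
-- | Outcome of running a parser on one chunk (hypothetical type).
data Result a
  = Done a String                 -- ^ parsed value and unconsumed input
  | Partial (String -> Result a)  -- ^ chunk ended early; feed the next one

-- | Parse one CRLF-terminated line, possibly spread over several chunks.
line :: String -> Result String
line = go ""
  where
    go acc chunk =
      let buf = acc ++ chunk
      in case breakCRLF buf of
           Just (l, rest) -> Done l rest
           Nothing        -> Partial (go buf)  -- remember what we have so far

-- | Split at the first "\r\n", dropping the terminator.
breakCRLF :: String -> Maybe (String, String)
breakCRLF ('\r' : '\n' : rest) = Just ("", rest)
breakCRLF (c : cs)             = fmap (\(l, r) -> (c : l, r)) (breakCRLF cs)
breakCRLF []                   = Nothing

-- | Supply another chunk to a result.
feed :: Result a -> String -> Result a
feed (Partial k)   s = k s
feed (Done a rest) s = Done a (rest ++ s)
```

Running `line "200 O"` yields a `Partial`, and feeding it `"K\r\n"` yields `Done "200 OK" ""`. The parser never blocks waiting for input it does not need, so the client is free to send its first command as soon as the status line is complete.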
[1]: http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
[2]: http://okmij.org/ftp/Haskell/fold-stream.lhs
/ Thomas
On Tue, Aug 26, 2008 at 10:35 PM, brian <brianchina60221 at gmail.com> wrote:
> Hi, I've been struggling with this problem for days and I'm dying. Please help.
>
> I want to use Parsec to parse NNTP data coming to me from a handle I
> get from connectTo.
>
> One unworkable approach I tried is to get a lazy String from the
> handle with hGetContents. The problem: suppose the first message from
> the NNTP server is "200 OK\r\n". Parsec parses it beautifully. Now I
> need to discard the parsed part so that Parsec will parse whatever the
> server sends next, so I use Parsec's getInput to get the remaining
> data. But there isn't any, so it blocks. Deadlock: the client is
> inappropriately waiting for server data and the server is waiting for
> my first command.
>
> Another approach that doesn't quite work is to create an instance of
> Parsec's Stream with timeout functionality:
>
> instance Stream Handle IO Char where
>   uncons h =
>     do r <- hWaitForInput h ms
>        if r
>          then liftM (\c -> Just (c, h)) (hGetChar h)
>          else return Nothing
>     where ms = 5000
>
> It's probably obvious to you why it doesn't work, but it wasn't to me
> at first. The problem: suppose you tell parsec you're looking for
> (many digit) followed by (string "\r\n"). "123\r\n" won't match;
> "123\n" will. My Stream has no backtracking. Even if you don't need
> 'try', it won't work for even basic stuff.
>
> Here's another way:
> http://www.mail-archive.com/haskell-cafe@haskell.org/msg22385.html
> The OP had the same problem I did, so he made a variant of
> hGetContents with timeout support. The problem: he used something from
> unsafe*. I came to Haskell for rigor and reliability and it would make
> me really sad to have to use a function with 'unsafe' in its name that
> has a lot of wacky caveats about inlining, etc.
>
> In that same thread, Bulat says a timeout-enabled Stream could help.
> But I can't tell what library that is. 'cabal list stream' shows me 3
> libraries none of which seems to be the one in question. Is Streams a
> going concern? Should I be checking that out?
>
> I'm not doing anything with hGetLine because 1) there's no way to
> specify a maximum number of characters to read 2) what is meant by a
> "line" is not specified 3) there is no way to tell if it read a line
> or just got to the end of the data. Even using something like hGetLine
> that worked better would make the parsing more obscure.
>
> Thank you very very much for *any* help.