[Haskell-cafe] Network.HTTP+ByteStrings Interface--Or: How to
shepherd handles and go with the flow at the same time?
Thomas Schilling
nominolo at googlemail.com
Wed May 23 16:56:16 EDT 2007
Hello Cafe!
I'd greatly appreciate any ideas/comments on the design of the
interface to the Network.HTTP library with a LazyByteString (LBS)
backend.
As has been discussed previously on this list [1] lazy evaluation can
complicate resource management, which is especially critical if
resources are seriously limited--in this case network sockets and/or
file handles.
In [2] Oleg shows how we can use left folds as a general mechanism to
traverse collections and shows that this is superior to cursors,
generators or streams. I agree with his arguments, however, I still
don't see that this can be a good interface for the http library.
For a start, what should the collection type be? Word8, Char8,
(strict) ByteString (BS)? Unless we get some kind of fusion for the
first two we're pretty much stuck with the latter. The interface
could look like this (untested):
import Data.ByteString (ByteString) -- i.e., strict BSs
import qualified Data.ByteString as BS
handleRequest :: MonadIO m => Request -> a -> (ByteString -> a -> m
(Either a a)) -> m a
Ignoring error handling for now, this might be implemented as:
handleRequest req seed processor = do
a <- iter seed bodySize
closeConnection req
return a
where
iter seed 0 = return seed
iter seed n = do
let c = min chunkSize n
bs <- liftIO $ readBlock req c -- calls BS.hGet handle c
next <- processor bs seed
case next of
Left rslt -> return rslt
Right seed' -> iter seed' (n-c)
-- insert catchError stuff here (requires signature change, of course)
This should work fine and if the 'processor' function does not hold on
to the chunk would run in constant space. Unfortunately, this has a
big disadvantage. Most operations on the returned data will probably
be stream-like functions, such as parsing the data into some kind of
tree. [2] shows a method how to convert enumerators to streams, but
the used stream type
data MyStream m a = MyNil (Maybe a) | MyCons a (m (MyStream m a))
is incompatible with [a] which is used by lazy ByteStrings due to the
embedding of m. I also don't know if fusion can work on monads. My
current suggestion would therefore be a less "save"[*] solution
(again, untested and modulo error handling).
-- | Execute the request and call @f@ with the returned response body.
-- The socket will be closed immediately after @f@ terminates. You
must therefore
-- make sure that any data you might want to returned has to be
forced, e.g. using
-- (length . take) lbs
withRequest :: MonadIO => Request -> a -> (LazyByteString -> a -> m a) -> m a
The implementation would lazily read the contents (implemented as
described in [3]) and forcing it would be left to the function
parameter. E.g.
getHTML :: String -> IO HTMLParseTree
getHTML addr = do
r <- mkRequest addr
tree <- withRequest r emptyTree parseHTML
seq tree $ return tree -- ! (I'm afraid this is necessary)
tricky :: LazyByteString -> String -> IO LazyByteString -- result
will not really be lazy
trickyt str addr = do
r <- mkRequest addr
withRequest r L.empty dropNTake
-- we might have to force the result here
where
dropNTake s _ = L.take 100000 . L.drop 100000
If these 'seq's are really necessary, then this would be a pretty hard
to use interface.
So, any ideas / suggestions ?
/ Thomas
[*] .. "save" in the sense that it does not enforce certain behavior
by means of the type signature or API design.
[1] .. http://thread.gmane.org/gmane.comp.lang.haskell.cafe/20528/focus=20635
[2] .. http://okmij.org/ftp/papers/LL3-collections-enumerators.txt
[3] .. http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html
--
"Remember! Everytime you say 'Web 2.0' God kills a startup!" -
userfriendly.org, Jul 31, 2006
More information about the Haskell-Cafe
mailing list