[Haskell-cafe] Takusen and strictness

Fri Mar 2 08:50:56 EST 2007

> From: haskell-cafe-bounces at haskell.org 
> [mailto:haskell-cafe-bounces at haskell.org] On Behalf Of Paul Moore
> 
> But, will this read the database lazily, or will it get all the rows
> into memory at once? How will using result' instead of result (in
> runSql) affect this? And as I said above, how can I learn to work this
> out for myself?

> runSql :: String -> IO [String]
> runSql s = withSession (connect "USER" "PASSWD" "DB") ( do
>        let iter (s::String) accum = result (s : accum)
>        r <- doQuery (sql s) iter []
>        return r
>     )

The iteratee function:

>        let iter (s::String) accum = result (s : accum)

is fed one row at a time. Takusen can fetch rows from the DBMS one at a
time (rather slow across a network) or fetch them in chunks (i.e.
cached) and feed them to the iteratee one at a time, which is much
better for network performance. Currently, the default is to fetch rows
in chunks of 100. So getting rows from the database is more-or-less
"on-demand", but this is transparent from your point of view, because
your iteratee only gets them one at a time.

What you're interested in, I think, is what the iteratee does with the
data. In your case, it conses each username onto the front of a list,
which is initially empty. Because you're using result (not result') this
consing is lazy, but that doesn't make the fetch lazy: doQuery still runs
through the whole result-set, because it's operating in the IO monad
(well, the DBM monad, but you get the idea), and along the way it builds
a huge tree of unevaluated thunks. That big tree of unevaluated thunks
may give you trouble if the number of rows in the result-set is large.
This is why we recommend the result' function: it uses $! to force the
accumulator at each step, so the cons is strict. It may not matter so
much for lists and other data structures, but for arithmetic you
certainly do not want to build up a huge tree of thunks, which is what
happens if you use result (non-strict) and "+" (say).
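
To make the arithmetic case concrete, here is a minimal sketch that
doesn't use Takusen at all. It folds over a plain list with foldM in IO,
and the names stepLazy, stepStrict, sumLazy and sumStrict are made up
for illustration; they only mimic the difference between returning the
accumulator lazily and returning it with $!.

```haskell
import Control.Monad (foldM)

-- Lazy step: hands back the unevaluated expression (acc + row).
-- Over many rows this grows into a chain ((0 + r1) + r2) + ...
-- that is only collapsed when the final result is demanded.
stepLazy :: Int -> Int -> IO Int
stepLazy row acc = return (acc + row)

-- Strict step: $! evaluates (acc + row) before it is returned,
-- so the accumulator is always a plain number.
stepStrict :: Int -> Int -> IO Int
stepStrict row acc = return $! acc + row

-- The row comes first and the accumulator last, as in the iteratee
-- above, so the arguments are flipped for foldM.
sumLazy :: [Int] -> IO Int
sumLazy = foldM (flip stepLazy) 0

sumStrict :: [Int] -> IO Int
sumStrict = foldM (flip stepStrict) 0
```

Both give the same answer; the difference is only in how much memory is
held while the fold is running. GHC's optimiser can sometimes rescue the
lazy version, but I wouldn't rely on it.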

If you don't need the entire list at once, then push your processing
into the iteratee. You are not obliged to return all of the data from
doQuery in one big data structure, like a list. You can do IO in the
iteratee, and even just return (). If you want to terminate the fetch
early, you can return (Left <something>) in the iteratee, rather than
(Right <something>). result and result' always return (Right
<something>), so they process the result-set all the way through.
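
The Left/Right protocol can be sketched with a toy driver over a plain
list. This is a model only, not Takusen's source: runFold stands in for
doQuery, and IterResult, takeRows and printRow are names I've made up
for the example.

```haskell
-- Left means "stop now with this value", Right means "carry on".
type IterResult seed = Either seed seed

-- Toy stand-in for doQuery: feeds rows to the iteratee one at a time
-- and stops as soon as the iteratee answers Left.
runFold :: Monad m
        => (row -> seed -> m (IterResult seed))
        -> seed -> [row] -> m seed
runFold _    seed []       = return seed
runFold iter seed (r : rs) = do
  next <- iter r seed
  case next of
    Left  final -> return final
    Right seed' -> runFold iter seed' rs

-- Iteratee that terminates the fetch early once n rows are collected.
takeRows :: Monad m => Int -> row -> [row] -> m (IterResult [row])
takeRows n row acc
  | length acc' >= n = return (Left acc')
  | otherwise        = return (Right acc')
  where
    acc' = row : acc

-- Iteratee that does its work in IO and accumulates nothing.
printRow :: String -> () -> IO (IterResult ())
printRow row () = do
  putStrLn row
  return (Right ())
```

With this model, runFold (takeRows 2) [] ["a", "b", "c", "d"] stops
after the second row and returns ["b", "a"] (reversed, because each row
is consed onto the front), while runFold printRow () rows prints every
row and returns ().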

I'm sure someone asked Oleg and me about lazy result-set processing in
Takusen (and why it's not done) a few months ago (a private email, I
think), but right now I'm unable to find our response.

Alistair