[Haskell-cafe] Takusen and strictness

Fri Mar 2 09:04:27 EST 2007

On 02/03/07, Bayley, Alistair <Alistair_Bayley at invescoperpetual.co.uk> wrote:
[...]
> What you're interested in, I think, is what the iteratee does with the
> data.

That's correct.

> In your case, it conses each username onto the front of a list,
> which is initially empty. Because you're using result (not result') this
> consing is lazy, but doQuery still builds a huge tree of unevaluated
> thunks, because it's operating in the IO monad (well, the DBM monad, but
> you get the idea). That big tree of unevaluated thunks may give you
> trouble if the number of rows in the result-set is large. This is why we
> recommend the result' function: it uses $! to force the cons to be
> strict. It may not matter so much for lists and other data structures,
> but for arithmetic you certainly do not want to build up a huge tree of
> thunks, which is what happens if you use result (non-strict) and "+"
> (say).

That's the impression I got - originally, I was using result' but I
thought I'd try result to see if it would improve the laziness. Then I
realised that, short of profiling memory use and/or using a bigger
query, I couldn't actually tell :-)
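
For what it's worth, the thunk build-up can be sketched without
Takusen at all. The following is only a model of the idea, using
foldM in plain IO; stepLazy and stepStrict are made-up names standing
in for a non-strict and a strict iteratee step, not the library's own
definitions of result and result'.

```haskell
import Control.Monad (foldM)

-- Hypothetical non-strict step: returns the sum unevaluated, so each
-- row adds one more (+) thunk on top of the previous accumulator.
stepLazy :: Int -> Int -> IO Int
stepLazy acc x = return (acc + x)

-- Hypothetical strict step: ($!) evaluates the new accumulator to
-- weak head normal form before it is handed back.
stepStrict :: Int -> Int -> IO Int
stepStrict acc x = return $! acc + x

-- Both folds compute the same number; they differ only in how much
-- unevaluated work is held in memory while the fold is running.
sumLazy, sumStrict :: [Int] -> IO Int
sumLazy   = foldM stepLazy 0
sumStrict = foldM stepStrict 0
```

Note that ($!) only evaluates as far as the outermost constructor, so
for a list accumulator it forces the cons cell and nothing inside it,
which fits the remark that strictness matters less for lists than for
arithmetic.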

> If you don't need the entire list at once, then push your processing
> into the iteratee.

Hmm, that's what I was trying to avoid. The article I mentioned made a
strong point that laziness allows you to factor out processing from IO
- so you can write (for example)

main = do
  s <- getContents
  let r = map processIt (lines s)
  putStr (unlines r)

and laziness means that IO is performed "on demand", so that the above
code never has to read the whole input into memory. I was hoping to do
something similar for database access, with runSql taking the place of
getContents. Having to incorporate "processIt" into the database
access code breaks this idiom.
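
To make concrete what I mean by "similar", here is a rough sketch of
how a row-at-a-time fetch could be turned into a lazy list with
unsafeInterleaveIO. This is not how Takusen works, and lazyRows and
mkCursor are names I have invented for illustration; the "cursor" is
just an IORef over an in-memory list standing in for a real database
cursor.

```haskell
import Data.IORef (newIORef, readIORef, writeIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

-- Turn a "fetch the next row" action into a lazily produced list.
-- Each fetch is deferred until the corresponding cons cell is demanded.
lazyRows :: IO (Maybe a) -> IO [a]
lazyRows fetch = unsafeInterleaveIO $ do
  next <- fetch
  case next of
    Nothing  -> return []
    Just row -> do
      rest <- lazyRows fetch   -- deferred again by the nested call
      return (row : rest)

-- Hypothetical stand-in for a database cursor: hands out the elements
-- of a list one at a time, then Nothing.
mkCursor :: [a] -> IO (IO (Maybe a))
mkCursor xs0 = do
  ref <- newIORef xs0
  return $ do
    xs <- readIORef ref
    case xs of
      []       -> return Nothing
      (y : ys) -> writeIORef ref ys >> return (Just y)
```

With this, something like "rows <- lazyRows cursor" followed by
"mapM_ putStrLn (map processIt rows)" would fetch rows only as they
are printed. The catch is that the cursor has to stay open until the
list has been fully consumed, and any fetch error surfaces wherever
the list happens to be forced.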

> You are not obliged to return all of the data from
> doQuery in one big data structure, like a list. You can do IO in the
> iteratee, and even just return ().

That's what my earlier code looked like, and I found it harder to
understand than the getContents/process/put approach. I'm trying to
explore ways of factoring data manipulation code out of database
access functions, but maybe that's not the right way of doing it.
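
For comparison, this is roughly the shape of the iteratee style as I
understand it, written as a toy model over an ordinary list. foldRows,
Step, printRow and takeRows are hypothetical names, not Takusen
functions, and I am assuming the convention that Left stops the
traversal and Right carries on.

```haskell
-- A step sees one row and the current seed, and says whether to go on.
type Step row seed = row -> seed -> IO (Either seed seed)

-- Toy left-fold enumerator: Left ends the traversal early, Right
-- continues with the new seed.
foldRows :: Step row seed -> seed -> [row] -> IO seed
foldRows _    seed []       = return seed
foldRows step seed (r : rs) = do
  next <- step r seed
  case next of
    Left done   -> return done
    Right seed' -> foldRows step seed' rs

-- Processing pushed into the step: print each row and carry no result.
printRow :: Step String ()
printRow name () = putStrLn name >> return (Right ())

-- Collect rows (most recent first) and stop once n have been seen.
takeRows :: Int -> Step row [row]
takeRows n r acc
  | length acc + 1 >= n = return (Left (r : acc))
  | otherwise           = return (Right (r : acc))
```

The point of the sketch is that the processing (putStrLn here) lives
inside the step function, which is exactly the coupling I was hoping
to avoid.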

> I'm sure someone asked Oleg and I about lazy result-set processing in
> Takusen (and why it's not done) a few months ago (a private email, I
> think), but right now I'm unable to find our response.

That sounds like it's the same as what I'm trying to ask, so if you do
find the response, I'd be interested.

Thanks for the explanation.
Paul.