Persistent (as in on disk) data

Hal Daume III hdaume@ISI.EDU
Fri, 7 Mar 2003 17:22:33 -0800 (PST)


I'd not been following this discussion, but now it seems it's gotten to
instances of the Binary module.  I figured I'd chime in briefly:

> thanks for your replies.  i browsed through the discussion on the 
> libraries list, but it mainly seems to discuss if one should use bits or 
> bytes in the binary representation.  not that this is not important (my 

The bits/bytes argument was largely because the NHC library supported Bits
and the GHC library supported Bytes.  In order to have a common library,
we wanted to support both.

> personal preference is to be fast rather than small, within reason), but 
> i was more interested in what these functions should do.  unfortunately 
> i couldn't quite figure that out from the discussion there.

Basically, write arbitrary data to a file in a binary fashion, or to a
memory location (as in BinMem).

> in particular, i was thinking that this dumping facility should preserve 
> sharing and support cyclic data.  as such, i don't think one can write 

I'm not convinced that the binary library should "natively" support cyclic
data.  I think that if saying:

  print x

would not terminate, then there's no reason that

  puts bh x

should terminate.  I like to think of "puts" as a binary version of
print.  (That is, of course, unless the instance writer for the
Binary/Show instances of the type of x is smart enough to not keep writing
the same thing over and over again.)  I would challenge the interested
party to write a Show instance of String which wouldn't loop indefinitely
on "repeat 'x'".

If the user has some cyclic data structure and they want to be able to
write it in binary form, it should be on their shoulders to do it
correctly, not on the library's.
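
To illustrate the point above, here is a minimal sketch of a user taking on
that responsibility.  Everything in it is hypothetical and not part of any
Binary library: the Cyclic type, the function names, and the one-byte length
prefix (which assumes a non-empty period of fewer than 256 elements).  The
idea is to store only the finite period of the cycle and rebuild the infinite
view after reading it back.

```haskell
import Data.Word (Word8)

-- Hypothetical representation: a cyclic list kept as its finite period.
-- Assumption: the period is non-empty and shorter than 256 elements.
newtype Cyclic = Cyclic [Word8]
  deriving (Show, Eq)

-- The infinite view; writing this out element by element never finishes,
-- just as 'print' on it never finishes.
unroll :: Cyclic -> [Word8]
unroll (Cyclic period) = cycle period

-- Write only the period, prefixed by its length in a single byte.
encodeCyclic :: Cyclic -> [Word8]
encodeCyclic (Cyclic period) = fromIntegral (length period) : period

-- Read the period back; fails if the length prefix does not match.
decodeCyclic :: [Word8] -> Maybe Cyclic
decodeCyclic (n : rest)
  | length rest == fromIntegral n = Just (Cyclic rest)
decodeCyclic _ = Nothing
```

Here the knowledge that the data is cyclic lives in the user's type, so
nothing in the encoder has to detect cycles.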

So essentially, I believe 'deriving Binary' should work identically to
'deriving Show', except using a binary rep instead of a string rep.
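
As a rough sketch of that analogy, the instance below is written by hand in
the shape a derived one might take: a tag byte for the constructor, followed
by the fields in order.  The class here is a simplified, hypothetical
stand-in (pure functions over a list of bytes rather than a handle), and the
Colour type and tag values are assumptions made for illustration only.

```haskell
import Data.Word (Word8)

-- Hypothetical, simplified stand-in for a Binary class: encode to a list of
-- bytes, decode from one and return the unconsumed remainder.
class Binary a where
  put :: a -> [Word8]
  get :: [Word8] -> Maybe (a, [Word8])

instance Binary Word8 where
  put w = [w]
  get (w : rest) = Just (w, rest)
  get []         = Nothing

-- Example type; Show is derived, Binary is written out by hand.
data Colour = Red | Rgb Word8 Word8 Word8
  deriving (Show, Eq)

-- Mirrors the structure of a derived Show instance: one case per
-- constructor, fields handled left to right.
instance Binary Colour where
  put Red         = [0]
  put (Rgb r g b) = 1 : put r ++ put g ++ put b

  get (0 : rest) = Just (Red, rest)
  get (1 : rest) = do
    (r, rest1) <- get rest
    (g, rest2) <- get rest1
    (b, rest3) <- get rest2
    Just (Rgb r g b, rest3)
  get _ = Nothing
```

Like show, put simply walks the structure, so an infinite value gives an
infinite output in both cases.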

> it in Haskell, as presumably sharing is not observable from within the 
> language.  this is why the "deriving" bit seems essential - the compiler 
> can perform some magic.

I assume you mean something like:

  let x = ...some really large structure...
      y = [x,x]
  in  puts bh y

where the size of what is written is |x|+c rather than 2|x|, for some
small c?  If so, then I don't believe this can be implemented in the
language; it would have to be in the compiler.  I see this as unlikely to
happen, because all compilers would have to implement it identically and
some might not handle sharing in the same manner.  It might be nice, but
again, I see this as something you could do yourself if you really want it
(i.e., replace this expression with:

  let x = ...
  in  puts bh 2 >> puts bh x

or something like that, when you can -- and obviously you won't always be
able to.)
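
A minimal sketch of that do-it-yourself approach follows, assuming a pure
byte-list encoding rather than a handle.  The function names and the layout
(one byte for the number of copies, one byte for the payload length) are
hypothetical and only work for counts and lengths below 256.

```haskell
import Data.Word (Word8)

-- Manual sharing: write the number of copies, then the payload once.
-- Output size is |x| + 2 bytes regardless of the number of copies.
encodeShared :: Int -> [Word8] -> [Word8]
encodeShared copies payload =
  fromIntegral copies : fromIntegral (length payload) : payload

-- What a structural traversal of [x,x,...] would write: every copy in full.
encodeNaive :: Int -> [Word8] -> [Word8]
encodeNaive copies payload =
  concat (replicate copies (fromIntegral (length payload) : payload))

-- Rebuild the list of copies; in memory they all share the one payload.
decodeShared :: [Word8] -> Maybe [[Word8]]
decodeShared (copies : len : payload)
  | length payload == fromIntegral len =
      Just (replicate (fromIntegral copies) payload)
decodeShared _ = Nothing
```

This only helps when the writer already knows the elements are the same
value, which is exactly the information the language itself does not expose.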

 - Hal