ByteString-backed Handles, and another couple of questions

Simon Marlow marlowsd at gmail.com
Tue Dec 15 04:39:59 EST 2009


On 15/12/09 06:09, Bryan O'Sullivan wrote:

> I just added support to Data.Text for your new Unicode-based Handle
> implementation, and I'd like to write some tests. The natural way to do
> this would be to create Handles that will write to, and read from,
> ByteStrings. Does any such code exist at the moment? I don't see it in
> base or bytestring, though all the necessary abstractions appear to be
> present.

I haven't implemented a bytestring-backed Handle, but as you say all the 
abstractions should be present.  It would be a great thing to have on 
Hackage.

A good starting point would be the mmap-backed Handle code that I wrote 
for my talk at the Haskell Implementors Workshop last year.  I'd 
intended to polish this up and upload to Hackage, but never got around 
to it.  I've put the code here for now:

http://www.haskell.org/~simonmar/mmap-handle.tar.gz
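
For illustration only, here is a minimal sketch of what a read-only
ByteString-backed Handle might look like. It is not the mmap-handle code
linked above. The names BSDevice and openByteStringHandle are
hypothetical, and the sketch assumes a GHC whose base exports
GHC.IO.Device, GHC.IO.BufferedIO and GHC.IO.Handle.mkFileHandle in their
current form. The RawIO instance is deliberately left empty (GHC will
warn about missing methods) on the assumption that the buffered read
path only goes through the BufferedIO methods defined here.

```haskell
import qualified Data.ByteString as B
import qualified Data.ByteString.Unsafe as BU
import Data.IORef (IORef, newIORef, readIORef, writeIORef)
import Foreign.Marshal.Utils (copyBytes)
import Foreign.Ptr (castPtr, plusPtr)
import GHC.IO.Buffer (Buffer (..), bufferAdd, newByteBuffer, withBuffer)
import GHC.IO.BufferedIO (BufferedIO (..))
import GHC.IO.Device (IODevice (..), IODeviceType (..), RawIO)
import GHC.IO.Handle (mkFileHandle)
import System.IO (Handle, IOMode (ReadMode), noNewlineTranslation, utf8)

-- Hypothetical device: the bytes that have not been read yet.
newtype BSDevice = BSDevice (IORef B.ByteString)

instance IODevice BSDevice where
  ready _ _ _ = return True   -- data is always available immediately
  close _     = return ()     -- nothing to release
  devType _   = return Stream -- non-seekable in this sketch

-- Assumption: never called, because BufferedIO is implemented directly.
instance RawIO BSDevice

instance BufferedIO BSDevice where
  newBuffer _ = newByteBuffer 4096

  -- Copy as many pending bytes as fit into the free tail of the buffer.
  -- Returning 0 signals end of input to the Handle layer.
  fillReadBuffer (BSDevice ref) buf = do
    bytes <- readIORef ref
    let chunk = B.take (bufSize buf - bufR buf) bytes
        n     = B.length chunk
    withBuffer buf $ \dst ->
      BU.unsafeUseAsCStringLen chunk $ \(src, len) ->
        copyBytes (dst `plusPtr` bufR buf) (castPtr src) len
    writeIORef ref (B.drop n bytes)
    return (n, bufferAdd n buf)

  fillReadBuffer0 dev buf = do
    (n, buf') <- fillReadBuffer dev buf
    return (if n == 0 then Nothing else Just n, buf')

  -- Writing is not supported in this read-only sketch.
  flushWriteBuffer _ _  = ioError (userError "BSDevice: read-only")
  flushWriteBuffer0 _ _ = ioError (userError "BSDevice: read-only")

-- Wrap a ByteString in a Handle that decodes it as UTF-8.
openByteStringHandle :: B.ByteString -> IO Handle
openByteStringHandle bytes = do
  ref <- newIORef bytes
  mkFileHandle (BSDevice ref) "<bytestring>" ReadMode
               (Just utf8) noNewlineTranslation
```

The resulting Handle can be passed to the usual System.IO readers such
as hGetContents or hGetLine. A version for writing would additionally
need flushWriteBuffer to append the buffer contents to the stored
ByteString.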

> Also, the place I hooked into the new I/O machinery was at the next
> level up from CharBuffer. Because the implementation of CharBuffer isn't
> abstract, I had no opportunity to put a text array in there, so there's
> an extra amount of copying that happens when going from byte buffer to
> char buffer to Text. It's a bit of a shame, but I don't see a way around
> it at the moment. Would you be interested in trying to remove that extra
> copy, or is the current interface set in stone?

Yes, you may remember we talked about this in Edinburgh (the conversation 
would probably make more sense to you now than it did then :-).

One thing I experimented with is making CharBuffers use UTF-16.  You'll 
see some instances of #ifdef CHARBUF_UTF16 in the code.  It partially 
works; I believe the main missing piece is support in the built-in 
codecs.  I don't think it would be too hard to fix them: they just need 
to be more abstract about offsets in the CharBuffer; 
writeCharBuffer/readCharBuffer already handle the UTF-16 encoding/decoding.
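
To illustrate why offsets have to become abstract under UTF-16, here is
a small self-contained sketch of the encoding itself. It is not the
code in base; encodeUtf16, decodeUtf16 and unitOffset are names made up
for this example, and a plain list of Word16 stands in for the buffer.

```haskell
import Data.Bits (shiftR, (.&.))
import Data.Char (chr, ord)
import Data.Word (Word16)

-- Encode one Char as one or two UTF-16 code units.
encodeUtf16 :: Char -> [Word16]
encodeUtf16 c
  | n < 0x10000 = [fromIntegral n]
  | otherwise   = [ fromIntegral (0xD800 + (m `shiftR` 10))  -- high surrogate
                  , fromIntegral (0xDC00 + (m .&. 0x3FF)) ]  -- low surrogate
  where
    n = ord c
    m = n - 0x10000

-- Decode the Char at the front of a list of code units, returning it
-- together with the number of units consumed (1 or 2).
decodeUtf16 :: [Word16] -> Maybe (Char, Int)
decodeUtf16 (hi : lo : _)
  | hi >= 0xD800 && hi < 0xDC00 && lo >= 0xDC00 && lo < 0xE000 =
      Just (chr (0x10000 + (fromIntegral hi - 0xD800) * 1024
                         + (fromIntegral lo - 0xDC00)), 2)
decodeUtf16 (u : _)
  | u < 0xD800 || u >= 0xE000 = Just (chr (fromIntegral u), 1)
decodeUtf16 _ = Nothing  -- empty input or an unpaired surrogate

-- Offset, in code units, of the i-th character of a string.
-- With one buffer slot per Char this would simply be i.
unitOffset :: String -> Int -> Int
unitOffset s i = sum (map (length . encodeUtf16) (take i s))
```

Because a character outside the Basic Multilingual Plane takes two
units, a codec that computes positions as "character index = buffer
index" goes wrong as soon as such a character appears.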

So one possibility is to get this working and then avoid the extra copy 
by just taking out the ByteArray# inside a CharBuffer and turning it 
into a text buffer. I'm not sure of the details here, but I imagine 
something along those lines would work.  We would then have to allocate 
a new CharBuffer for the Handle.

Another possibility is (as you suggested) to make Handles independent of 
the representation of the CharBuffer, making it completely abstract.  I 
haven't put much thought into that, but it might well be a better 
approach.  It would presumably involve a new existential class constraint 
in the Handle for the CharBuffer operations, and we'd have to be careful 
about performance: currently I think the CharBuffer operations get 
inlined nicely.
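
As a rough picture of the existential approach, the following toy
sketch hides the buffer representation behind a class. Everything in it
(CharBuf, SomeCharBuf, ListBuf, ToyHandle and the toy* functions) is
hypothetical and far simpler than the real Handle type; it only shows
the shape of the idea.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

import Data.IORef (IORef, modifyIORef', newIORef, readIORef, writeIORef)

-- Operations a Handle would need from its character buffer.
class CharBuf b where
  pushChar  :: b -> Char -> IO ()  -- append one character
  takeChars :: b -> IO String      -- drain the buffer, oldest first

-- The existential wrapper: any representation with a CharBuf instance.
data SomeCharBuf = forall b. CharBuf b => SomeCharBuf b

-- One possible representation: a reversed list in an IORef.
newtype ListBuf = ListBuf (IORef String)

instance CharBuf ListBuf where
  pushChar (ListBuf r) c = modifyIORef' r (c :)
  takeChars (ListBuf r) = do
    xs <- readIORef r
    writeIORef r []
    return (reverse xs)

-- A toy stand-in for Handle that does not know the representation.
newtype ToyHandle = ToyHandle SomeCharBuf

newToyHandle :: IO ToyHandle
newToyHandle = ToyHandle . SomeCharBuf . ListBuf <$> newIORef []

toyPutStr :: ToyHandle -> String -> IO ()
toyPutStr (ToyHandle (SomeCharBuf b)) = mapM_ (pushChar b)

toyFlush :: ToyHandle -> IO String
toyFlush (ToyHandle (SomeCharBuf b)) = takeChars b
```

Every pushChar here goes through the class dictionary stored in
SomeCharBuf, so the compiler cannot inline it the way it can with a
single concrete CharBuffer type. That is the performance concern
mentioned above.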

Cheers,
	Simon


More information about the Glasgow-haskell-users mailing list