mmap and IArray slices

David Roundy droundy@abridgegame.org
Sat, 26 Apr 2003 12:21:52 -0400


I had a couple ideas, and I was wondering if they were good ones.

One is the idea of a slice function on an IArray:

a' = slice a (i,j) giving an array with the elements i..j of array a.

The nifty idea would be that this could be done without any memory copy.
The catch, of course, would be that you might end up having to retain the
entire array a just because you need a little slice of it.  However, I
think this risk (which should be documented) would be overwhelmed by the
increased efficiency (both in space and time) gains that would be
possible.

The most obvious (to me at least, since I've been working a lot with
PackedStrings recently) application would be in substrPS.  Using slice,
this would be an O(1) operation, rather than an O(n) operation, and would
use essentially no memory (at the cost of retaining the mother string).

I'm thinking that it may even be possible to make the GC smart enough to
get rid of mother strings which are needed by very few children (at least
if it's desperate), if array slices were supported at a low enough
level... I imagine they'd have to be supported at a very low level.

The second (and related) idea is that it would be nice to be able to use
mmap on a readonly file to read it into an immutable array.  Of course,
this works really nicely with array slices, since with the slices the
mmapped array wouldn't need to be copied, and could remain backed by the
file itself.  I imagine being able to read a 100M file into a PackedString,
then run linesPS (or possibly a special slice version of linesPS) on it,
and have a [PackedString], all of which is backed by the file, so when
memory gets tight it can be just thrown away (by the VM system) and then
get read in again when I use it.

The mmap idea seems even tougher than the slice idea, as it would require
(preferably) keeping track of all mmapped regions so when the file is
closed they could all be copied over into normal memory and unmapped.
Technically they could remain mmapped after the file is closed, but I think
that would cause problems if the program then wants to reopen the file for
writing.

Anyhow, those are just a couple of ideas I had, and I have nowhere near the
knowledge if haskell internals to tell if they are workable or not.
-- 
David Roundy
http://www.abridgegame.org