[Haskell-cafe] The difficulty of designing a sequence class

Mon Jul 31 10:50:36 EDT 2006

Robert Dockins wrote:
> On Jul 30, 2006, at 5:28 PM, Brian Hulley wrote:
>> Robert Dockins wrote:
> So, what you want is a sequence of sequences that can be
> transparently converted to a "flattened" sequence and vice versa?

Yes as long as the conversion between them takes no time at all - the 
sequence of sequences and flattened sequence must coexist simultaneously. 
The concrete data structure is a sequence of sequences and the flattened 
sequence is a particular view of it.

> And you also want to keep track of the total number of lines and
> characters within each line.  Additionally, you want to keep track of
> the maximum number of characters in any one line.
>
> Edison has support for transparently keeping track of the size of a
> sequence.
>
> http://www.eecs.tufts.edu/~rdocki01/docs/edison/Data-Edison-Seq-
> SizedSeq.html

I used this in an initial version of an edit buffer when I just used a 
SizedSeq wrapping a BinaryRandList to store the text as a sequence of chars. 
But the lack of ability to also index by line number and keep track of max 
line length was the problem that led me to use a finger tree instead.

Of course I could have used a sequence of chars, a sequence of line lengths, 
and a bag of line lengths to keep track of everything, and kept them in 
sync, but after reading the FingerTree paper I was seduced by the idea of 
being able to represent all this stuff at once in a single data structure.

>
> It may well be possible to create a slightly generalized wrapper that
> keeps track of arbitrary "measures".  (If they can be computed by a
> function which is associative, commutative and has a unit).
> Humm, sort of an incremental fold.... I like it.

I got this from the FingerTree paper. A finger tree supports any measurement 
that is a Monoid (so it needs to be associative but not commutative (if it 
had to be commutative it would be impossible to use a sequence as a set or 
map, which I needed for my Trie structure)).

> Well, I guess I'd suggest you attempt to identify specific problems
> with already existing packages and attempt to work with those who
> maintain such packages before reinventing something as basic (and
> difficult to get right!) as data structure abstractions.

The problem is that some people will be using Data.Edison.Seq at the moment 
and will naturally not want it to change. However I'd suggest that all the 
common operations be factored out into separate classes eg:

    class Foldable f  where
         fold :: (a -> b -> b) -> b -> f a -> b
         foldL :: ...

    class Reduce f where -- based on FingerTree paper
        reduceR :: (a -> b -> b) -> (f a -> b -> b)
        reduceL :: (b -> a -> b) -> (b -> f a -> b)

    class TakeDrop f where
        take :: Int -> f a -> f a
        takeWhile :: (a -> Bool) -> f a -> f a
        drop ...

    class Filter f where
        filter :: (a -> Bool) -> f a -> f a
        partition :: (a -> Bool) -> f a -> (f a, f a)

    class Indexable f where
       length :: f a -> Int
       at :: Int -> f a -> f a -- (*)
       splitAt :: Int -> f a -> (f a, f a)

(*) Data.ByteString.index puts the Int arg second. It's not at all clear to 
me which is best, because I often wish that the Int arg of take and drop was 
second also so I could write take g $! x+1 instead of (take $! x + 1) g 
though it's consistent with the arg order for takeWhile etc.

I know you don't agree with the no-exception-camel-case idea, but I still 
would argue that this is essential if you want to have a consistent naming 
convention. I find it extremely confusing that a word like "reducer" is 
supposed to be read as "reduceR" because the word "reducer" means to me 
"something which reduces". It seems to me that a restructuring of the usual 
fold, reduce ops into classes is a great opportunity to perfect the naming 
of these functions to make life easier for generations to come... :-)

>
> Such maintainers may be willing to accept patches and/or implement
> requested features in order to reduce fragmentation in this space
> *hint, hint*  :-)
>

Point taken, although in the case of the above refactoring idea, I think it 
really is a Haskell-wide task because although there appears to be a defacto 
standard use of names like take, drop, splitAt etc, it's not nearly so clear 
which ops belong together and which should be separated out, and I 
personally don't have enough experience of Haskell yet to be able to 
recommend a solution.

>
> <soapbox type="Edison plug">
> I personally think that Edison is a great piece of work, and I took
> up maintainership because I felt it was a great shame that no one was
> using it.  My ultimate goal is to make Edison the package that
> everyone thinks of first when they discover they need a Haskell
> datastructure for some purpose.  Even if Edison does not fill that
> need, I want every Haskeller to compare his needs against what Edison
> provides before striking out on his own, and I want that to be a
> decision made with some hesitation.  Over time I hope to make the
> cases where Edison doesn't cut the mustard fewer and further between.
>
> So, if you've ever looked at Edison, or ever do so in the future, and
> decide it isn't what you need, please let me know why so I can make
> it better for the next time.  After all, squeaky wheels get the
> grease, but only if I can hear the squeaking!
> </soapbox>

Best regards,
Brian.

-- 
Logic empowers us and Love gives us purpose.
Yet still phantoms restless for eras long past,
congealed in the present in unthought forms,
strive mightily unseen to destroy us.

http://www.metamilk.com