substring search api

Tue Sep 18 15:02:49 EDT 2007

Bryan O'Sullivan <bos at serpentine.com> writes:

> Duncan Coutts wrote:
>
>> So perhaps that's my straw-man proposal:
>>   * change BS.findSubstring to be :: BS -> BS -> (BS, BS)
>>      in the style of List.break
>>   * remove the current BS.findSubstrings
>
> While List.break is useful, it has the equally useful
> counterpart (dropWhile . not . (==)) that doesn't accumulate
> the prefix of a match. For a long sequence, this has appeal.
> Let's say you're reading ten gigabytes of data over the
> network, so you have no control over the incoming chunk size
> (as we don't provide a rechunking mechanism at present, so
> this isn't a hypothetical issue).  A findSubstring that
> accumulates the prefix could easily cons a fatally large
> number of chunks.
>
> I'm not saying that the signature you suggest shouldn't be
> present, merely that it's not enough: it wants a counterpart
> that accumulates either nothing or something safe like an
> Int64 that counts the length of the prefix.

I'm not familiar with ByteStrings, so I'm probably out of my
depth, but one thing strikes me: if the search returns an
index into a large object like this, then using that index
means measuring the offset from the beginning of the large
object, which would cause the whole object to be held in
memory until the indexing/dropping expression is evaluated.
Or is there some more sophisticated form of indexing for
byte strings?
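
To make the concern concrete, here is a minimal sketch, assuming
lazy ByteStrings from Data.ByteString.Lazy.Char8. The names
findOffset, suffixAt and dropToMatch are hypothetical (they are not
part of the bytestring API or of the proposal above), and the search
is a naive scan chosen for brevity, not an efficient algorithm.

```haskell
import qualified Data.ByteString.Lazy.Char8 as L
import Data.Int (Int64)

-- Hypothetical helper: offset of the first occurrence of pat in the
-- input, accumulating nothing but a strict Int64 counter.
findOffset :: L.ByteString -> L.ByteString -> Maybe Int64
findOffset pat = go 0
  where
    go n s
      | pat `L.isPrefixOf` s = Just n
      | otherwise = case L.uncons s of
          Nothing      -> Nothing
          Just (_, s') -> let n' = n + 1 in n' `seq` go n' s'

-- Hypothetical helper: turn the offset back into a suffix. Note that
-- s is still referenced by the drop while findOffset walks it.
suffixAt :: L.ByteString -> L.ByteString -> Maybe L.ByteString
suffixAt pat s = fmap (\n -> L.drop n s) (findOffset pat s)

-- Hypothetical helper: the (dropWhile . not)-style counterpart. It
-- returns the suffix starting at the match (or empty if none) and
-- never refers back to the start of the input.
dropToMatch :: L.ByteString -> L.ByteString -> L.ByteString
dropToMatch pat s
  | pat `L.isPrefixOf` s = s
  | L.null s             = L.empty
  | otherwise            = dropToMatch pat (L.tail s)
```

In suffixAt the reference to s inside the drop should keep the
already-scanned chunks reachable until the drop is forced, which is
the retention I am asking about. dropToMatch avoids that, provided
the caller holds no other reference to the head of the string.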

-- 
Jón Fairbairn                                 Jon.Fairbairn at cl.cam.ac.uk