Speeding up Data.List.inits

Sat Jul 19 17:33:20 UTC 2014

I did a little more benchmarking, and it turns out that the overhead
to use Okasaki-style persistently efficient banker's queues is not
terribly severe because these queues are simple and therefore fast
(it's around 1.5 times as slow at its very worst, but that may have
been a fluke--the performance generally seems comparable). It does
indeed give far, far better performance when looking at the heads of
many of the lists at the end of the result, as Edward Kmett predicted.
I don't know if these queues are available in a standard library, but
the following is a full implementation of initsQ, including what it
needs of the queue implementation:

data Queue a = Queue
  {front :: [a],
   lenF :: Int,
   rear :: [a],
   lenR ::Int} deriving (Show)

empty = Queue {front=[],lenF=0,rear=[],lenR=0}

queue f lF r lR
  | lR <= lF = Queue f lF r lR
  | otherwise    = Queue {front = f ++ reverse r,
                          lenF = lF + lR,
                          rear = [],
                          lenR = 0}

(Queue f lF r lR) |> x = queue f lF (x:r) (lR+1)

toListQ (Queue f _ r _) = f ++ reverse r

initsQ xs = map toListQ (scanl (|>) empty xs)

On Sat, Jul 19, 2014 at 5:25 AM, Edward Kmett <ekmett at gmail.com> wrote:
>
> The half-baked thought was that reaching the kth list is O(k), and reversing _each list_ is O(k), but with a smarter structure you might share some of the work of construction across multiple lists. e.g. if you had an inits like function that returned a list of queues rather than a list of lists you could actually do less work overall setting up the inits asymptotically, sharing the work. If you only take a constant prefix off each queue then this can change the overhead from O(n^2) to = O(kn).
>
> A reversal or difference list requires you to pay separately for each list.
>
> I we are in "violent agreement", however, as I indicated in my comment that the constant factors almost definitely make it a non-starter for something like Data.List. ;)
>
> -Edward
>
>
> On Sat, Jul 19, 2014 at 4:30 AM, David Feuer <david.feuer at gmail.com> wrote:
>>
>> Edward Kmett, reaching the kth list is O(k). Reversing it is also O(k). I don't know that anything as fancy as something by Okasaki could improve matters—such data structures tend to have large constant factors. Something much worse than that seems to be wrong with the Data.List version at the moment.
>>
>>
>> On Sat, Jul 19, 2014 at 4:25 AM, Edward Kmett <ekmett at gmail.com> wrote:
>>>
>>> Good catch, the common inits and tails version of subsequences makes pretty heavy use of that, as do many of the uses of inits where you use it to get k-element substrings.
>>>
>>> I'd be curious if an okasaki-style catenable output restricted deque could recover both efficient head access and reasonably efficient appends, but the overhead of that is probably way out of line with the performance at stake!
>>>
>>> -Edward
>>>
>>>
>>> On Sat, Jul 19, 2014 at 4:14 AM, Henning Thielemann <schlepptop at henning-thielemann.de> wrote:
>>>>
>>>> Am 19.07.2014 09:51, schrieb David Feuer:
>>>>
>>>>
>>>>> Background: I was looking at the code for Data.List.inits in
>>>>> base-4.7.0.0 (function renamed for clarity):
>>>>>
>>>>> initsDL                   :: [a] -> [[a]]
>>>>> initsDL xs                =  [] : case xs of
>>>>>                                    []      -> []
>>>>>                                    x : xs' -> map (x :) (initsDL xs')
>>>>>
>>>>> The recursive maps made me suspicious. I decided to try writing a
>>>>> different version:
>>>>>
>>>>> initsHO xs = map ($ []) (scanl q id xs)
>>>>>    where
>>>>>      q f x = f . (x:)
>>>>>
>>>>> I mentioned this on #haskell and noted that it would be a nice exercise
>>>>> to write it with less fancy function footwork using map reverse. rasfar
>>>>> responded (modulo naming) with
>>>>>
>>>>> initsR xs = map reverse (scanl q [] xs)
>>>>>    where
>>>>>      q acc x = x : acc
>>>>
>>>>
>>>>
>>>> The 'map' gives efficient access to the first elements of the sub-lists, thus it seems that initsDL is faster than initsR when you access only prefixes of the sub-lists. E.g.
>>>>
>>>> Prelude> head $ last $ inits [0..10000000::Int]
>>>>
>>>> _______________________________________________
>>>> Libraries mailing list
>>>> Libraries at haskell.org
>>>> http://www.haskell.org/mailman/listinfo/libraries
>>>
>>>
>>
>