[Haskell-cafe] Ultra-newbie Question
Daniel Fischer
daniel.is.fischer at web.de
Sat Sep 18 15:51:51 EDT 2010
On Saturday 18 September 2010 19:44:38, Jake McArthur wrote:
> On 09/18/2010 02:51 AM, Christopher Tauss wrote:
> > I am trying to write a function that takes a list and returns the last
> > n elements.
>
> This may just be for the sake of learning, in which case this is fine,
> but usually, needing to do this would be a sign that you are using lists
> improperly (since this is a O(n) time operation).
>
By which he meant O(length list), not the number of elements you're asking
for.
> > Let's call the function n_lastn and, given a list [1,2,3,4,5], I
> > would like
> > n_lastn 3 = [3,4,5]
>
> n_lastn n = reverse . take n . reverse
Which is the most elegant definition, but it's an O(length list) space
operation (as are all others proposed so far). That will be a problem for
long lists (consider n_lastn 10 [1 .. 10^8]). You can implement it as an
O(n) [the number of elements you want] *space* operation, provided that the
list is generated lazily and not referred to by anything else, but the code
is decidedly less nice:
The first idea would be to keep a window of n elements and move it to the
end of the list:
n_lastn n xs =
case splitAt n xs of
(ys,[]) -> ys -- the list contains at most n elements, yay
(ys,zs) -> loop ys zs
where
loop window [] = window
loop window (v:vs) = loop (tail window ++ [v]) vs
The space behaviour is now fine (if compiled with optimisations), but
unfortunately, the time complexity has become O(n*length list) because the
(++)-calls are left-nested:
Suppose n = 4,
loop (1:2:3:4:[]) [5 .. end]
~> loop ((2:3:4:[]) ++ [5]) [6 .. end]
~> loop (2:((3:4:[]) ++ [5])) [6 .. end]
~> loop (((3:4:[]) ++ [5]) ++ [6]) [7 .. end]
The strictness analyser has been smart enough to transform the call to tail
into a strict pattern match on window, as if we'd written
loop (_:twindow) (v:vs) = loop (twindow ++ [v]) vs
so the first few tails go fast, but later, we have to go through more
layers to get at the head of the window to discard it
~> loop ((3:((4:[]) ++ [5])) ++ [6]) [7 .. end]
~> loop (3:(((4:[]) ++ [5]) ++ [6])) [7 .. end]
-- finally!
~> loop ((((4:[]) ++ [5]) ++ [6]) ++ [7]) [8 .. end]
-- bubble the 4 through four layers of parentheses:
~> loop(((4:([] ++ [5])) ++ [6]) ++ [7]) [8 .. end]
~> loop ((4:(([] ++ [5]) ++ [6])) ++ [7]) [8 .. end]
~> loop (4:((([] ++ [5]) ++ [6]) ++ [7])) [8 .. end]
~> loop (((([] ++ [5]) ++ [6]) ++ [7]) ++ [8]) [9 .. end]
-- phew
-- form now on, it's uniform, on each step we have to match an expression
(((([] ++ [a]) ++ [b]) ++ [c]) ++ [d])
against (_:rest)
1. check whether ((([] ++ [a]) ++ [b]) ++ [c]) is empty, for that,
2. check whether (([] ++ [a]) ++ [b]) is empty, for that,
3. check whether ([] ++ [a]) is empty, for that,
4. check whether [] is empty, it is, hence [] ++ [a] = [a],
5. check whether [a] is empty, it's not, it's (a:[]), hence
6. (...) ++ [b] = a:([] ++ [b]), so 2's not empty, and
7. (...) ++ [c] = a:(([] ++ [b]) ++ [c]), so 1's not empty and
8. (...) ++ [d] = a:((([] ++ [b]) ++ [c]) ++ [d])
9. at last, a can be dropped and we get to
loop (((([] ++ [b]) ++ [c]) ++ [d]) ++ [e]) remainingList
Blech!
Appending to the end of a list is bad if it leads to left-nested
parentheses (it's okay if the parentheses are right-nested).
So we might be tempted to keep the window reversed and cons each element to
the front, dropping the last. No good, removing an element from the end is
O(length window) too.
One possibility to fix it is to use a 'list-like' type with O(1) appending
at the end and dropping from the front.
Data.Sequence is such a type,
import qualified Data.Sequence as Seq
import Data.Foldable (toList)
import Data.Sequence (ViewL(..), (|>))
n_lastn' :: Int -> [a] -> [a]
n_lastn' k _
| k <= 0 = []
n_lastn' k xs =
case splitAt k xs of
(ys,[]) -> ys
(ys,zs) -> go (Seq.fromList ys) zs
where
go acc [] = toList acc
go acc (v:vs) = case Seq.viewl acc of
_ :< keep -> go (keep |> v) vs
_ -> error "teh impossible jus hapnd"
fixes space and time behaviour. But the constant factors for Sequence are
larger than those for lists, so we can beat it with lists:
n_lastn :: Int -> [a] -> [a]
n_lastn k _
| k <= 0 = []
n_lastn k xs =
case splitAt k xs of
(ys,[]) -> ys
(ys,zs) -> go k (reverse ys) zs
where
m = k-1
go _ acc [] = reverse $ take k acc
go 0 acc (v:vs) = case splitAt m acc of
(keep,release) -> release `seq` go k (v:keep) vs
go h acc (v:vs) = go (h-1) (v:acc) vs
Every k steps, we perform the O(k) operation of removing the last (k+1)
elements from a (2k)-element list, making the removal from the end an
amortized O(1) operation.
You can trade some space for speed and clip the window in larger intervals,
say every (3*k) or every (10*k) steps.
More information about the Haskell-Cafe
mailing list