Large lists, heaps, stacks...

Till Doerges till@doerges.net
Thu, 18 Oct 2001 19:41:45 +0200


--opJtzjQTFsWo+cga
Content-Type: text/plain; charset=us-ascii
Content-Description: Letter
Content-Disposition: inline

Hi everybody,

I'm having to deal w/ rather long(*) lists. Unfortunately I stumbled
across some problems in the process.

I tried to implement a function that separates a list into two parts
according to a list of indices given. (Does anything like that perhaps
exist already in the Prelude?)

--- snip ---
-- Put elems at specified positions in 2nd list.
-- Non-existant positions will be ignored.
-- The order of the input list won't be altered.
select :: [a] -> [Integer] -> ([a],[a])
select xs poss = sAcc xs (sort poss) 0 ([],[])
    where sAcc :: [a] -> [Integer] -> Integer -> ([a],[a]) -> ([a],[a])
	  sAcc []     _          _      (ys,zs) = (ys,zs)
	  sAcc xs     []         _      (ys,zs) = (ys++xs, zs)
	  sAcc (x:xs) (pos:poss) curpos (ys,zs) =
	      if (pos == curpos)
	      then sAcc xs poss       (curpos+1) (ys, zs++[x])
	      else sAcc xs (pos:poss) (curpos+1) (ys++[x], zs)
--- snap ---

Calling a function like the following one, will result in a segfault
both under Linux (2.4.7) and Solaris (2.8) w/ hugs-feb-2000.

--- snip ---
Crash> select [0..500000] [0,1,2,20,418,451596,451616]
(
(11290502 reductions, 22129474 cells)
Segmentation Fault
--- snap ---

I suspect it's the (++) that's causing problems here and that it's
filling up my C-stack (like stated at
http://www.cse.ogi.edu/PacSoft/projects/Hugs/pages/bugsandfeatures.htm).

So I tried rewriting the function in order to get rid of the (++).

--- snip ---
-- Hopefully better implementation of select.
select' :: [a] -> [Integer] -> ([a],[a])
select' xs poss = sAcc xs (sort poss) 0 ([],[])
    where sAcc :: [a] -> [Integer] -> Integer -> (([a],[a]) -> ([a],[a]))
	  sAcc [] _  _ = trace "hi1" $ id
	  sAcc xs [] _ = trace "hi2" $ (\ (ys,zs) -> (ys++xs,zs))
	  sAcc (x:xs) (pos:poss) curpos =
	      if (pos == curpos)
		 then (\ (ys,zs) -> (  ys,x:zs)) . (sAcc xs poss       (curpos+1))
 		 else (\ (ys,zs) -> (x:ys,  zs)) . (sAcc xs (pos:poss) (curpos+1))
--- snap ---

Now, something like the above

--- snip ---
Crash> select' [0..500000] [0,1,2,20,418,451596,451616]
 
(138397 reductions, 244907 cells)
ERROR: Control stack overflow
--- snap ---

will not segfault, but it'll give me a control stack overflow. (Better
than a segfault, but still rather undesirable).

A second point I discovered is this one:

--- snip ---
Crash> select' "test" [0..]
 
(35922 reductions, 63905 cells)
ERROR: Control stack overflow
--- snap ---

Instead of just giving me '("","test")'. I considered this case being taken
care of by the first pattern in sAcc. Obviously it isn't.

So my questions are:

  1) Could anybody please explain the behaviour of select and select'
     to me?
     To add the icing on the cake: How can I improve select?

  2) Why does "select' "test" [0..]" not work?


I hope my questions aren't too silly.

Any hints would be greatly appreciated.

Thanks -- Till

(*) At the moment "rather long" means approx. 500,000 elems. But
    perhaps I'll need even more elements in a list.
-- 
     e-mail: reverse(net dot doerges at till)   | ENCRYPTED |
    pgp/gpg: keys via keyserver or my homepage  |  MAIL IS  |
        www: http://www.doerges.net             |  WELCOME! |
--opJtzjQTFsWo+cga
Content-Type: text/plain; charset=iso-8859-1
Content-Description: Crash.hs
Content-Disposition: attachment; filename="Crash.hs"
Content-Transfer-Encoding: 8bit

-- --------------------------------------------------------------------
--
-- Crash.hs: Bugs in hugs and ghc
--
--  Author:  Till Dörges  <till@doerges.net>
--
-- --------------------------------------------------------------------
-- For hugs see:
--  http://www.cse.ogi.edu/PacSoft/projects/Hugs/pages/bugsandfeatures.htm
-- --------------------------------------------------------------------

module Crash where

import List
import Trace

-- Put elems at specified positions in 2nd list.
-- Non-existant positions will be ignored.
-- The order of the input list won't be altered.
select :: [a] -> [Integer] -> ([a],[a])
select xs poss = sAcc xs (sort poss) 0 ([],[])
    where sAcc :: [a] -> [Integer] -> Integer -> ([a],[a]) -> ([a],[a])
	  sAcc []     _          _      (ys,zs) = (ys,zs)
	  sAcc xs     []         _      (ys,zs) = (ys++xs, zs)
	  sAcc (x:xs) (pos:poss) curpos (ys,zs) =
	      if (pos == curpos)
	      then sAcc xs poss       (curpos+1) (ys, zs++[x])
	      else sAcc xs (pos:poss) (curpos+1) (ys++[x], zs)



-- Crashes hugs. Possibly due to exhausted C-Stack.

-- Tested against: hugs Version: February 2000
-- Case 1: Linux atlan 2.4.7-int-ac3 #1 Wed Aug 1 01:54:15 CEST 2001 i686 unknown
--         HUGSFLAGS="+s +h2M -P<pretty-3.0>:<parsec-0.2>:<x11>:"
-- Case 2: SunOS userv1 5.8 Generic_108528-09 sun4u sparc SUNW,Ultra-80
--         HUGSFLAGS="-98 -o +s +h10M -P<pretty-3.0>:<parsec-0.2>: 

-- Crash> :t select
-- select :: [a] -> [Integer] -> ([a],[a])
-- Crash> :t crash1
-- crash1 :: ([Integer],[Integer])
-- Crash> crash1
-- Segmentation fault (core dumped)

crash1=select [0..500000] [0,1,2,20,418,451596,451616]


-- Hopefully better implementation of select.
select' :: [a] -> [Integer] -> ([a],[a])
select' xs poss = sAcc xs (sort poss) 0 ([],[])
    where sAcc :: [a] -> [Integer] -> Integer -> (([a],[a]) -> ([a],[a]))
	  sAcc [] _  _ = trace "hi1" $ id
	  sAcc xs [] _ = trace "hi2" $ (\ (ys,zs) -> (ys++xs,zs))
	  sAcc (x:xs) (pos:poss) curpos =
	      if (pos == curpos)
		 then (\ (ys,zs) -> (  ys,x:zs)) . (sAcc xs poss       (curpos+1))
 		 else (\ (ys,zs) -> (x:ys,  zs)) . (sAcc xs (pos:poss) (curpos+1))

--opJtzjQTFsWo+cga--