[Haskell-cafe] Quadratic complexity through use of STArrays

Dan Rosén danr at student.chalmers.se
Tue Sep 22 15:31:08 EDT 2009


Dear haskell-cafe users,

I am constructing a shuffle function: given an StdGen and a list, it returns a
permutation of the list, with all permutations equally likely.

There is the simple recursive definition: generate a random index between 1 and
the length of the list, take the element at that index out of the list, call
the function recursively on the remaining list, and then cons the element onto
the shuffled result.

A more imperative approach is to turn the list into an array and traverse the
array from the back, swapping each visited element with a randomly chosen
element at an index less than or equal to the current one (the classic
Fisher-Yates shuffle).

These functions are implemented as shuffleRec and shuffleArr, respectively.

What complexity do these functions have?

I argue that the shuffleArr function should be O(n), since it consists of a
single loop of n-1 iterations, each of which does only O(1) work: generating a
random number and swapping two elements of the array.

I argue that the shuffleRec function should be O(n^2), since each call builds a
new list in O(n) time with the take, drop and (++) calls, and then recurses on
a list that is one element shorter.
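
Spelled out as a recurrence, with c covering the per-call cost of the take,
drop and (++) work on a list of length n:

    T(n) = T(n-1) + c*n,  T(1) = O(1)
         = c*(n + (n-1) + ... + 2) + O(1)
         = O(n^2)

whereas the corresponding count for shuffleArr is simply (n-1) * O(1) = O(n).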

However, the two functions have roughly the same running time, and judging from
a plot of the measurements, both look _very_ much quadratic.

I am compiling with GHC, and I guess that either something in the lazy
semantics is confusing my complexity reasoning, or I have misunderstood how
STArrays work.

Any pointers to what is going on are greatly appreciated!

Best regards,
Dan Rosén, Sweden

Here is the code:

module Main where

import Control.Monad
import Control.Monad.ST
import Data.Array.ST
import Data.STRef
import System.Random

import System.CPUTime

-- Array-based shuffle: walk the array from the top index down, swapping each
-- position with a uniformly chosen position at or below it.
shuffleArr :: StdGen -> [a] -> [a]
shuffleArr g list = runST $ do
    let n = length list
    gref <- newSTRef g
    arr <- listToArray list
    forM_ [n,n-1..2] $ \p -> do
        m <- rand (1,p) gref
        swap arr m p
    getElems arr
  where
    -- Draw a random value in the given range, threading the StdGen through
    -- the STRef.
    rand range gref = do
        g <- readSTRef gref
        let (v,g') = randomR range g
        writeSTRef gref g'
        return v

    -- Swap the elements at indices n and m.
    swap a n m = do
        [n',m'] <- mapM (readArray a) [n,m]
        mapM_ (uncurry $ writeArray a) [(m,n'),(n,m')]

listToArray :: [a] -> ST s (STArray s Int a)
listToArray list = let n = length list
                   in  newListArray (1,n) list

-- Recursive shuffle: pick a random element, cons it onto the shuffle of the
-- remaining list.
shuffleRec :: StdGen -> [a] -> [a]
shuffleRec _ []   = []
shuffleRec g list = x : shuffleRec g' xs
  where
    (n,g')  = randomR (0,length list-1) g
    (x:xs') = drop n list           -- x is the element at index n
    xs      = take n list ++ xs'    -- the list with x removed
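
-- Quick sanity check from GHCi (the result should be some permutation of the
-- input):
--
--   ghci> g <- newStdGen
--   ghci> shuffleArr g [1..10]
--   ghci> shuffleRec g [1..10]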

-- A somewhat lame attempt to derive the complexities through testing:
-- for each input size, print the size and the two running times
-- (in microseconds) as a tab-separated row.
main :: IO ()
main = do
    let sizes = take 30 $ iterate (+30000) 10000
    answers <- mapM test sizes
    sequence_ [ putStrLn $ concatMap ((++ "\t") . show) [toInteger n, arr, rec]
              | (n,(arr,rec)) <- zip sizes answers
              ]

-- Perform a test of size n, and return the CPU time (in microseconds) each
-- algorithm took, as a pair. Evaluation is forced by seq on the length of
-- the shuffled list.
test :: Int -> IO (Integer,Integer)
test n = do
    let list = [1..n]
    [g1,g2] <- replicateM 2 newStdGen
    length list `seq` do
        s  <- doTime ("shuffleArr " ++ show n) $
                 (length $ shuffleArr g1 list) `seq` return ()
        s' <- doTime ("shuffleRec " ++ show n) $
                 (length $ shuffleRec g2 list) `seq` return ()
        return (s,s')

-- This is taken from GenUtil from the JHC creator's homepage
doTime :: String -> IO a -> IO Integer
doTime str action = do
    start <- getCPUTime
    x <- action
    end <- getCPUTime
    let time = (end - start) `div` 1000000 -- picoseconds -> microseconds (or `div` cpuTimePrecision)
    -- putStrLn $ "Timing: " ++ str ++ " " ++ show time
    return time
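
For comparison, here is a sketch (untested, and not part of the measurements
above) of the same array loop specialised to Int using an unboxed STUArray, to
check whether the boxed array representation is what I am misunderstanding. It
needs only the imports already in the module:

shuffleArrU :: StdGen -> [Int] -> [Int]
shuffleArrU g list = runST $ do
    let n = length list
    gref <- newSTRef g
    -- unboxed Int array instead of a boxed STArray
    arr <- newListArray (1,n) list :: ST s (STUArray s Int Int)
    forM_ [n,n-1..2] $ \p -> do
        g0 <- readSTRef gref
        let (m,g') = randomR (1,p) g0
        writeSTRef gref g'
        -- swap the elements at positions p and m
        vp <- readArray arr p
        vm <- readArray arr m
        writeArray arr p vm
        writeArray arr m vp
    getElems arr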

