[Haskell-cafe] Slower with ByteStrings?

Sat May 26 16:48:55 EDT 2007

Hello,

We recently had a challenge as follows:

Given a word, find all the words in the dictionary which can be made
from the letters of that word.  A letter can be used at most as many
times as it appears in the input word.  So, "letter" can only match
words with 0, 1, or 2 t's in them.

I opted for simplicity in my implementation including hard coding the
input word and using /usr/share/dict/words as the dictionary:

-- Begin Words.hs
module Main where

import List

-- I was lazy and borrowed perms from the Haskell wiki
-- but I wrote everything else
perms [] = [[]]
perms xs = [ x : ps | x <- xs, ps <- perms (xs\\[x]) ]

-- creates permutations of all lengths then cleans up duplicates and
gets rid of the
-- empty list, this is probably the least efficient way possible
allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x

main = do wordList <- readFile "/usr/share/dict/words"
          let words = lines wordList
          mapM_ print $ filter (`elem` words) $ allPerms "ubuntu"
-- End Words.hs

Next I decided to try it with byte stings:

-- Begin ByteStringWords.hs
module Main where

import List
-- import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy.Char8 as C

-- I was lazy and borrowed perms from the Haskell wiki
-- but I wrote everything else
perms [] = [[]]
perms xs = [ x : ps | x <- xs, ps <- perms (xs\\[x]) ]

-- creates permutations of all lengths then cleans up duplicates and
gets rid of the
-- empty list, this is probably the least efficient way possible
allPerms x = drop 1 $ sort $ nub $ concatMap inits $ perms x

main = do wordList <- C.readFile "/usr/share/dict/words"
          let words = C.lines wordList
          mapM_ print $ filter (`elem` words) $ map C.pack $ allPerms "ubuntu"
-- End ByteStringWords.hs

I don't think the overhead to compute the permutations matters here as
the input to the permutations calculation is so small.  Any ideas why
the byte string version is slower?  (Strict bytestrings appear to be
about 2 seconds slower and lazy bytestrings appear to be about 1
second slower).

I think, given my simple algorithm that means that (==) for
ByteStrings is slower than (==) for String.  Is this possible?  I
think the program might be spending more time cleaning up after
execution with the ByteString versions as it seems to "stall" after
printing the last match.

Thanks,
Jason