[Haskell-beginners] Code help requested
Tim Perry
perry2of5 at yahoo.com
Tue Jan 12 21:38:46 EST 2010
I compiled the original version, Yusaka's version, and a version I wrote and found the following:
$ time ./Anagram_me < /usr/share/dict/words > /dev/null
real 0m2.197s
user 0m2.040s
sys 0m0.160s
$ time ./Anagram_JoeVanDyke < /usr/share/dict/words > /dev/null
real 0m4.570s
user 0m4.290s
sys 0m0.260s
perry at emperor:~/haskell$ time ./Anagram_Yusaku < /usr/share/dict/words > /dev/null
real 0m1.337s
user 0m1.230s
sys 0m0.100s
From
this, it looks like mine version takes less than half the time of the
original. However, if I run a bigger dictionary (Ubuntu package
wamerican-large instead of wamerican-small) then I'm only about 30%
faster than the original. This makes me think I have some sort of
exponential data structure growth going on. Here is my version. Can anyone confirm that data
structure growth is the problem with my approach? Thanks, Tim
import Data.List as Lst
import Data.Map as Map
-- This version only displays words that have more than two
-- match in the list, and sorts by the words that got the most matches.
-- Can we do the map bit better?
main = do
input <- getContents
print $ anagrams $ lines input
anagrams words =
sorted_anagrams
where
sorted_anagrams = sortBy sorter filtered_anagrams
sorter a b = compare (length b) (length a)
longEnoughWords = [x | x <- words, length x > 1]
filtered_anagrams = [x | x <- Map.elems $ foldr insert empty $ wordPairs, length x > 2]
where
wordPairs = zip (Prelude.map Lst.sort longEnoughWords) longEnoughWords
insert (sorted, original) = insertWith (++) sorted [original]
----- Original Message ----
From: Daniel Fischer <daniel.is.fischer at web.de>
To: beginners at haskell.org
Sent: Tue, January 12, 2010 10:46:03 AM
Subject: Re: [Haskell-beginners] Code help requested
Am Dienstag 12 Januar 2010 17:44:12 schrieb David Frey:
> These are the numbers I got once I modified your Haskell program to only
> print out 4 results the way the Ruby program does.
>
> Your Haskell version: ~10.0 s
> My Haskell version: ~2.5 s
> Your Ruby version (Ruby 1.8): ~4.6 s
> Your Ruby version (Ruby 1.9): ~4.2 s
I sincerely hope your input file is smaller than mine :)
>
> This is my version of your program:
>
> import Control.Monad (liftM)
> import Data.Function (on)
> import Data.List (sort, sortBy)
> import qualified Data.ByteString.Char8 as B
> import qualified Data.Map as Map
>
> -- Given as stdin
> -- presents
> -- serpents
> -- no
> -- on
> -- whatever
> -- Expected Output:
> -- [["serpents","presents"],["on","no"]]
>
>
> main = do
> input <- liftM B.lines B.getContents
> let wordMap = buildMap $ map B.unpack input
> print $ take 4 (listAnagrams wordMap)
>
>
> buildMap words = let
> entries = map (\x -> (sort x, [x])) words
> in Map.fromListWith (++) entries
>
>
> listAnagrams wordMap = let
> anagrams = (Map.elems . Map.filter (\x -> length x > 1)) wordMap
> in sortBy (flip (compare `on` length)) anagrams
>
>
> I found that the performance improved when I used ByteStrings to read
> the input and then unpacked to regular strings before creating the Map.
> For some reason, using BytesStrings everywhere made the program slower.
> Can anyone tell me why?
Yes. ByteString's sort is a bucket-sort. It allocates an array of
256*sizeof(CSize) bytes and counts the occurrences of each character.
That's fine for long ByteStrings, but for short ByteStrings like those we
consider here, allocating a bucket-array of 1K or 2K is incredibly much.
Sorting plain Strings is faster (not very much, though) and uses (much)
less memory if they are short.
You can further speed up your programme if you put lists of ByteStrings in
your Map (less memory, less GC) and unpack them only for sorting (and
finally for output):
main = do
input <- liftM B.lines B.getContents
let wordMap = buildMap input
print $ take 4 (listAnagrams wordMap)
buildMap words = let
entries = map (\x -> (B.pack . sort $ B.unpack x, [x])) words
in Map.fromListWith (++) entries
listAnagrams wordMap = let
anagrams = (Map.elems . Map.filter (\x -> length x > 1)) wordMap
-- a small speedup can be obtained by not using length:
-- Map.filter (not . null . drop 1)
-- or Map.filter (\l -> case l of { (_:_:_) -> True; _ -> False })
-- if there are many long lists in the map, the speedup will become
-- significant
in map (map B.unpack) $ sortBy (flip (compare `on` length)) anagrams
Yours:
$ ./DFAnagrams +RTS -sstderr < /usr/share/dict/words > /dev/null
./DFAnagrams +RTS -sstderr
1,218,862,708 bytes allocated in the heap
544,113,420 bytes copied during GC
98,018,856 bytes maximum residency (10 sample(s))
768,552 bytes maximum slop
211 MB total memory in use (2 MB lost due to fragmentation)
Generation 0: 2315 collections, 0 parallel, 1.98s, 2.01s elapsed
Generation 1: 10 collections, 0 parallel, 1.28s, 1.53s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 3.60s ( 3.61s elapsed)
GC time 3.26s ( 3.54s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 6.86s ( 7.15s elapsed)
%GC time 47.5% (49.4% elapsed)
Alloc rate 338,176,931 bytes per MUT second
Productivity 52.4% of total user, 50.3% of total elapsed
Modified:
$ ./DFBAnagrams +RTS -sstderr < /usr/share/dict/words > /dev/null
./DFBTAnagrams +RTS -sstderr
1,108,946,552 bytes allocated in the heap
237,869,304 bytes copied during GC
41,907,844 bytes maximum residency (10 sample(s))
4,374,152 bytes maximum slop
89 MB total memory in use (1 MB lost due to fragmentation)
Generation 0: 2091 collections, 0 parallel, 1.14s, 1.19s elapsed
Generation 1: 10 collections, 0 parallel, 0.40s, 0.50s elapsed
INIT time 0.00s ( 0.00s elapsed)
MUT time 2.90s ( 2.90s elapsed)
GC time 1.54s ( 1.69s elapsed)
EXIT time 0.00s ( 0.00s elapsed)
Total time 4.44s ( 4.59s elapsed)
%GC time 34.7% (36.8% elapsed)
Alloc rate 382,369,915 bytes per MUT second
Productivity 65.2% of total user, 63.1% of total elapsed
versus 26.25s for the Ruby version (ruby 1.8.7).
Yay!
>
> Dave
_______________________________________________
Beginners mailing list
Beginners at haskell.org
http://www.haskell.org/mailman/listinfo/beginners
More information about the Beginners
mailing list