[Haskell-beginners] Code help requested

David Frey dpfrey at shaw.ca
Tue Jan 12 11:44:12 EST 2010


On January 11, 2010 5:22:49 pm Joe Van Dyk wrote:
> I've written two versions of the same program, one in ruby and one in
> haskell.  Given words on stdin, find all the anagrams in those words.
> For nicer display, we're only going to display the top 3 results.
> 
> I'm obviously new to haskell.  The ruby version runs about 5x as fast
> on a large file.   How can I improve the haskell version?
> 
> http://gist.github.com/274774
> 
> # Ruby version
> input = STDIN.read.split("\n")
> result = Hash.new([])
> input.each do |word|
>   sorted_word = word.split('').sort.join
>   result[sorted_word] += [word]
> end
> values = result.values.sort { |a, b| b.size <=> a.size }
> p values[0..3]
> 
> # Haskell version
> import List
> import qualified Data.Map as Map
> 
> -- Given as stdin
> -- presents
> -- serpents
> -- no
> -- on
> -- whatever
> -- Expected Output:
> -- [["serpents","presents"],["on","no"]]
> 
> -- This version only displays words that have more than one
> -- match in the list, and sorts by the words that got the most matches.
> 
> -- Can we do the map bit better?
> 
> main = do
>   input <- getContents
>   print $ anagrams $ lines input
> 
> anagrams words =
>   sorted_anagrams
>   where
>     sorted_anagrams = sortBy sorter filtered_anagrams
>     sorter a b = compare (length b)  (length a)
>     filtered_anagrams = Map.elems $ Map.filter filter_function all_anagrams
>     filter_function words = length words > 1
>     all_anagrams = do_anagrams words Map.empty
>     do_anagrams [] result = result
>     do_anagrams words result = do_anagrams
>                                  (tail words)
>                                  (Map.unionWith
>                                    (++)
>                                    (Map.fromList
> [(sorted_current_word, [current_word])])
>                                    result)
>       where
>         current_word = head words
>         sorted_current_word = sort current_word
> 

These are the numbers I got once I modified your Haskell program to only print 
out 4 results the way the Ruby program does.

Your Haskell version: ~10.0 s
My Haskell version: ~2.5 s
Your Ruby version (Ruby 1.8): ~4.6 s
Your Ruby version (Ruby 1.9): ~4.2 s

This is my version of your program:

import Control.Monad (liftM)
import Data.Function (on)
import Data.List (sort, sortBy)
import qualified Data.ByteString.Char8 as B
import qualified Data.Map as Map

-- Given as stdin
-- presents
-- serpents
-- no
-- on
-- whatever
-- Expected Output:
-- [["serpents","presents"],["on","no"]]


main = do
    input <- liftM B.lines B.getContents
    let wordMap = buildMap $ map B.unpack input
    print $ take 4 (listAnagrams wordMap)


buildMap words = let
    entries = map (\x -> (sort x, [x])) words
    in Map.fromListWith (++) entries


listAnagrams wordMap = let
    anagrams = (Map.elems . Map.filter (\x -> length x > 1)) wordMap
    in sortBy (flip (compare `on` length)) anagrams


I found that the performance improved when I used ByteStrings to read the 
input and then unpacked to regular strings before creating the Map.  For some 
reason, using BytesStrings everywhere made the program slower.  Can anyone 
tell me why?

Dave


More information about the Beginners mailing list