[Haskell-beginners] Code help requested

Mon Jan 11 20:22:49 EST 2010

I've written two versions of the same program, one in ruby and one in
haskell.  Given words on stdin, find all the anagrams in those words.
For nicer display, we're only going to display the top 3 results.

I'm obviously new to haskell.  The ruby version runs about 5x as fast
on a large file.   How can I improve the haskell version?

http://gist.github.com/274774

# Ruby version
input = STDIN.read.split("\n")
result = Hash.new([])
input.each do |word|
  sorted_word = word.split('').sort.join
  result[sorted_word] += [word]
end
values = result.values.sort { |a, b| b.size <=> a.size }
p values[0..3]

# Haskell version
import List
import qualified Data.Map as Map

-- Given as stdin
-- presents
-- serpents
-- no
-- on
-- whatever
-- Expected Output:
-- [["serpents","presents"],["on","no"]]

-- This version only displays words that have more than one
-- match in the list, and sorts by the words that got the most matches.

-- Can we do the map bit better?

main = do
  input <- getContents
  print $ anagrams $ lines input

anagrams words =
  sorted_anagrams
  where
    sorted_anagrams = sortBy sorter filtered_anagrams
    sorter a b = compare (length b)  (length a)
    filtered_anagrams = Map.elems $ Map.filter filter_function all_anagrams
    filter_function words = length words > 1
    all_anagrams = do_anagrams words Map.empty
    do_anagrams [] result = result
    do_anagrams words result = do_anagrams
                                 (tail words)
                                 (Map.unionWith
                                   (++)
                                   (Map.fromList
[(sorted_current_word, [current_word])])
                                   result)
      where
        current_word = head words
        sorted_current_word = sort current_word

-- 
Joe Van Dyk
http://fixieconsulting.com