[Haskell-beginners] problems using bytestrings

Joe Van Dyk joe at fixieconsulting.com
Tue Jan 19 16:34:41 EST 2010


On Tue, Jan 19, 2010 at 12:49 PM, Daniel Fischer
<daniel.is.fischer at web.de> wrote:
> Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
>> Hello,
>>
>> I wrote a function that finds anagrams in a file.  It worked great.
>> Until I tried to use Bytestrings to get better performance.
>>
>> Here's the code:
>> (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f
>>92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map
>> import Data.Ord
>> import Data.List
>> import qualified Data.ByteString.Char8 as BS
>>
>> -- what's the type here?  I get an infinite type error
>> anagrams :: Ord a => [a] -> [a]
>
> anagrams :: Ord a => [[a]] -> [[[a]]]
>
> But ask ghci, that's faster to answer such questions than the list :)
>
> Don't give a type signature, load the module,
>
> ghci> :t anagrams
> anagrams :: Ord a => [[a]] -> [[[a]]]
>
>> anagrams words =
>>   sorted_anagrams
>>   where
>>     sorted_anagrams      = sortBy (flip $ comparing length) get_anagrams
>>     get_anagrams         = Map.elems $ foldl' insert_word Map.empty
>> words insert_word map word = Map.insertWith' (++) (sort word) [word] map
>
> You sort each word from the input list, so word must have type
>
> Ord a => [a],
>
> thus the input has type
>
> Ord a => [[a]].
>
> The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists
> of Strings (so Map String [String] === Map [Char] [[Char]], in general,
> Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings),
> that is [[String]] === [[[Char]]], in general, [[[a]]].
>
>>
>> main = do
>>   -- original code, worked fine:
>>   -- input <- getContents
>>   -- print $ take 3 $ anagrams $ lines input
>>   -- new code with bytestring has errors
>>   input <- BS.getContents
>>   print $ take 3 $ anagrams $ BS.lines input
>>
>>
>> There are two errors.  One:
>> anagram.hs:7:0:
>>     Occurs check: cannot construct the infinite type: a = [a]
>>     When generalising the type(s) for `anagrams'
>>
>>
>> That happens when I add the type to the anagrams method.
>>
>> If I don't have a type specified, then I get this error:
>> anagram.hs:21:30:
>>     Couldn't match expected type `[a]'
>>            against inferred type `BS.ByteString'
>>       Expected type: [[a]]
>>       Inferred type: [BS.ByteString]
>>     In the second argument of `($)', namely `BS.lines input'
>>     In the second argument of `($)', namely `anagrams $ BS.lines input'
>>
>>
>> I'm lost here.  Help!
>
> ByteStrings are not lists, so Data.List.sort can't do anything with them.
> You could change one line of the the code of anagrams to
>
> insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map
>
> which restricts the type of anagrams to
>
> anagrams :: [ByteString] -> [[ByteString]]
>
> But, ByteStrings sort isn't good for short ByteStrings (allocate an array
> of 256 slots, count how often each character occurs, write in order to new
> ByteString - the overhead of allocating the array is larger than the
> sorting cost for short ByteStrings).
>
> For ByteStrings as short as normal words in
> English/French/German/Italian/Spanish, it's much better to unpack the
> ByteStrings for sorting and change the line to
>
> insert_word map word = Map.insertWith' (++)
>                        (BS.pack . sort . BS.unpack $ word) [word] map

Thanks for the response.  I was getting confused -- I'm used to
thinking about a String as being its own type, instead of being an
array of chars.

I now have:

anagrams :: [BS.ByteString] -> [[BS.ByteString]]
anagrams words =
  sorted_anagrams
  where
    sorted_anagrams      = sortBy (flip $ comparing length) get_anagrams
    get_anagrams         = Map.elems $ foldl' insert_word Map.empty words
    insert_word map word = Map.insertWith' (++) sorted_word [word] map
    sorted_word word     = BS.pack . sort . BS.unpack $ word


But get this error:

anagram.hs:12:27:
    No instance for (Ord (BS.ByteString -> BS.ByteString))
      arising from a use of `Map.insertWith'' at anagram.hs:12:27-69
    Possible fix:
      add an instance declaration for
      (Ord (BS.ByteString -> BS.ByteString))
    In the expression: Map.insertWith' (++) sorted_word [word] map
    In the definition of `insert_word':
        insert_word map word = Map.insertWith' (++) sorted_word [word] map
    In the definition of `anagrams':
        anagrams words
                   = sorted_anagrams
                   where
                       sorted_anagrams = sortBy (flip $ comparing
length) get_anagrams
                       get_anagrams = Map.elems $ foldl' insert_word
Map.empty words
                       insert_word map word = Map.insertWith' (++)
sorted_word [word] map
                       sorted_word word = BS.pack . sort . BS.unpack $ word


is sorted_word returning a function?


More information about the Beginners mailing list