[Haskell-beginners] problems using bytestrings
Joe Van Dyk
joe at fixieconsulting.com
Tue Jan 19 16:34:41 EST 2010
On Tue, Jan 19, 2010 at 12:49 PM, Daniel Fischer
<daniel.is.fischer at web.de> wrote:
> Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
>> Hello,
>>
>> I wrote a function that finds anagrams in a file. It worked great.
>> Until I tried to use Bytestrings to get better performance.
>>
>> Here's the code:
>> (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f
>>92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map
>> import Data.Ord
>> import Data.List
>> import qualified Data.ByteString.Char8 as BS
>>
>> -- what's the type here? I get an infinite type error
>> anagrams :: Ord a => [a] -> [a]
>
> anagrams :: Ord a => [[a]] -> [[[a]]]
>
> But ask ghci, that's faster to answer such questions than the list :)
>
> Don't give a type signature, load the module,
>
> ghci> :t anagrams
> anagrams :: Ord a => [[a]] -> [[[a]]]
>
>> anagrams words =
>> sorted_anagrams
>> where
>> sorted_anagrams = sortBy (flip $ comparing length) get_anagrams
>> get_anagrams = Map.elems $ foldl' insert_word Map.empty
>> words insert_word map word = Map.insertWith' (++) (sort word) [word] map
>
> You sort each word from the input list, so word must have type
>
> Ord a => [a],
>
> thus the input has type
>
> Ord a => [[a]].
>
> The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists
> of Strings (so Map String [String] === Map [Char] [[Char]], in general,
> Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings),
> that is [[String]] === [[[Char]]], in general, [[[a]]].
>
>>
>> main = do
>> -- original code, worked fine:
>> -- input <- getContents
>> -- print $ take 3 $ anagrams $ lines input
>> -- new code with bytestring has errors
>> input <- BS.getContents
>> print $ take 3 $ anagrams $ BS.lines input
>>
>>
>> There are two errors. One:
>> anagram.hs:7:0:
>> Occurs check: cannot construct the infinite type: a = [a]
>> When generalising the type(s) for `anagrams'
>>
>>
>> That happens when I add the type to the anagrams method.
>>
>> If I don't have a type specified, then I get this error:
>> anagram.hs:21:30:
>> Couldn't match expected type `[a]'
>> against inferred type `BS.ByteString'
>> Expected type: [[a]]
>> Inferred type: [BS.ByteString]
>> In the second argument of `($)', namely `BS.lines input'
>> In the second argument of `($)', namely `anagrams $ BS.lines input'
>>
>>
>> I'm lost here. Help!
>
> ByteStrings are not lists, so Data.List.sort can't do anything with them.
> You could change one line of the the code of anagrams to
>
> insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map
>
> which restricts the type of anagrams to
>
> anagrams :: [ByteString] -> [[ByteString]]
>
> But, ByteStrings sort isn't good for short ByteStrings (allocate an array
> of 256 slots, count how often each character occurs, write in order to new
> ByteString - the overhead of allocating the array is larger than the
> sorting cost for short ByteStrings).
>
> For ByteStrings as short as normal words in
> English/French/German/Italian/Spanish, it's much better to unpack the
> ByteStrings for sorting and change the line to
>
> insert_word map word = Map.insertWith' (++)
> (BS.pack . sort . BS.unpack $ word) [word] map
Thanks for the response. I was getting confused -- I'm used to
thinking about a String as being its own type, instead of being an
array of chars.
I now have:
anagrams :: [BS.ByteString] -> [[BS.ByteString]]
anagrams words =
sorted_anagrams
where
sorted_anagrams = sortBy (flip $ comparing length) get_anagrams
get_anagrams = Map.elems $ foldl' insert_word Map.empty words
insert_word map word = Map.insertWith' (++) sorted_word [word] map
sorted_word word = BS.pack . sort . BS.unpack $ word
But get this error:
anagram.hs:12:27:
No instance for (Ord (BS.ByteString -> BS.ByteString))
arising from a use of `Map.insertWith'' at anagram.hs:12:27-69
Possible fix:
add an instance declaration for
(Ord (BS.ByteString -> BS.ByteString))
In the expression: Map.insertWith' (++) sorted_word [word] map
In the definition of `insert_word':
insert_word map word = Map.insertWith' (++) sorted_word [word] map
In the definition of `anagrams':
anagrams words
= sorted_anagrams
where
sorted_anagrams = sortBy (flip $ comparing
length) get_anagrams
get_anagrams = Map.elems $ foldl' insert_word
Map.empty words
insert_word map word = Map.insertWith' (++)
sorted_word [word] map
sorted_word word = BS.pack . sort . BS.unpack $ word
is sorted_word returning a function?
More information about the Beginners
mailing list