[Haskell-beginners] problems using bytestrings
Daniel Fischer
daniel.is.fischer at web.de
Tue Jan 19 15:49:57 EST 2010
Am Dienstag 19 Januar 2010 20:40:38 schrieb Joe Van Dyk:
> Hello,
>
> I wrote a function that finds anagrams in a file. It worked great.
> Until I tried to use Bytestrings to get better performance.
>
> Here's the code:
> (http://github.com/joevandyk/haskell/blob/aa61a58e6a027dda60a32ae64ec99f
>92d00ae5ed/pearls/anagrams/anagram.hs) import qualified Data.Map as Map
> import Data.Ord
> import Data.List
> import qualified Data.ByteString.Char8 as BS
>
> -- what's the type here? I get an infinite type error
> anagrams :: Ord a => [a] -> [a]
anagrams :: Ord a => [[a]] -> [[[a]]]
But ask ghci, that's faster to answer such questions than the list :)
Don't give a type signature, load the module,
ghci> :t anagrams
anagrams :: Ord a => [[a]] -> [[[a]]]
> anagrams words =
> sorted_anagrams
> where
> sorted_anagrams = sortBy (flip $ comparing length) get_anagrams
> get_anagrams = Map.elems $ foldl' insert_word Map.empty
> words insert_word map word = Map.insertWith' (++) (sort word) [word] map
You sort each word from the input list, so word must have type
Ord a => [a],
thus the input has type
Ord a => [[a]].
The map you build maps Strings ([Char], generally, Ord a => [[a]]) to lists
of Strings (so Map String [String] === Map [Char] [[Char]], in general,
Ord a => Map [a] [[a]]), thus Map.elems gives a list of (lists of Strings),
that is [[String]] === [[[Char]]], in general, [[[a]]].
>
> main = do
> -- original code, worked fine:
> -- input <- getContents
> -- print $ take 3 $ anagrams $ lines input
> -- new code with bytestring has errors
> input <- BS.getContents
> print $ take 3 $ anagrams $ BS.lines input
>
>
> There are two errors. One:
> anagram.hs:7:0:
> Occurs check: cannot construct the infinite type: a = [a]
> When generalising the type(s) for `anagrams'
>
>
> That happens when I add the type to the anagrams method.
>
> If I don't have a type specified, then I get this error:
> anagram.hs:21:30:
> Couldn't match expected type `[a]'
> against inferred type `BS.ByteString'
> Expected type: [[a]]
> Inferred type: [BS.ByteString]
> In the second argument of `($)', namely `BS.lines input'
> In the second argument of `($)', namely `anagrams $ BS.lines input'
>
>
> I'm lost here. Help!
ByteStrings are not lists, so Data.List.sort can't do anything with them.
You could change one line of the the code of anagrams to
insert_word map word = Map.insertWith' (++) (BS.sort word) [word] map
which restricts the type of anagrams to
anagrams :: [ByteString] -> [[ByteString]]
But, ByteStrings sort isn't good for short ByteStrings (allocate an array
of 256 slots, count how often each character occurs, write in order to new
ByteString - the overhead of allocating the array is larger than the
sorting cost for short ByteStrings).
For ByteStrings as short as normal words in
English/French/German/Italian/Spanish, it's much better to unpack the
ByteStrings for sorting and change the line to
insert_word map word = Map.insertWith' (++)
(BS.pack . sort . BS.unpack $ word) [word] map
More information about the Beginners
mailing list