[Haskell-cafe] [Newbie] What to improve in my code
daniel.is.fischer at web.de
Tue Jul 13 18:13:24 EDT 2010
On Tuesday 13 July 2010 23:49:45, Frank1981 wrote:
> First of all: I'm not sure if this question is allowed here. If not, I
> I'm trying to solve the following problem: For each word in a text find
> the number of occurences for each unique word in the text.
> i've come up with the following steps to solve this:
> * remove all punctuation except for whitespace and make the text
> lowercase * find all unique words in the text
> * for each unique word, count the number of occurences.
> This has resulted in the following code:
> removePunctuation :: [Char] -> [Char]
> removePunctuation str = filter (\c -> elem c (['a'..'z'] ++ ['A'..'Z']
> ++ ['\t', ' ', '\n'])) str
Depending on your criteria, maybe
removePunctuation = filter (\c -> isAlpha c || isSpace c)
> process :: [Char] -> [String]
> process str = words (map toLower (removePunctuation str))
process = map (fiter isLower) . words . map toLower
> unique :: (Eq a) => [a] -> [a]
> unique  = 
> unique (x:xs) = [x] ++ unique (filter (\s -> x /= s) xs)
unique = nub
but it's not particularly efficient.
If you don't need to keep the order of first occurrence and have an Ord
instance, you could take
unique' = map head . group . sort
import qualified Data.Set as Set
unique'' = Set.toList . Set.fromList
> occurenceCount :: (Eq a) => a -> [a] -> Int
> occurenceCount _  = 0
> occurenceCount x (y:ys)
> | x == y = 1 + occurenceCount x ys
> | otherwise = occurenceCount x ys
occurrenceCount a xs = length (filter (== a) xs)
occurrenceCount a = length . filter (== a)
> occurenceCount' :: [String] -> [String] -> [(String, Int)]
> occurenceCount'  _ = [("", 0)]
why not occurrenceCount'  _ =  ?
> occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++
> occurenceCount' us xs
But it can be done shorter:
import qualified Data.Map as Map
occurrenceCount'' :: Ord a => [a] -> [(a,Int)]
occurrenceCount'' xs = Map.toList $
foldl' (\mp x -> Map.insertWith' (+) x 1 mp) Map.empty xs
No need to get the unique elements up front.
> Please remember i've only been playing with Haskell for three afternoons
> now and i'm happy that the above code is working correctly.
> However i've got three questions:
> 1) occurenceCount'  _ = [("", 0)] is plain ugly and also adds a
> useless tuple to the end result. Is there a better way to solve this?
> 2) I'm forcing elements into a singleton list on two occasions, both in
> my unique function and in my occurenceCount' function. Once again this
> seems ugly and I'm wondering if there is a better solution.
Use (:), e.g.
unique (x:xs) = x : unique (filter (/= x) xs)
> 3) The whole process as i'm doing it now feels pretty imperatively (been
> working for years as a Java / PHP programmer). I've got this feeling
> that the occurenceCount' function could be implemented using a mapping
> function. What ways are there to make this more "functional"?
More information about the Haskell-Cafe