[Haskell-cafe] [Newbie] What to improve in my code

Daniel Fischer daniel.is.fischer at web.de
Tue Jul 13 18:13:24 EDT 2010


On Tuesday 13 July 2010 23:49:45, Frank1981 wrote:
> First of all: I'm not sure if this question is allowed here. If not, I
> apologize
>
> I'm trying to solve the following problem: For each word in a text find
> the number of occurences for each unique word in the text.
>
> i've come up with the following steps to solve this:
>  * remove all punctuation except for whitespace and make the text
> lowercase * find all unique words in the text
>  * for each unique word, count the number of occurences.
>
> This has resulted in the following code:
> removePunctuation :: [Char] -> [Char]
> removePunctuation str = filter (\c -> elem c (['a'..'z'] ++ ['A'..'Z']
> ++ ['\t', ' ', '\n'])) str

Depending on your criteria, maybe

import Data.Char

removePunctuation = filter (\c -> isAlpha c || isSpace c)

is better
>
>
> process :: [Char] -> [String]
> process str = words (map toLower (removePunctuation str))

Or perhaps

process = map (fiter isLower) . words . map toLower

>
> unique :: (Eq a) => [a] -> [a]
> unique [] = []
> unique (x:xs) = [x] ++ unique (filter (\s -> x /= s) xs)

import Data.List

unique = nub

but it's not particularly efficient.
If you don't need to keep the order of first occurrence and have an Ord 
instance, you could take

unique' = map head . group . sort

or

import qualified Data.Set as Set

unique'' = Set.toList . Set.fromList

>
> occurenceCount :: (Eq a) => a -> [a] -> Int
> occurenceCount _ [] = 0
> occurenceCount x (y:ys)
>
> 	| x == y = 1 + occurenceCount x ys
> 	| otherwise = occurenceCount x ys

occurrenceCount a xs = length (filter (== a) xs)

or

occurrenceCount a = length . filter (== a)

>
> occurenceCount' :: [String] -> [String] -> [(String, Int)]
> occurenceCount' [] _ = [("", 0)]

why not occurrenceCount' [] _ = [] ?

> occurenceCount' (u:us) xs = [(u, occurenceCount u xs)] ++
> occurenceCount' us xs

But it can be done shorter:

import qualified Data.Map as Map
import Data.List

occurrenceCount'' :: Ord a => [a] -> [(a,Int)]
occurrenceCount'' xs = Map.toList $ 
      foldl' (\mp x -> Map.insertWith' (+) x 1 mp) Map.empty xs

No need to get the unique elements up front.

>
> Please remember i've only been playing with Haskell for three afternoons
> now and i'm happy that the above code is working correctly.
>
> However i've got three questions:
> 1) occurenceCount' [] _ = [("", 0)] is plain ugly and also adds a
> useless tuple to the end result. Is there a better way to solve this?
> 2) I'm forcing elements into a singleton list on two occasions, both in
> my unique function and in my occurenceCount' function. Once again this
> seems ugly and I'm wondering if there is a better solution.

Use (:), e.g.

unique (x:xs) = x : unique (filter (/= x) xs)

> 3) The whole process as i'm doing it now feels pretty imperatively (been
> working for years as a Java / PHP programmer). I've got this feeling
> that the occurenceCount' function could be implemented using a mapping
> function. What ways are there to make this more "functional"?



More information about the Haskell-Cafe mailing list