[Haskell-beginners] Comments on Map/Reduce Code
Brent Yorgey
byorgey at seas.upenn.edu
Tue Jul 10 16:19:48 CEST 2012
On Thu, Jul 05, 2012 at 03:50:32PM +0200, Thomas Bach wrote:
> Hello there,
Hi Thomas,
Looks pretty good. I've interspersed a few comments below.
> And here is the code I've come up with:
>
> vince at roku:~/tmp cat mapper.hs
> import qualified Data.ByteString.Lazy.Char8 as C
>
> postFix :: C.ByteString
> postFix = C.pack "\t1"
>
> formatter :: C.ByteString -> C.ByteString
> formatter x = C.append x postFix
>
> main :: IO ()
> main = do
> contents <- fmap C.words C.getContents
> C.putStr . C.unlines $ map formatter contents
The above looks fine, except that generally the recommendation is to
use the text package [1] for dealing with text, whereas ByteString is
for binary data that you wish to manipulate as a sequence of bytes.
You can get away with the above only when the text consists entirely
of ASCII characters.
[1] http://hackage.haskell.org/package/text
> vince at roku:~/tmp cat reducer.hs
> import qualified Data.ByteString.Lazy.Char8 as C
> import qualified Data.List as L
>
> tuppleize :: String -> (String, Int)
> tuppleize line = (\xs -> (head xs, read (last xs))) $ words line
What happens when the line is empty?
> group :: Eq a => [(a, b)] -> [[(a, b)]]
> group = L.groupBy (\x y -> fst x == fst y)
The above lambda can also be written as ((==) `on` fst). 'on' can be
imported from Data.Function.
>
> summation :: Num b => [(a, b)] -> (a, b)
> summation (x:[]) = x
> summation (x:xs) = (fst x, (snd x) + (snd (summation xs)))
Instead of using (fst x) and (snd x), you should pattern-match on x,
like
summation ((x,y):xs) = (x, y + ...)
> formatter :: (String, Int) -> String
> formatter = (\w -> (fst w ++ "\t" ++ show (snd w)))
The same goes here. I would also put the w argument on the left-hand
side of the =, like
formatter (s,i) = s ++ "\t" ++ show i
>
> main = do
> contents <- C.getContents
> putStr . unlines $ map formatter $ map summation $ group $ map tuppleize $ lines $ C.unpack contents
Instead of using a chain of ($), it's generally considered better
style to use a chain of (.) with a single $ at the end, like
putStr . unlines . map formatter ... lines . C.unpack $ contents
-Brent
More information about the Beginners
mailing list