[Haskell-beginners] Comments on Map/Reduce Code

Brent Yorgey byorgey at seas.upenn.edu
Tue Jul 10 16:19:48 CEST 2012


On Thu, Jul 05, 2012 at 03:50:32PM +0200, Thomas Bach wrote:
> Hello there,

Hi Thomas,

Looks pretty good.  I've interspersed a few comments below.


> And here is the code I've come up with:
> 
> vince at roku:~/tmp cat mapper.hs
> import qualified Data.ByteString.Lazy.Char8 as C
> 
> postFix :: C.ByteString
> postFix = C.pack "\t1"
> 
> formatter :: C.ByteString -> C.ByteString
> formatter x = C.append x postFix
> 
> main :: IO ()
> main = do
>   contents <- fmap C.words C.getContents
>   C.putStr . C.unlines $ map formatter contents

The above looks fine, except that generally the recommendation is to
use the text package [1] for dealing with text, whereas ByteString is
for binary data that you wish to manipulate as a sequence of bytes.
You can get away with the above only when the text consists entirely
of ASCII characters.

[1] http://hackage.haskell.org/package/text

> vince at roku:~/tmp cat reducer.hs
> import qualified Data.ByteString.Lazy.Char8 as C
> import qualified Data.List as L
> 
> tuppleize :: String -> (String, Int)
> tuppleize line = (\xs -> (head xs, read (last xs))) $ words line

What happens when the line is empty?

> group :: Eq a => [(a, b)] -> [[(a, b)]]
> group = L.groupBy (\x y -> fst x == fst y)

The above lambda can also be written as ((==) `on` fst).  'on' can be
imported from Data.Function.

> 
> summation :: Num b => [(a, b)] -> (a, b)
> summation (x:[]) = x
> summation (x:xs) = (fst x, (snd x) + (snd (summation xs)))

Instead of using (fst x) and (snd x), you should pattern-match on x,
like

  summation ((x,y):xs) = (x, y + ...)

> formatter :: (String, Int) -> String
> formatter = (\w -> (fst w ++ "\t" ++ show (snd w)))

The same goes here.  I would also put the w argument on the left-hand
side of the =, like

  formatter (s,i) = s ++ "\t" ++ show i

> 
> main = do
>   contents <- C.getContents
>   putStr . unlines $ map formatter $ map summation $ group $ map tuppleize $ lines $ C.unpack contents

Instead of using a chain of ($), it's generally considered better
style to use a chain of (.) with a single $ at the end, like

  putStr . unlines . map formatter ... lines . C.unpack $ contents

-Brent



More information about the Beginners mailing list