[Haskell-beginners] Seeking some clarification...

Daniel Fischer daniel.is.fischer at googlemail.com
Sun Jun 5 21:35:07 CEST 2011


On Sonntag, 5. Juni 2011, 20:46, Sean Charles wrote:
> I have a CSV file containing some data about trains: stations from, to,
> times etc. and I wanted to 'learn some more Haskell' and, to my
> astonishment,
> I have gotten thus far but I am not sure *why* it works or *how* I got
> there! LMAO Here is the relevant code ...
> 
> ====>
> trains :: String -> IO ()
> trains csvfile = do
>       legodata <- parseCSVFromFile csvfile
>       case legodata of
>           Left error -> print error
>           Right legodata -> mapM_ putStrLn (trainCodes legodata)
> 
> -- Assumes row 1 contains cell header information
> -- Note: the train-code is always the third cell
> 
> trainCodes :: [Record] -> [String]
> trainCodes = nub . map (!! 2) . tail

That'll bomb of course on malformed input, but that's probably okay in this 
scenario.

> ====>
> 
> I was chuffed with writing the trainCodes as a point-free function,
> that sort of thing is getting a little easier to work with but I still
> have real head-banging frustrations sometimes with seemingly simple
> things, like looping and just printing stuff out, despite having taught
> myself
> LISP six years ago and Erlang in recent years! I quit!! I really do!!!
> 
> My confusion arises over: mapM_ putStrLn (traincodes legodata)
> Given that: mapM_ :: Monad
> <http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Control-
> Monad.html#t%3AMonad> m => (a -> m b) -> [a] -> m ()
> <http://haskell.org/ghc/docs/6.12.2/html/libraries/ghc-prim-0.2.0.0/GHC-
> Unit.html#t%3A%28%29>
> 
> Here's how I read it: For any m that is a Monad, mapM_ takes a function
> that "takes
> an 'a' and returns it wrapped in a monad",

Not it, but a value based on it (the mapM_'ed function has type (a -> m b))
"A value wrapped in a monad" is kind of a skewed picture, doesn't really do 
justice to State or Cont for example.

mapM_ is the composition of

map :: (a -> c) -> [a] -> [c],

restricted to types c = m b for some Monad m -
that part produces a list [m b], then - and

sequence_ :: (Monad m) => [m b] -> m ()

sequence_ "runs" all the actions in the list and discards their results.
If you want to collect the results, there's

sequence :: (Monad m) => [m b] -> m [b]

and the composition of sequence and map,

mapM :: (Monad m) => (a -> m b) -> [a] -> m [b]

> a "list of a's" and returns a
> "monad containing
> 'unit'", the empty-list LISP-() undefined voidy thing.
> 
> Given that: putStrLn :: String
> <http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelud
> e.html#t:String> -> IO
> <http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelud
> e.html#t:IO> ()
> <http://hackage.haskell.org/packages/archive/base/latest/doc/ghc-prim-0.
> 2.0.0/GHC-Unit.html#t:-40--41->, this means that 'a' is String and 'm b'
> is
> IO () and my list of [a] is the result of calling 'traincodes legodata'.

Right.

> 
> trainCodes = nub . map (!! 2) . tail
> 
> legodata is [Record] (from Text.CSV) and so, 'tail' removes the header
> row from
> the data, 'map (!! 2)' extracts the third  field from each row and
> finally 'nub'
> removes the duplicates. Thus the return type from trainCodes is
> [String].

Yup.

> 
> Gluing it all together:
> 
> mapM_ :: Monad
> <http://haskell.org/ghc/docs/6.12.2/html/libraries/base-4.2.0.1/Control-
> Monad.html#t%3AMonad> m => (a -> m b) -> [a] -> m ()
> <http://haskell.org/ghc/docs/6.12.2/html/libraries/ghc-prim-0.2.0.0/GHC-
> Unit.html#t%3A%28%29> putStrLn :: String
> <http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelud
> e.html#t:String> -> IO
> <http://hackage.haskell.org/packages/archive/base/latest/doc/html/Prelud
> e.html#t:IO> ()
> <http://hackage.haskell.org/packages/archive/base/latest/doc/ghc-prim-0.
> 2.0.0/GHC-Unit.html#t:-40--41-> trainCodes :: [Record] -> [String]
> 
> the type of my call then would seem to be:
> 
> String -> IO () -> [String] -> IO ()

Missing parentheses, it's (String -> IO ()) -> [String] -> IO ()

> "putStrLn" -> (trainCodes legodata) -> IO ()
> 
> which means that not only have I got the types correct for the call but
> the result
> type of 'IO ()' also satisfies the type return for my function and hence
> it executes
> from 'main' (where it is called from) with no issues.
> 
> So, am I finally beginning to get a grip on it all ?

Looks quite so.

> This list is a constant source
> of education and I don't post very often as you guys give me far too
> much stuff to be reading all the time! :)
> 
> I am using Text.CSV to read my file and all I wanted to do was to output
> a list of unique codes from column three of the spreadsheet data, one
> per line, so that I can use this Haskell as part of a bigger 'bash'
> script.

And that's what your code does :)

> 
> Any detailed explanations that might help me better understand my
> solution would be welcome; right now I feel I 'just got lucky' although
> there must be a glimmer of understanding somewhere! LOL

ToDos:
1. parse file to get a list of rows -- parseCSVFromFile
2. remove header row    -- tail
3. extract the field(s) of interest -- (!! 2) for one, map (!! 2) for the 
list
4. remove duplicates -- nub
5. output -- mapM_ putStrLn

1. is delegated to a library function, how that works need not concern us 
at the moment

2. should be clear

3. also clear

4. library, you need not care how it does what it does (unless performance 
becomes an issue; nub is O(n^2), if that's too slow, you have to use faster 
variants exploiting that in your case you have more than the Eq constraint 
nub can only work with; an Ord constraint gives easy O(n*log n) 
implementations [using Data.Set, for example]; in a few special cases O(n) 
is possible)

5. putStrLn is clear, for mapM_ see above

1. and 5. involve IO (reading a file resp. printing to stdout), 2., 3. and 
4. operate only on data, so those steps can be combined into a pipeline 
like you did.

> 
> Thanks,
> Sean.
> 
> PS: Phew!



More information about the Beginners mailing list