[Haskell-cafe] Doubting Haskell

Fri Feb 22 07:22:37 EST 2008

A quick note here.  This is a *really* excellent tutorial on a variety  
of subjects.  It shows how monad operators can be used responsibly (to  
clarify code, not obfuscate it), it shows how chosing a good data  
structure and a good algorithm can work wonders for your code, and on  
a simplistic level, it shows how to build a database in Haskell.

Would it be possible to clean this up and put it in the wiki somewhere?

Thanks

Bob

On 20 Feb 2008, at 09:58, Cale Gibbard wrote:

> (I'm copying the list on this, since my reply contains a tutorial
> which might be of use to other beginners.)
>
> On 19/02/2008, Alan Carter <alangcarter at gmail.com> wrote:
>> Hi Cale,
>>
>> On Feb 19, 2008 3:48 PM, Cale Gibbard <cgibbard at gmail.com> wrote:
>>> Just checking up, since you haven't replied on the list. Was my
>>> information useful? Did I miss any questions you might have had? If
>>> you'd like, I posted some examples of using catch here:
>>
>> Thanks for your enquiry! My experiment continues. I did put a  
>> progress
>> report on the list - your examples together with a similar long an
>> short pair got me over the file opening problem, and taught me some
>> things about active whitespace :-) I couldn't get withFile working
>> (says out of scope, maybe 'cos I'm ghc 6.6 on my Mac)
>
> Make sure to put:
>
> import System.IO
>
> at the top of your source file, if you haven't been. This should
> import everything documented here:
> http://www.haskell.org/ghc/docs/latest/html/libraries/base/System-IO.html
>
>> but it turned out the line I was looking for (collapsed from the  
>> examples)
>> was:
>>
>>  text <- readFile "data.txt" `catch` \_ -> return ""
>>
>> This ensures the program never loses control, crashing or becoming
>> unpredictable by attempting to use an invalid resource, by yielding  
>> an
>> empty String if for any reason the file read fails. Then an empty
>> String makes it very quickly through parsing. I guess that's quite
>> "functiony" :-)
>>
>> Amazing how easy once I knew how. Even stranger that I couldn't  
>> find a
>> "bread and butter" example of it.
>>
>> Then I was going very quickly for a while. My file is dumped from a
>> WordPress MySql table. Well formed lines have 4 tab separated fields
>> (I'm using pipes for tabs here):
>>
>> line id | record id | property | value
>>
>> Line IDs are unique and don't matter. All lines with the same record
>> ID give a value to a property in the same record, similar to this:
>>
>> 1|1|name|arthur
>> 2|1|quest|seek holy grail
>> 3|1|colour|blue
>> 4|2|name|robin
>> 5|2|quest|run away
>> 6|2|colour|yellow
>>
>> Organizing that was a joy. It took minutes:
>
> let cutUp = tail (filter (\fields -> (length fields) == 4)
>                                      (map (\x -> split x '\t')  
> (lines text)))
>
> This should almost certainly be a function of text:
>
> cutUp text = tail (filter (\fields -> (length fields) == 4)
>                                 (map (\x -> split x '\t') (lines  
> text)))
>
>> I found a split on someone's blog (looking for a library tokenizer),
>> but I can understand it just fine. I even get to chuck out ill-formed
>> lines and remove the very first (which contains MySql column names)  
>> on
>> the way through!
>
> Sadly, there's no general library function for doing this. We have
> words and lines (and words would work here, if your fields never have
> spaces), but nobody's bothered to put anything more general for simple
> splitting into the base libraries (though I'm sure there's plenty on
> hackage -- MissingH has a Data.String.Utils module which contains
> split and a bunch of others, for example). However, for anything more
> complicated, there are also libraries like Parsec, which are generally
> really effective, so I highly recommend looking at that at some point.
>
>> I then made a record to put things in, and wrote some lines to play
>> with it (these are the real property names):
>>
>> data Entry = Entry
>>  { occupation                :: String
>>  , iEnjoyMyJob               :: Int
>>  , myJobIsWellDefined        :: Int
>>  , myCoworkersAreCooperative :: Int
>>  , myWorkplaceIsStressful    :: Int
>>  , myJobIsStressful          :: Int
>>  , moraleIsGoodWhereIWork    :: Int
>>  , iGetFrustratedAtWork      :: Int
>>  }
>> ...
>>  let e = Entry{occupation = "", iEnjoyMyJob = 0}
>>  let f = e {occupation = "alan"}
>>  let g = f {iEnjoyMyJob = 47}
>>  putStrLn ((occupation g) ++ " " ++ (show (iEnjoyMyJob g)))
>>
>> Then I ran into another quagmire. I think I have to use Data.Map to
>> build a collection of records keyed by record id, and fill them in by
>> working through the list of 4 item lists called cutUp. As with the
>> file opening problem I can find a few examples that convert a list of
>> tuples to a Data.Map, one to one. I found a very complex example that
>> convinced me a map from Int to a record is possible, but gave me no
>> understanding of how to do it. I spent a while trying to use foldl
>> before I decided it can't be appropriate (I need to pass more  
>> values).
>> So I tried a couple of recursive functions, something like:
>>
>> type Entries = M.Map Int Entry
>> ...
>>  let entries = loadEntries cutUp
>> ...
>> loadEntries :: [[String]] -> Entries
>> loadEntries [] = M.empty Entries
>> loadEntries [x : xs] = loadEntry (loadEntries xs) x
> -- Possible common beginner error here: [x:xs] means the list with one
> element which is a list whose first element is x and whose tail is xs.
> Your type signature and the type of cutUp seems to confirm that this
> is the right type, but you don't seem to have a case to handle a
> longer list of lists. If you want just a list with first entry x, and
> with tail xs, that's just (x:xs). If you want to handle lists of lists
> recursively, you'll generally need two cases: ([]:xss) and
> ((x:xs):xss). We'll end up doing something different instead of
> recursion in a moment.
>>
>> loadEntry entries _ rid fld val = entries
>>
>> Trying to create an empty map at the bottom of the recursion so later
>> I can try to fiddle about checking if the key is present and  
>> crating a
>> new record otherwise, then updating the record with a changed one (a
>> big case would be needed deep in to do each property update). If I'm
>> on the right track it's not good enough to get better, so now I'm  
>> just
>> throwing bits of forest animals into the pot at random again :-(
>>
>> So I certainly would be grateful for a clue! The bits I can do (I got
>> a non-trivial wxHaskell frame sorted out quite easily, the tokenizing
>> and record bit were OK) I think show I'm not *totally* stupid at  
>> this,
>> I'm putting loads of time investment in (it's an experiement in
>> itself) but there do seem to be certain specific things that would be
>> ubiquitous patterns in any production or scripting environment, which
>> are not discussed at all and far from obvious. The more I see of
>> Haskell the more I suspect this issue is the gating one for popular
>> uptake.
>>
>> I couldn't help thinking of this bit, from the Wikipedia entry on the
>> Cocteau Twins:
>>
>> "The band's seventh LP, Four-Calendar Café, was released in late  
>> 1993.
>> It was a departure from the heavily-processed, complex and layered
>> sounds of Blue Bell Knoll and Heaven or Las Vegas, featuring clearer
>> and more minimalistic arrangements. This, along with the record's
>> unusually comprehensible lyrics, led to mixed reviews for the album:
>> Some critics accused the group of selling out and producing an
>> 'accessible album,' while others praised the new direction as a
>> felicitous development worthy of comparison with Heaven or Las  
>> Vegas."
>>
>> Best wishes,
>>
>> Alan
>
> I woke up rather early, and haven't much to do, so I'll turn this into
> a tutorial. :)
>
> Okay. The most common ways to build a map are by using the fromList,
> fromListWith, or fromListWithKey functions. You can see them in the
> documentation here:
>
> http://www.haskell.org/ghc/docs/latest/html/libraries/containers/Data-Map.html#v%3AfromList
>
> The types are:
>
> fromList :: (Ord k) => [(k,a)] -> Map k a
>
> fromListWith :: (Ord k) => (a -> a -> a) -> [(k,a)] -> Map k a
>
> fromListWithKey :: (Ord k) => (k -> a -> a -> a) -> [(k,a)] -> Map k a
>
> They take a list of (key,value) pairs, and build a map from it.
> Additionally, the fromListWith function takes a function which
> specifies how the values should be combined if their keys collide.
> There is also a fromListWithKey function which allows the means of
> combination to depend on the key as well.
>
> At this point we realise something interesting about the way the data
> is being represented: if there is a field in someone's record with no
> row in the database, what should the resulting field contain? In C,
> they often use some integer which is out of range, like -1 for this.
>
> How about for a missing occupation field? Well, that's a String, you
> could use some generic failure string, or an empty string, but I'll
> show you another possibility that just might be convenient.
>
> If t is any type, then the type (Maybe t) consists of the values
> Nothing, and Just x, whenever x is a value of type t. This is another
> convenient way to represent the idea that a computation might fail.
>
> Let's start by changing your record type so that each field is a Maybe
> value, that is, either the value Nothing, or the value Just x, where x
> is the value it would have been.
>
> data Entry = Entry
>  { occupation                :: Maybe String
>  , iEnjoyMyJob               :: Maybe Int
>  , myJobIsWellDefined        :: Maybe Int
>  , myCoworkersAreCooperative :: Maybe Int
>  , myWorkplaceIsStressful    :: Maybe Int
>  , myJobIsStressful          :: Maybe Int
>  , moraleIsGoodWhereIWork    :: Maybe Int
>  , iGetFrustratedAtWork      :: Maybe Int
>  }
>
> There's a very general function in the module Control.Monad which I'd
> like to use just for the Maybe type here. It's called mplus, and for
> Maybe, it works like this:
>
> mplus (Just x) _ = Just x
> mplus Nothing  y = y
>
> So if the first parameter isn't Nothing, that's what you get,
> otherwise, you get the second parameter. Of course, this operation has
> an identity element which is Nothing.
>
> So this lets you combine partial information expressed by Maybe types,
> in a left-biased way.
>
> It's about to become obvious that record types are less convenient
> than perhaps they could be in Haskell, and this is absolutely true --
> I'd actually probably use a somewhat different representation myself
> (possibly something involving a Map from Strings (field names) to Int
> values), but I can't really be sure what you intend with this data,
> and how much type safety you want.
>
> I'll elide the field names just because I can here. It's not
> necessarily good style.
>
> combine :: Entry -> Entry -> Entry
> combine (Entry a1 a2 a3 a4 a5 a6 a7 a8) (Entry b1 b2 b3 b4 b5 b6 b7  
> b8)
>    = Entry (a1 `mplus` b1) (a2 `mplus` b2) (a3 `mplus` b3) (a4  
> `mplus` b4)
>            (a5 `mplus` b5) (a6 `mplus` b6) (a7 `mplus` b7) (a8  
> `mplus` b8)
>
> Even with all the shorthand, this is pretty ugly (and I'll show how
> I'd represent the data in a moment), but what this does is to combine
> two partial entries, favouring the information in the
> first, but filling the holes in the first with data from the second.
> This operation has an identity element, which is:
>
> emptyEntry = Entry Nothing Nothing Nothing Nothing Nothing Nothing
> Nothing Nothing
>
> Let's try a different representation, which is a little more flexible,
> but expresses less in the type system.
>
> data Entry = Entry { occupation :: Maybe String, survey :: M.Map  
> String Int }
>   deriving (Eq, Ord, Show)
>
> So now, instead of a bunch of separate Maybe Int fields, we have just
> one Map from String to Int. If we don't have information for a field,
> we simply won't have that key in the Map. Of course, this means we'll
> have to use strings for field labels. If that seems unhappy, you could
> always define a type like:
>
> data SurveyQuestion = IEnjoyMyJob
>                    | MyJobIsWellDefined
>                    | MyCoworkersAreCooperative
>                    | MyWorkplaceIsStressful
>                    | MyJobIsStressful
>                    | MoraleIsGoodWhereIWork
>                    | IGetFrustratedAtWork
>    deriving (Eq, Ord, Show)
>
> to be used in place of the String type.
>
> Let's see how combine will look now:
>
> combine :: Entry -> Entry -> Entry
> combine (Entry o1 s1) (Entry o2 s2) = Entry (o1 `mplus` o2) (s1  
> `M.union` s2)
>
> Or, using the record syntax more:
>
> combine :: Entry -> Entry -> Entry
> combine e1 e2 = Entry { occupation = (occupation e1 `mplus`  
> occupation e2),
>                        survey = (survey e1 `M.union` survey e2) }
>
> Again, this new version has an identity with respect to combine,  
> which is:
>
> emptyEntry = Entry {occupation = Nothing, survey = (M.empty)}
>
> Now, we just need a way to take one of your rows, and turn it into a
> (key,value) pair, where the value is a partial entry.
>
> readRow :: [String] -> (Int, Entry)
> readRow [n, k, "occupation", v] = (read k, emptyEntry { occupation =  
> Just v })
> readRow [n, k, f, v] = (read k, emptyEntry { survey = M.singleton f  
> (read v) })
> readRow xs = error "failure case, should never happen!"
>
> There is actually a failure case that I'm not handling here, which is
> what happens when the value or key fails to parse as an Int. For that
> we'd use reads instead of read, but let's ignore it for now.
>
> We can then map this function over our cut up rows, something along
> the lines of:
>
> map readRow (cutUp text)
>
> at which point we'll have a list of (Int, Entry) pairs.
>
> We then want to fill up our Entries Map with those, and we want to
> combine them as we go using the combine function:
>
> entryMap text = M.fromListWith combine (map readRow (cutUp text))
>
> Some final changes we could consider would be putting more of the
> error handling into readRow itself: if it was to return a singleton
> Map rather than an (Int, Entry) pair, then it could return the empty
> Map on failure, and the results would then be combined using the
> function M.unionsWith combine. We could move the length 4 test out of
> cutUp then, and just make it the fall-through case in readRow. I'll
> also use reads, which returns a list of (parse,rest-of-string) pairs,
> to handle the failure cases where the numbers don't parse, by just
> treating those rows as nonexistent:
>
> readRow :: [String] -> M.Map Int Entry
> readRow [n, k, f, v] =
>   case reads k of
>      []       -> M.empty -- the key didn't parse
>      (k',_):_ ->
>          if f == "occupation"
>             then M.singleton k' (emptyEntry { occupation = Just v })
>             else case reads v of
>                    []       -> M.empty -- the value didn't parse
>                    (v',_):_ -> M.singleton k'
>                                    (emptyEntry { survey =  
> M.singleton f v' })
> readRow xs = M.empty -- this handles the case when the list is any  
> length but 4
>
> cutUp text = tail (map (\x -> split x '\t') (lines text)) -- which
> allows cutUp to be simpler
>
> entryMap text = M.unionsWith combine (map readRow (cutUp text))
>
> Anyway, I hope this tutorial gives some idea of how things progress,
> and what sort of thinking is generally involved. Note that the focus
> here was more on finding the right combining operations, and then
> using pre-existing higher-order functions to collapse the structure,
> than it was on recursion.