[Haskell-cafe] Software Tools in Haskell
gwern0 at gmail.com
Tue Dec 11 12:52:29 EST 2007
On 2007.12.10 13:52:41 -0600, Tommy McGuire <mcguire at crsr.net>
scribbled 1.7K characters:
> In the "if anyone is interested,..." department....
> For reasons that remain unclear, early this fall I started translating
> Brian W. Kernighan and P.J. Plaugher's classic _Software Tools in Pascal_
> into Haskell. I have completed most of it, up to the second part of
> chapter 8 which presents a proto-m4 preprocessor. I have the code online
> including notes, comments, descriptions, and a few alternate approaches.
> Attractions include:
> * A fair gamut of the usual Unix suspects: proto-cat, proto-wc, proto-tr,
> proto-compress, proto-ar, proto-grep, etc.
> * A usable editor, if you consider a de-featured ed-alike to be usable.
> * A simple monadic regular expression engine.
> * Zippers, Parsec, the State monad, the StateT monad transformer, and other
> attempts to sully Computing Science's brightest jewels.
> * Lots and lots of really bad Haskell, including a fair bit that is a
> direct translation of 30-year old Pascal (see xindex in translit, Ch. 2, if
> you need to skip lunch). Programming really has advanced, you know.
> Anyway, the URL is:
> Questions and comments would be appreciated, especially suggestions for how
> to make the code cleaner and more understandable. Flames and mockery are
> welcome, too, but only if they're funny---remember, I've been staring at
> Haskell, Pascal (plus my job-related Perl, CORBA, and C++) for a while;
> there's no telling what my mental state is like.
> [I had intended to wait until I had the whole thing done to make this
> announcement, but I recently moved and have not made much forward progress
> since, other than putting what I had done online.]
> Tommy M. McGuire
Some of those really look like they could be simpler, like 'copy' -
couldn't that simply be 'main = interact (id)'?
Have you seen <http://haskell.org/haskellwiki/Simple_Unix_tools>?
For example, 'charcount' could be a lot simpler - 'charcount = showln
. length' would work, wouldn't it, for the core logic, and the whole
thing might look like:
> main = do (print . showln . length) =<< getContents
Similarly wordcount could be a lot shorter, like 'wc_l = showln .
length . lines'
(showln is a convenience function: showln a = show a ++ "\n")
I... I want to provide a one-liner for 'detab', but it looks
impressively monstrous and I'm not sure I understand it.
One final comment: as regards run-length encoding, there's a really
neat way to do it. I wrote a little article on how to do it a while
ago, so I guess I'll just paste it in here. :)
Recently I was playing with and working on a clone of the old Gradius
arcade games which was written in Haskell, Monadius
Most of my changes were not particularly interesting (cleaning up,
Cabalizing, fixing warnings, switching all Integers to Ints and so
on), but in its Demo.hs, I found an interesting solution to an
interesting problem which seems to be a good example of how Haskell's
abstractions can really shine.
So, suppose we have these data items, which are levels which are
specified by a pair of numbers and then a long list of numbers, often
very repetitious. Perhaps a particular level might be represented this
> level1 = ((Int,Int),[Int])
This is clearly a bad way of representing things. We could just scrap
this representation as Ints completely and perhaps define it as
> -- Assume that these datatypes are already defined and everything
> level1 :: ((Geometry,Geometry),[Enemies])
> level1 = ((Tall,Narrow), > > [FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,FlyEnemy,Shooter,PowerUp,Boss..]) > -- and so on
But this representation, while certainly more symbolic, is still very
repetitious. We need some way of expressing this more concisely, of,
in short, *compressing* it. In this vein of thought, our first
observation should be that we do not need to resort to Gzipping it or
adding dependencies on fancy compression libraries or anything; there
is a very obvious way to compress it already - the entire thing is
practically just a series of repeated numbers. We should be able to
replace the length enumeration of [0,0,0,0,0,0,0,0,0,0,0,0] with
something simpler, say the number of repetition and what is to be
repeated so that our entry would look like (12,0). This representation
is definitely shorter and more importantly, easier to modify and not a
constant. It is possible that there may be a performance benefit here,
as we've gotten rid of a large constant that would have to be defined
in the program itself and instead replaced it with a shorter function
which evaluates to the same thing.
So, what is the type of our decompressing function? Well, we need to
turn a (Int,Int) into a [Int]; even better, we want to turn a whole
list of (Int,Int)s into a single list of [Int]. Thus, our end goal is
going to be a function of this type:
> rleDecode :: [(Int,Int)] -> [Int]
Let us tackle the single tuple example first. The second entry defines
what we need, and the first entry defines how many we need. We could
write a recursive function that takes the parameter, decreases the
first entry by one, and cons on one example of the second entry. It
could look like this:
> rleRecursiveDecode :: (Int,Int) -> [Int]
> rleRecursiveDecode (0,_) = 
> rleRecursiveDecode (n,b) = b : (rleRecursiveDecode (n-1,b))
But this really is not the best way to go. It is not necessarily easy
to follow, and if there is one thing I have learned about Haskell
programming, it is that the most obvious approach (in this case,
primitive recursion) may not be the best way to go. This is code that
could have as well been written in Scheme or something. It is
complicated because we are trying to ensure that we generate an item
for the list only at the exact moment we need to add it into the list;
we are programming as if our language is strict, in other words.
So, what is the lazy way of doing things? Infinite lists, of course.
We create an infinite list containing only the second entry, and we
merely take as many as the first entry says we need. Simple enough!
> -- Infinite list of our item of interest.
> reduplicate a = a:a
> rleLazyDecode :: (Int,Int) -> [Int]
> rleLazyDecode (n,b) = take n (reduplicate b)
Now, reduplicate is a simple enough function to define, but it already
has a definition in the standard libraries - 'cycle'. (I assume you
know what 'take' is, but that is also fairly easy to define once you
seen the need for it.) So:
> rleLazyDecode (n,b) = take n (cycle b)
Might as well remove the parentheses:
> rleLazyDecode (n,b) = take n $ cycle b
A satisfying, short, functional, and lazy one-liner. From here the
definition of rleDecode is almost trivial: we extend it to a list of
tuples by throwing in a map, and we turn the resulting list of lists
into a single list by way of 'concat':
> rleDecode ns = concat $ map rleLazyDecode ns
We can tweak this further, as 'concat . map' is a common enough idiom
that there is a shortcut:
> rleDecode ns = concatMap rleLazyDecode ns
Aw heck - make it points-free:
> rleDecode = concatMap rleLazyDecode
And substitute in the rleLazyDecode definition:
> rleDecode = concatMap (\(n,b) -> take n $ cycle b)
We could also write a version that omits the explicit lambda and
naming of parameters by use of the helpful but somewhat esoteric
'uncurry' function; Uncurry takes apart the tuple. Tts type is:
> uncurry :: (a -> b -> c) -> (a, b) -> c
We can actually go even further into the realms of
incomprehensibility. It turns out that lists are counter-intuitively a
monad! This means we can use bind and all the rest of the operations
defined by the Monad typeclass to operate on lists and other things.
So we can write the bizarre (but short and effective) version of
rleDecode that follows:
> rleDecode = (uncurry replicate =<<)
And we are done! We can now represent the first list like this:
> d1 = ((2,1),d)
> where d = rleDecode [(5, 3), (10, 67), (6, 3), (5, 67), (29, 3), (2, 11), (29, 3), (5, 35), (3, 3), (3, 35), (24, 3), (2, 35), (19, 3), (7, 19), (21, 51), (63, 35), (15, 43), (5, 11), (8, 43), (8, 35), (3, 3), (9, 35), (20, 3), (52, 67), (32, 3), (13, 35), (3, 3), (11, 35), (5, 3), (9, 35), (4, 3), (6, 35), (4, 3), (5, 35), (13, 3), (14, 19), (4, 3), (4, 19), (6, 3), (4, 19), (7, 3), (5, 19), (8, 3), (8, 19), (5, 83), (3, 67), (2, 3), (12, 19), (5, 83), (17, 19), (9, 83), (2, 19), (5, 3), (20, 67), (1, 75), (38, 11), (1, 43), (4, 35), (7, 3), (57, 67), (2, 3), (5, 19), (17, 3), (6, 19), (8, 3), (6, 67), (7, 83), (59, 3), (9, 19), (1, 51), (17, 35), (20, 3), (14, 35), (7, 3), (2, 35), (11, 43), (6, 11), (7, 3), (26, 67), (7, 3), (13, 35), (5, 43), (7, 11), (2, 3), (5, 67), (1, 83), (2, 19), (2, 3), (10, 67), (21, 3), (147, 0)]
Much nicer, don't you think? And the best part is, we should be able
to reuse this run-length decoding even if we replace the arbitrary
numbers by more descriptive type constructors!
(Many thanks to the good denizens of #haskell, and Don Stewart's blog
entry on run-length encoding/decoding using Arrows
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Size: 188 bytes
Desc: not available
Url : http://www.haskell.org/pipermail/haskell-cafe/attachments/20071211/03940148/attachment.bin
More information about the Haskell-Cafe