[Haskell-cafe] develop new Haskell shell?

Thu May 11 20:46:36 EDT 2006

At Thu, 11 May 2006 23:05:14 +0100,
Brian Hulley wrote:

> Of course the above could no doubt be improved but surely it is already far 
> easier to understand and much more powerful than the idiosyncratic text 
> based approach used in UNIX shells (including rc). 

The idea of representing unix pipes as monads has been around for a
while -- but what most people fail to account for is that many (most?)
real-world shell scripts also need to deal with return values and
stderr. Even standard unix shells are pretty terrible in this regard
-- so if we could do it *better* than standard shells -- that could be
pretty compelling.

Here are some simple examples of things to handle, starting with
failures in a pipeline:

 $ aoeu | cat -n ; echo $?
 bash: aoeu: command not found
 0
 $

Sweet! A successful return code even though there is clearly a
failure. Bash 3.x *finally* added set -o pipefail -- which would
cause the above to return an error. Unfortunately, pipefail by itself
does not tell you which part of the pipeline failed, or give you any
way to attempt recovery of the part that failed.
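
That said, bash does keep the exit code of every stage of the last
pipeline in its PIPESTATUS array, so you can at least find out after
the fact which stage failed. A minimal sketch, assuming bash
(PIPESTATUS is not part of POSIX sh); the `stages' variable name is
just for illustration:

```bash
#!/usr/bin/env bash
# Same pipeline as above; assumes no command named `aoeu` is installed.
# The "command not found" message is sent to /dev/null to keep this quiet.
aoeu 2>/dev/null | cat -n

# Copy PIPESTATUS immediately: the very next command overwrites it.
stages=("${PIPESTATUS[@]}")

# Report each stage; a missing command shows up as exit code 127.
for i in "${!stages[@]}"; do
    echo "stage $i exited with ${stages[$i]}"
done
```

This still only reports the failure once the whole pipeline has
finished, so it does nothing for recovering the failed part.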

Often, a program is run for its return code, not its output:

if /usr/bin/test -f /etc/motd ; then 
	echo "you have an /etc/motd" ; 
fi

And there are also times when you want to do something with the
output, and something else with the return code:

if cat -n /etc/motd > /tmp/numbered ; then 
	echo "you have an /etc/motd" ; 
fi

This is tricky because many programs will not terminate until you have
consumed all the output. Because of Haskell's laziness, it is very
easy to deadlock (for example, by waiting for the return code before
the lazily read output has actually been consumed). Shell is also
pretty weak in this regard: you can either get the return code or the
output of a command, but you have to go through gyrations to get
both. For example,

 $ echo `cat aoeu` ; echo $?
cat: aoeu: No such file or directory

0
 $

How do I check that `cat aoeu` has returned successfully? I think you
have to use an intermediate file:

if ! cat aoeu > /tmp/tmpfile ; then
 echo "An error occurred reading aoeu"
fi

<do something with output saved in /tmp/tmpfile>

rm /tmp/tmpfile

The above code is actually a really bad idea because /tmp/tmpfile may
already exist -- so it needs to be modified to use mktemp -- which
further complicates the code.
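
Actually, the temp file may be avoidable in this particular case:
when a command substitution is assigned to a variable, $? afterwards
holds the exit status of the substituted command rather than that of
echo. A minimal sketch under that assumption; the `output' and
`status' names are just for illustration:

```bash
#!/usr/bin/env bash
# An assignment whose value is a command substitution takes the exit
# status of that substitution, so `if` can test it directly.
# Assumes no file named `aoeu` exists in the current directory.
if output=$(cat aoeu 2>/dev/null); then
    echo "read ${#output} characters from aoeu"
else
    status=$?   # in the else branch, $? is still cat's exit status
    echo "cat failed with exit code $status" >&2
fi
```

Note that command substitution strips trailing newlines, and stderr
still has to be dealt with separately, so this only covers part of
the problem.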

It is also, unfortunately, pretty difficult to create useful type
signatures for unix commands. For most, the best you can do is:

app :: [Flag] -> String -> IO String

Consider `cat'. It may appear at first that it has the type:

cat :: [Flag] -> a -> a

but then you realize that many of the flags affect the output in
interesting ways. For example, '-n' numbers all the lines. If you did:

cat [Flag "-n"] someXmlFile

you will surely not get valid XML back out.

So, I think that, in general, calling external programs from Haskell
is an inherently ugly and messy thing. It seems like you either end up
with something that is *clean* but less powerful than shell in many
respects, or something powerful, but ugly. Hopefully I am wrong, but
that has been my experience.

Some Ideas
----------

IMO, the real problem is that grep, find, etc. should all have been
libraries with optional command-line interfaces. Then we could just
have FFI bindings and write normal-looking Haskell code.

It may still be a good idea to take the top 20 unix utils and code
them as native Haskell functions and see how far that goes. I know
there are some existing libraries that deal with basic stuff like mv,
etc. Has anyone implemented grep, find, etc?

I think that the problem with calling programs and trying to check
their return code and use their output is that you are trying to wire
up two different things:

 (1) the connecting of inputs and outputs
 (2) the flow control that results from the return values

This problem looks a bit like the GUI problem where you have to
describe the layout of the widgets on the screen and describe the flow
of events from one widget to another. So there may be some ideas from
GUI research that can be applied to the scripting stuff.

It would also be wise to look at occam and Erlang and see if they have
any useful ideas. And, of course, Windows PowerShell.

j.