[Haskell-cafe] |> vs. $ (was: request for code review)

Wed Mar 8 06:01:08 EST 2006

Am Dienstag, 7. März 2006 20:52 schrieb Shannon -jj Behrens:
> I did think of using a monad, but being relatively new to Haskell, I
> was confused about a few things.  Let's start by looking at one of my
> simpler functions:
>
> -- Keep pushing tokens until we hit an identifier.
> pushUntilIdentifier :: ParseContextTransformation
> pushUntilIdentifier ctx
>
>   | currTokType ctx == Identifier = ctx
>   | otherwise =
>
>       let newStack = (currTok ctx) : (stack ctx) in
>         (ctx {stack=newStack}) |>
>         getToken |>
>         pushUntilIdentifier
>
> The function itself is a ParseContextTransformation.  It takes a
> context, transforms it, and returns it.  Most of the pipelines in the
> whole application are ParseContextTransformations, and the |> (or $ or
> .) are ways of tying them together.  My questions concerning Monads
> are in this example are:
>
> 1. Monads apply a strategy to computation.  For instance, the list
> monad applies the strategy, "Try it with each of my members."  What
> part of my code is the strategy?
>
> 2. Monads are containers that wrap a value.  For instance, the Maybe
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
Some are, others embody computations that produce a value, yet others: ?

> monad can wrap any value, or it can wrap no value and just be Nothing.
>  What part of my code is the thing being wrapped, and what part is
> "extra data" stored in the Monad itself?
>
> So I guess:
>
> 3. Is the ParseContext the monad or the thing being wrapped?
>
> 4. How do I divide the code between the functions on the right side of
>
> >>= and the functions in the monad itself?  The functions on the right
>
> side of >>= operate on the stuff inside the monad, and the functions
> in the monad itself operate on the stuff in the monad.
>
> 5. How does the ParseContextTransformation relate?
>
> It is because I did not understand the answers to these questions that
> I thought maybe a monad might not be appropriate.  However, I surely
> could be wrong.  Afterall, ParseContext, ParseContextTransformation,
> and |> are all *inspired* by what I knew about monads.
>
> Thanks for your help!
>
> -jj

I'd use a State-monad, say 

import Control.Monad.State

type CDParser a = State ParseContext a
 -- or perhaps StateT ParseContext m a, where m is an appropriate monad,
 -- I haven't thought much about it

then you'd have e.g.

pushUntilIdentifier :: CDParser ()
pushUntilIdentifier = do
	tt <- gets currTokType
	case tt of
		Identifier -> return ()
		_ -> do
			pushToken
			getToken
			pushUntilIdentifier

okay, that doesn't look really better, but if you'd done it monadically from 
the start, you'd probably chosen a different design (I think, I'd leave the 
current token out of the ParseContext and have that returned by the 
appropriate actions). Alternatively, you could use Parsec with Parsecontext 
as user state (removing the input from ParseContext) and take advantage of 
the many provided combinators in Parsec.
As another method, I've hacked up a translation by parsing a declaration and 
creating a customized Show-instance. It could be much improved, but for a 
quick hack, I can live with it.

-- | Translate C-declarations to english, well, sort of
module Translate where

import Text.ParserCombinators.ReadPrec
import qualified Text.ParserCombinators.ReadP as P
import Text.Read
import Data.Char (isAlpha, isAlphaNum)

-- lift some operators from ReadP to ReadPrec, indicates that I
-- should have originally worked with ReadP and lifted to
-- ReadPrec afterwards.
spaces = lift P.skipSpaces

string = lift . P.string

char = lift . P.char

many p = lift $ P.many $ readPrec_to_P p 0

-- | list of known types, struct, union and enum don't really
--   belong here, but since C is inherently sick, it doesn't matter
typeNames :: [String]
typeNames = [ "void", "char", "signed", "unsigned", "short", "int"
            , "long", "float", "double", "struct", "union", "enum"]

-- | may this Char appear in a C-identifier?
isIdLetter :: Char -> Bool
isIdLetter c = c == '_' || isAlphaNum c

-- | may this Char begin a C-identifier?
isIdStart :: Char -> Bool
isIdStart c = c == '_' || isAlpha c

-- | the sort of types, we can handle
data CType
    = Basic String -- ^ plain types like int, char...
    | Const CType  -- ^ type with "const"
    | Ptr   CType  -- ^ pointer to type
    | Array [Maybe Int] CType -- ^ Array with dimensions

-- | type synonym to check whether a variable is volatile
type Volatile = Bool

-- | the declarations we can parse, due to C's horrible syntax,
--   we can't handle multiple variable declarations like
--
-- > int *a, b[5], c;
--
--   but the original programme couldn't either.
data Decl
    = VarDecl CType Volatile String
    | FunDecl CType String [CType]

----------------------------------------------------------------------
--                          Show Instances                          --
----------------------------------------------------------------------

-- here we translate the declaration to english
instance Show CType where
    showsPrec _ (Basic nm)
        = showString nm
    showsPrec _ (Const ty)
        = showString "read-only " . shows ty
    showsPrec _ (Ptr ty)
        = showString "pointer to " . shows ty
    showsPrec _ (Array dims ty)
        = showD dims . shows ty
          where
            showD [] = id
            showD (Just n:ds)
                = showString "array 0.." . shows (n-1) . showString " of "
                    . showD ds
            showD (Nothing:ds)
                = showString "array of " . showD ds

instance Show Decl where
    showsPrec _ (VarDecl ty vol nm)
        = showString nm . showString " is a " . showV vol .
            showString "variable of type " . shows ty
          where
            showV True  = showString "volatile "
            showV False = id
    showsPrec _ (FunDecl rty nm atys)
        = showString nm . showString " is a function of " .
            showArgs atys . showString ", returning " . shows rty
          where
            showArgs []
                = showString "no arguments"
            showArgs [t]
                = showString "one argument of type " . shows t
            showArgs tys@(t:ts)
                = shows (length tys) . showString " arguments of types " .
                    shows t . showRest ts
            showRest [t] = showString " and " . shows t
            showRest (t:ts) = showString ", " . shows t . showRest ts

----------------------------------------------------------------------
--                             Parsing                              --
----------------------------------------------------------------------

lexeme :: String -> ReadPrec String
lexeme str = do
    spaces
    string str
    rst <- look
    case rst of
        (c:_) | isIdLetter c -> pfail
        _   -> return str

parseIdentifier :: ReadPrec String
parseIdentifier = lift $ do
    P.skipSpaces
    c <- P.satisfy isIdStart
    cs <- P.many (P.satisfy isIdLetter)
    return (c:cs)

parseBasic :: ReadPrec CType
parseBasic = do
    tynam <- choice $ map lexeme typeNames
    return (Basic tynam)

parseNoArray :: ReadPrec CType
parseNoArray = (do
    ty <- parsePType
    lexeme "const"
    return (Const ty)) <++ parsePType

parseConstT :: ReadPrec CType
parseConstT = do
    lexeme "const"
    bs <- parseBasic
    return (Const bs)

parsePrePtr :: ReadPrec CType
parsePrePtr = parseConstT <++ parseBasic

parsePType :: ReadPrec CType
parsePType = do
    ty <- parsePrePtr
    complete ty
      where
        complete t
            = (do
            spaces
            char '*'
            complete (Ptr t)) <++ return t

parseDim :: ReadPrec (Maybe Int)
parseDim = (do
    spaces
    char '['
    n <- readPrec
    spaces
    char ']'
    return (Just n)) <++ (spaces >> string "[]" >> return Nothing)

parseVarDecl :: ReadPrec Decl
parseVarDecl = do
    ty <- parseNoArray
    vol <- (lexeme "volatile" >> return True) <++ return False
    nam <- parseIdentifier
    ds <- many parseDim
    spaces
    char ',' <++ char ';' <++ escape
    let t = if null ds then ty else Array ds ty
    return (VarDecl t vol nam)
      where
        escape = do
            rst <- look
            case rst of
              (')':_) -> return '.'
              _ -> pfail

parseFunDecl :: ReadPrec Decl
parseFunDecl = do
    ty <- parsePType
    nam <- parseIdentifier
    ats <- parseFuncArgs
    spaces
    char ';' <++ char '{'
    return (FunDecl ty nam ats)

parseFuncArgs :: ReadPrec [CType]
parseFuncArgs = do
    char '('
    vds <- many parseVarDecl
    char ')'
    return (map typ vds)
      where
        typ (VarDecl ty _ _) = ty

instance Read Decl where
    readPrec = parseFunDecl <++ parseVarDecl

readDecl :: String -> Decl
readDecl = read

translate :: String -> String
translate = show . readDecl
--------------------------------------------------------------------------

It's better than the original for some things:
*CDecl> translate "const int * const a;"
"a is read-only"
*CDecl> :l Translate
Compiling Translate        ( Translate.hs, interpreted )
Ok, modules loaded: Translate.
*Translate> translate "const int * const a;"
"a is a variable of type read-only pointer to read-only int"

but
*CDecl> translate "union {int a, char b};"
"a is int"
*Translate> translate "union {int a, char b};"
"*** Exception: Prelude.read: no parse

Neither is really convincing.

Cheers,
Daniel

-- 

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
	-- Blair P. Houghton