[Haskell-cafe] Please critique my code (a simple lexer)
John Simon
zildjohn01 at gmail.com
Tue May 22 17:13:07 CEST 2012
Hi all,
I've been teaching myself Haskell lately (I come from the C#/Python
world). I wrote a simplistic lexer, and I was hoping I could get a
code review or two. The code that follows is a stand-alone app that
works under GHC.
A few concerns of mine:
- My `consume` function seems basic enough that it should be library
code, but my searches turned up empty. Did I miss anything?
- Is `case _ of x:xs -> x:xsr where xsr = something xs` a common
idiom? It happened twice in my code, and it seems odd to split the
first element away from the rest of the list as it's processed.
- Is creating data structures with simple field names like `kind`,
`offset`, etc a good practice? Since the names are global functions, I
worry about namespace pollution, or stomping on functions defined
elsewhere.
Thanks in advance for anyone willing to take the time.
-- code follows
module Main where
import qualified Data.Map as Map
data Lexer = Lexer String
makeLexer :: String -> Lexer
makeLexer fn = Lexer fn
data Loc = Loc {offset :: Int, line :: Int, column :: Int}
locInc loc n = Loc (offset loc + n) (line loc) (column loc + n)
locNL loc = Loc (offset loc + 1) (line loc + 1) 1
data TokenKind = Colon | RArrow1 | Def | Var | Identifier String | EOF
deriving Show
data Token = Token {lexer :: Lexer, loc :: Loc, kind :: TokenKind}
idStart = ['a'..'z'] ++ ['A'..'Z'] ++ "!@$%^&*-_=+|<>/?"
idNext = idStart ++ ['0'..'9'] ++ "'\""
namedTokens = Map.fromList [
("def", Def),
("var", Var)]
doLex :: Lexer -> String -> [Token]
doLex lexer = doLex' lexer (Loc 0 1 1)
doLex' lexer loc source = case source of
[] -> [makeToken EOF]
' ':xs -> more (locInc loc 1) xs
'\n':xs -> more (locNL loc) xs
':':xs -> makeToken Colon : more (locInc loc 1) xs
'-':'>':xs -> makeToken RArrow1 : more (locInc loc 2) xs
x:xs | x `elem` idStart ->
makeToken kind : more (locInc loc $ length name) xsr
where (namer, xsr) = consume idNext xs
name = x:namer
kind = maybe (Identifier name) id $ Map.lookup name namedTokens
_ -> error "Invalid character in source"
where
makeToken = Token lexer loc
more = doLex' lexer
consume :: Eq a => [a] -> [a] -> ([a], [a])
consume want xs = case xs of
x:xs | x `elem` want -> (x:xsr, rest) where (xsr, rest) = consume want xs
_ -> ([], xs)
main :: IO ()
main = do
let toks = doLex (makeLexer "") "def x -> y" in
putStrLn $ show $ map kind toks
More information about the Haskell-Cafe
mailing list