[Haskell-cafe] Please critique my code (a simple lexer)

John Simon zildjohn01 at gmail.com
Tue May 22 17:13:07 CEST 2012


Hi all,

I've been teaching myself Haskell lately (I come from the C#/Python
world). I wrote a simplistic lexer, and I was hoping I could get a
code review or two. The code that follows is a stand-alone app that
works under GHC.

A few concerns of mine:
- My `consume` function seems basic enough that it should be library
code, but my searches turned up empty. Did I miss anything?
- Is `case _ of x:xs -> x:xsr where xsr = something xs` a common
idiom? It happened twice in my code, and it seems odd to split the
first element away from the rest of the list as it's processed.
- Is creating data structures with simple field names like `kind`,
`offset`, etc a good practice? Since the names are global functions, I
worry about namespace pollution, or stomping on functions defined
elsewhere.

Thanks in advance for anyone willing to take the time.

-- code follows

module Main where

import qualified Data.Map as Map

data Lexer = Lexer String

makeLexer :: String -> Lexer
makeLexer fn = Lexer fn

data Loc = Loc {offset :: Int, line :: Int, column :: Int}

locInc loc n = Loc (offset loc + n) (line loc) (column loc + n)
locNL loc = Loc (offset loc + 1) (line loc + 1) 1

data TokenKind = Colon | RArrow1 | Def | Var | Identifier String | EOF
deriving Show

data Token = Token {lexer :: Lexer, loc :: Loc, kind :: TokenKind}

idStart = ['a'..'z'] ++ ['A'..'Z'] ++ "!@$%^&*-_=+|<>/?"
idNext = idStart ++ ['0'..'9'] ++ "'\""

namedTokens = Map.fromList [
    ("def", Def),
    ("var", Var)]

doLex :: Lexer -> String -> [Token]
doLex lexer = doLex' lexer (Loc 0 1 1)

doLex' lexer loc source = case source of
    []         -> [makeToken EOF]
    ' ':xs     -> more (locInc loc 1) xs
    '\n':xs    -> more (locNL loc) xs
    ':':xs     -> makeToken Colon : more (locInc loc 1) xs
    '-':'>':xs -> makeToken RArrow1 : more (locInc loc 2) xs
    x:xs | x `elem` idStart ->
        makeToken kind : more (locInc loc $ length name) xsr
        where (namer, xsr) = consume idNext xs
              name = x:namer
              kind = maybe (Identifier name) id $ Map.lookup name namedTokens
    _ -> error "Invalid character in source"
    where
        makeToken = Token lexer loc
        more = doLex' lexer

consume :: Eq a => [a] -> [a] -> ([a], [a])
consume want xs = case xs of
    x:xs | x `elem` want -> (x:xsr, rest) where (xsr, rest) = consume want xs
    _                    -> ([], xs)

main :: IO ()
main = do
    let toks = doLex (makeLexer "") "def x -> y" in
        putStrLn $ show $ map kind toks



More information about the Haskell-Cafe mailing list