[Haskell-cafe] Comments and suggestions on code

Jonathan Cast jonathanccast at fastmail.fm
Thu Jan 10 00:32:10 EST 2008


On 9 Jan 2008, at 7:57 PM, Andre Nathan wrote:

> Hello
>
> I've just found time to finish writing my first "real world"  
> program, so
> I thought I'd post it here and ask for insight on general issues  
> such as
> if there's anything that isn't done "the Haskell way", or if there's
> something that could be done more efficiently.
>
> The code is at the bottom of this message and also at
> http://hpaste.org/4893. I realize it's a bit long, so if anyone could
> just skim through it and see if there's anything really ugly or stupid
> and point it out, it would be of great help :)
>
> Just to make it easier to follow the code, its idea is simple:
>
> - Build a process tree by reading entries from /proc (represented as a
>   map);
> - Compare each child of the init process against a whitelist (which
>   comes from a configuration file);
> - For each child not in the whitelist, send it a KILL signal.
>
> The idea here is to run this on webservers and try to catch bad
> customers who try to run daemons from their accounts, the typical  
> script
> kiddie stuff.
>
> Anyway, there's one specific question I'd like to ask. I'm using  
> "StateT
> PsTree IO" to keep program state (the process tree). IO is necessary
> because all information is read from /proc files. Now consider the
> following function:
>
> appendChild :: Pid -> Pid -> StateT PsTree IO Bool
> appendChild ppid pid = do
>   tree <- get
>   let PsInfo psData children = mapLookup ppid tree
>   put $ Map.insert ppid (PsInfo psData (pid:children)) tree
>   return True

A return type of Bool suggests the code might fail; a constant  
function should have return type ().

>
> It changes the program state by modifying a process tree entry, but it
> does no I/O at all. The return type is there basically to match the
> return type of the function which calls it (insertParent), which calls
> functions that do I/O. Is there anyway to avoid the "IO" in
> appendChild's signature (other than making it a pure function by  
> passing
> the process tree as a parameter and returning a modified map)?

This is the best solution, as well as the most idiomatic.  It's  
really simple, too:

appendChild :: Pid -> Pid -> PsTree -> PsTree
appendChild ppid pid tree = Map.insert ppid (PsInfo psData  
(pid:children)) tree
   where
     PsInfo psData children = mapLookup ppid tree

Which is two lines shorter than your version, and IMHO just as clear;  
or, even better

appendChild ppid pid =
   Map.alter (fmap $ \ (PsInfo psData children) -> PsInfo psData  
(pid:children)) ppid

which is a one-liner.

Alternatively, you could keep the definition, but change the type to

appendChild :: Monad m => Pid -> Pid -> StateT PsTree m Bool

or

appendChild :: MonadState m PsTree => Pid -> Pid -> m Bool

although this is likely to be less efficient.

> I would also like to try ways to improve efficiency, maybe trying a  
> hash
> table instead of a map for the state, and also using bytestrings. I
> guess I could try making it parallel, since each child of init can be
> checked independently.
>
> Anyway, this is already longer than I thought it would be (I hope I'm
> not abusing too much :)

An actual coding question, abuse?  We should be so lucky.

> The code follows. Thanks in advance for any
> comments or suggestions.
>
> Andre
>
>
> module Main where
>
> import qualified Data.Map as Map

Also

import Data.Map (Map)

(Map.Map looks kind of silly).

> import Directory
> import Control.Monad.State
> import Maybe
> import System.Environment
> import System.IO
> import System.Posix.Files
> import System.Posix.Signals
> import System.Posix.Unistd
> import System.Posix.User
> import Text.Printf
> import Text.Regex
> import Text.Regex.Posix
>
> type Pid = FilePath
> type Uid = String
>
> type PsData       = Map.Map String String
> type PsChildren   = [Pid]
> type KillFunction = PsTree -> Pid -> IO ()

> data PsInfo = PsInfo PsData PsChildren

This sequence is better written

data PsInfo = PsInfo{
   psData :: Map String String,
   psChildren :: [Pid]
}

If find myself using typedefs relatively infrequently in Haskell.

> type PsTree = Map.Map Pid PsInfo
>
> type Whitelist = Map.Map FilePath String
>
> mapLookup :: (Ord a) => a -> Map.Map a b -> b
> mapLookup k = fromJust . Map.lookup k
>
> -- Process Tree construction
>
> parentPid :: PsInfo -> Pid
> parentPid (PsInfo psData _) = mapLookup "PPid" psData
>
> getProcInfo :: String -> PsData -> PsData
> getProcInfo line psData = do
>   case line =~~ "^([A-Za-z]+):[[:space:]]+(.*)$" of
>     Nothing                  -> psData
>     Just ([_, key, value]:_) -> Map.insert key value psData

> getIds :: String -> PsData -> (String, String)
> getIds id psData = (rId, eId)
>   where (rId:eId:_) = words (mapLookup id psData)
>
> processData :: String -> PsData
> processData procData = addIds psData
>   where psData = foldr getProcInfo Map.empty (lines procData)
>         addIds psData = Map.union psData (idMap psData)
>         idMap psData = Map.fromList [("RUid", rUid), ("EUid", eUid),
>                                      ("RGid", rGid), ("EGid", eGid)]
>         (rUid, eUid) = getIds "Uid" psData
>         (rGid, eGid) = getIds "Gid" psData
>
> readLink :: String -> IO String
> readLink link = catch (readSymbolicLink link) (\e -> return "?")
>
> procInfo :: Pid -> IO PsInfo
> procInfo pid = do
>   let dir = "/proc/" ++ pid ++ "/"
>   procData <- readFile $ dir ++ "status"
>   exe <- readLink $ dir ++ "exe"
>   cwd <- readLink $ dir ++ "cwd"
>   cmd <- readFile $ dir ++ "cmdline"
>   let cmd' = subRegex (mkRegex "[^a-zA-z[:space:]\\/\\.-]") cmd " "
>       info = processData procData
>       adminInfo = Map.fromList [("Exe", exe), ("Cwd", cwd),
>                                 ("Cmd", cmd')]
>   return $ PsInfo (Map.union info adminInfo) []
>
> addProc :: Pid -> StateT PsTree IO PsInfo
> addProc pid = do
>   info <- lift $ procInfo pid
>   modify (Map.insert pid info)
>   return info
>
> appendChild :: Pid -> Pid -> StateT PsTree IO Bool
> appendChild ppid pid = do
>   tree <- get
>   let PsInfo psData children = mapLookup ppid tree
>   put $ Map.insert ppid (PsInfo psData (pid:children)) tree
>   return True

As above, or

appendChild ppid pid =
   Map.alter (fmap $ \ st -> st {children = pid : children st}) ppid

with the record syntax.

>
> insertParent :: Pid -> Pid -> StateT PsTree IO Bool
> insertParent ppid pid = do
>   tree <- get
>   if Map.member ppid tree
>     then appendChild ppid pid

modify (appendChild ppid pid)

>     else do
>       built <- insertInTree ppid
>       if built
>         then appendChild ppid pid
>         else return False
>
> insertPid :: Pid -> StateT PsTree IO Bool
> insertPid "1" = do
>   info <- addProc "1"
>   return True
> insertPid pid = do
>   info <- addProc pid
>   let ppid = parentPid info
>   if ppid == "0" then return False else insertParent ppid pid
>
> insertInTree :: Pid -> StateT PsTree IO Bool
> insertInTree pid = do
>   tree <- get
>   if Map.member pid tree then return True else insertPid pid
>
> buildTree :: FilePath -> StateT PsTree IO Bool
> buildTree entry | entry =~ "^[0-9]+$" = insertInTree entry
>                 | otherwise           = return False

This function is fairly complicated, simply because of the number of  
separate definitions involved; I would be looking for opportunities  
to inline definitions here, so it's clearer what the definitions  
are.  (Also, I would try to build a single, self-recursive function  
at the top level, put the call to procInfo there, and make everything  
else pure).

>
> createTree :: IO PsTree
> createTree = do
>   entries <- getDirectoryContents "/proc"
>   execStateT (mapM_ buildTree entries) Map.empty
>
> -- Process Tree pretty-printing
>
> treeStr :: PsTree -> PsChildren -> Int -> String -> String
> treeStr tree children level str = foldr append str children
>   where append pid s = treeStr tree children' (level + 1) newstr
>           where PsInfo _ children' = mapLookup pid tree
>                 pad = take (4 * level) [' ', ' ' ..]
>                 newstr = s ++ "\n" ++ pad ++ pid
>
> printTree :: PsTree -> Pid -> IO ()
> printTree tree pid = putStrLn (treeStr tree children 1 pid)
>   where PsInfo _ children = mapLookup pid tree
>
> -- Process killing
>
> tryToKill :: PsTree -> KillFunction -> Pid -> Uid -> Uid -> Bool ->
>              IO Bool
> tryToKill tree killFun pid "0" allowedUid killed = do
>   -- The process may be starting, give it a second chance.
>   sleep 10 -- conservative value.
>   let PsInfo psData _ = mapLookup pid tree
>   if allowedUid /= mapLookup "EUid" psData
>     then killFun tree pid >> return True
>     else return (killed || False)
> tryToKill tree killFun pid _ _ _ = killFun tree pid >> return True
>
> buildWhitelist :: String -> Whitelist -> Whitelist
> buildWhitelist line whitelist = do
>   case line =~~ "^[ \t]*([^: \t]+)[ \t]*:[ \t]*([^ \t]+)[ \t]*$" of
>     Nothing                 -> error "Invalid configuration file"
>     Just ([_, exe, user]:_) -> Map.insert exe user whitelist
>
> readWhiteList :: FilePath -> IO Whitelist
> readWhiteList file = do
>   contents <- readFile file
>   return $ foldr buildWhitelist Map.empty (lines contents)
>
> allowedUidForExecutable :: Whitelist -> FilePath -> IO Uid
> allowedUidForExecutable whitelist exe = do
>   case Map.lookup exe whitelist of
>     Nothing   -> return "0"
>     Just user -> do
>       entry <- getUserEntryForName user
>       return $ show (userID entry)
>
> processBastard :: PsTree -> Whitelist -> KillFunction -> Bool ->
>                   Pid -> IO Bool
> processBastard tree whitelist killFun killed pid = do
>   let PsInfo psData _ = mapLookup pid tree
>       euid = mapLookup "EUid" psData
>       exe = mapLookup "Exe" psData
>   allowedUid <- allowedUidForExecutable whitelist exe
>   if euid /= allowedUid
>     then tryToKill tree killFun pid euid allowedUid killed
>     else return killed
>
> withEachBastard :: PsTree -> Whitelist -> KillFunction -> IO Bool
> withEachBastard tree whitelist killFun =
>   foldM (processBastard tree whitelist killFun) False children
>   where (PsInfo _ children) = mapLookup "1" tree
>
> printWarnings :: Pid -> PsData -> IO ()
> printWarnings pid psData = do
>   let exe = mapLookup "Exe" psData
>   let cmd = mapLookup "Cmd" psData
>   let cwd = mapLookup "Cwd" psData
>   let ppid = mapLookup "PPid" psData
>   let euid = mapLookup "EUid" psData
>   let ruid = mapLookup "RUid" psData
>   let egid = mapLookup "EGid" psData
>   if ruid /= euid
>     then hPrintf stderr "PID %s: RUID=%s, EUID=%s\n" pid ruid euid
>     else return ()
>   hPrintf stderr "Killing proc %s (%s, UID=%s, GID=%s), child of %s\n"
>           pid exe euid egid ppid
>   hPrintf stderr "  Process command line: %s\n" cmd
>   hPrintf stderr "  Process working directory: %s\n" cwd
>
> killTree :: KillFunction
> killTree tree pid = do
>   let PsInfo psData children = mapLookup pid tree
>   printWarnings pid psData
>   signalProcess sigKILL (read pid)
>   mapM_ (killTree tree) children
>
> killBastards :: PsTree -> Whitelist -> Int -> IO ()
> killBastards tree whitelist n = do
>   runAgain <- withEachBastard tree whitelist killTree
>   if runAgain && n > 1
>     then do
>       sleep 2
>       killBastards tree whitelist (n - 1)
>     else
>       return ()
>
> -- Configuration
>
> printConfig :: KillFunction
> printConfig tree pid = do
>   let PsInfo psData _ = mapLookup pid tree
>       exe = mapLookup "Exe" psData
>       euid = mapLookup "EUid" psData
>   entry <- getUserEntryForID (read euid)
>   putStrLn $ exe ++ ": " ++ (userName entry)

I wouldn't call this a KillFunction; in fact, I would probably just  
inline the definition of KillFunction throughout.  An expression that  
has to be decoded is better than a name that is misleading.

>
> -- main helpers
>
> config :: PsTree -> IO ()
> config tree = do
>   withEachBastard tree Map.empty printConfig
>   return ()
>
> pstree :: PsTree -> IO ()
> pstree tree = do
>   printTree tree "1"
>   -- In newer kernels, process 2 is kthreadd, which is not a child
>   -- of init.
>   let info = mapLookup "2" tree
>   if parentPid info /= "1" then printTree tree "2" else return ()
>
> kill :: PsTree -> IO ()
> kill tree = do
>   whitelist <- readWhiteList "killbastards.conf"
>   killBastards tree whitelist 5
>
> main :: IO ()
> main = do
>   args <- getArgs
>   tree <- createTree
>   case args of
>     ["config"] -> config tree
>     ["pstree"] -> pstree tree
>     []         -> kill tree
>


More information about the Haskell-Cafe mailing list