[Haskell-cafe] Comments and suggestions on code
Jonathan Cast
jonathanccast at fastmail.fm
Thu Jan 10 00:32:10 EST 2008
On 9 Jan 2008, at 7:57 PM, Andre Nathan wrote:
> Hello
>
> I've just found time to finish writing my first "real world"
> program, so
> I thought I'd post it here and ask for insight on general issues
> such as
> if there's anything that isn't done "the Haskell way", or if there's
> something that could be done more efficiently.
>
> The code is at the bottom of this message and also at
> http://hpaste.org/4893. I realize it's a bit long, so if anyone could
> just skim through it and see if there's anything really ugly or stupid
> and point it out, it would be of great help :)
>
> Just to make it easier to follow the code, its idea is simple:
>
> - Build a process tree by reading entries from /proc (represented as a
> map);
> - Compare each child of the init process against a whitelist (which
> comes from a configuration file);
> - For each child not in the whitelist, send it a KILL signal.
>
> The idea here is to run this on webservers and try to catch bad
> customers who try to run daemons from their accounts, the typical
> script
> kiddie stuff.
>
> Anyway, there's one specific question I'd like to ask. I'm using
> "StateT
> PsTree IO" to keep program state (the process tree). IO is necessary
> because all information is read from /proc files. Now consider the
> following function:
>
> appendChild :: Pid -> Pid -> StateT PsTree IO Bool
> appendChild ppid pid = do
> tree <- get
> let PsInfo psData children = mapLookup ppid tree
> put $ Map.insert ppid (PsInfo psData (pid:children)) tree
> return True
A return type of Bool suggests the code might fail; a constant
function should have return type ().
>
> It changes the program state by modifying a process tree entry, but it
> does no I/O at all. The return type is there basically to match the
> return type of the function which calls it (insertParent), which calls
> functions that do I/O. Is there anyway to avoid the "IO" in
> appendChild's signature (other than making it a pure function by
> passing
> the process tree as a parameter and returning a modified map)?
This is the best solution, as well as the most idiomatic. It's
really simple, too:
appendChild :: Pid -> Pid -> PsTree -> PsTree
appendChild ppid pid tree = Map.insert ppid (PsInfo psData
(pid:children)) tree
where
PsInfo psData children = mapLookup ppid tree
Which is two lines shorter than your version, and IMHO just as clear;
or, even better
appendChild ppid pid =
Map.alter (fmap $ \ (PsInfo psData children) -> PsInfo psData
(pid:children)) ppid
which is a one-liner.
Alternatively, you could keep the definition, but change the type to
appendChild :: Monad m => Pid -> Pid -> StateT PsTree m Bool
or
appendChild :: MonadState m PsTree => Pid -> Pid -> m Bool
although this is likely to be less efficient.
> I would also like to try ways to improve efficiency, maybe trying a
> hash
> table instead of a map for the state, and also using bytestrings. I
> guess I could try making it parallel, since each child of init can be
> checked independently.
>
> Anyway, this is already longer than I thought it would be (I hope I'm
> not abusing too much :)
An actual coding question, abuse? We should be so lucky.
> The code follows. Thanks in advance for any
> comments or suggestions.
>
> Andre
>
>
> module Main where
>
> import qualified Data.Map as Map
Also
import Data.Map (Map)
(Map.Map looks kind of silly).
> import Directory
> import Control.Monad.State
> import Maybe
> import System.Environment
> import System.IO
> import System.Posix.Files
> import System.Posix.Signals
> import System.Posix.Unistd
> import System.Posix.User
> import Text.Printf
> import Text.Regex
> import Text.Regex.Posix
>
> type Pid = FilePath
> type Uid = String
>
> type PsData = Map.Map String String
> type PsChildren = [Pid]
> type KillFunction = PsTree -> Pid -> IO ()
> data PsInfo = PsInfo PsData PsChildren
This sequence is better written
data PsInfo = PsInfo{
psData :: Map String String,
psChildren :: [Pid]
}
If find myself using typedefs relatively infrequently in Haskell.
> type PsTree = Map.Map Pid PsInfo
>
> type Whitelist = Map.Map FilePath String
>
> mapLookup :: (Ord a) => a -> Map.Map a b -> b
> mapLookup k = fromJust . Map.lookup k
>
> -- Process Tree construction
>
> parentPid :: PsInfo -> Pid
> parentPid (PsInfo psData _) = mapLookup "PPid" psData
>
> getProcInfo :: String -> PsData -> PsData
> getProcInfo line psData = do
> case line =~~ "^([A-Za-z]+):[[:space:]]+(.*)$" of
> Nothing -> psData
> Just ([_, key, value]:_) -> Map.insert key value psData
> getIds :: String -> PsData -> (String, String)
> getIds id psData = (rId, eId)
> where (rId:eId:_) = words (mapLookup id psData)
>
> processData :: String -> PsData
> processData procData = addIds psData
> where psData = foldr getProcInfo Map.empty (lines procData)
> addIds psData = Map.union psData (idMap psData)
> idMap psData = Map.fromList [("RUid", rUid), ("EUid", eUid),
> ("RGid", rGid), ("EGid", eGid)]
> (rUid, eUid) = getIds "Uid" psData
> (rGid, eGid) = getIds "Gid" psData
>
> readLink :: String -> IO String
> readLink link = catch (readSymbolicLink link) (\e -> return "?")
>
> procInfo :: Pid -> IO PsInfo
> procInfo pid = do
> let dir = "/proc/" ++ pid ++ "/"
> procData <- readFile $ dir ++ "status"
> exe <- readLink $ dir ++ "exe"
> cwd <- readLink $ dir ++ "cwd"
> cmd <- readFile $ dir ++ "cmdline"
> let cmd' = subRegex (mkRegex "[^a-zA-z[:space:]\\/\\.-]") cmd " "
> info = processData procData
> adminInfo = Map.fromList [("Exe", exe), ("Cwd", cwd),
> ("Cmd", cmd')]
> return $ PsInfo (Map.union info adminInfo) []
>
> addProc :: Pid -> StateT PsTree IO PsInfo
> addProc pid = do
> info <- lift $ procInfo pid
> modify (Map.insert pid info)
> return info
>
> appendChild :: Pid -> Pid -> StateT PsTree IO Bool
> appendChild ppid pid = do
> tree <- get
> let PsInfo psData children = mapLookup ppid tree
> put $ Map.insert ppid (PsInfo psData (pid:children)) tree
> return True
As above, or
appendChild ppid pid =
Map.alter (fmap $ \ st -> st {children = pid : children st}) ppid
with the record syntax.
>
> insertParent :: Pid -> Pid -> StateT PsTree IO Bool
> insertParent ppid pid = do
> tree <- get
> if Map.member ppid tree
> then appendChild ppid pid
modify (appendChild ppid pid)
> else do
> built <- insertInTree ppid
> if built
> then appendChild ppid pid
> else return False
>
> insertPid :: Pid -> StateT PsTree IO Bool
> insertPid "1" = do
> info <- addProc "1"
> return True
> insertPid pid = do
> info <- addProc pid
> let ppid = parentPid info
> if ppid == "0" then return False else insertParent ppid pid
>
> insertInTree :: Pid -> StateT PsTree IO Bool
> insertInTree pid = do
> tree <- get
> if Map.member pid tree then return True else insertPid pid
>
> buildTree :: FilePath -> StateT PsTree IO Bool
> buildTree entry | entry =~ "^[0-9]+$" = insertInTree entry
> | otherwise = return False
This function is fairly complicated, simply because of the number of
separate definitions involved; I would be looking for opportunities
to inline definitions here, so it's clearer what the definitions
are. (Also, I would try to build a single, self-recursive function
at the top level, put the call to procInfo there, and make everything
else pure).
>
> createTree :: IO PsTree
> createTree = do
> entries <- getDirectoryContents "/proc"
> execStateT (mapM_ buildTree entries) Map.empty
>
> -- Process Tree pretty-printing
>
> treeStr :: PsTree -> PsChildren -> Int -> String -> String
> treeStr tree children level str = foldr append str children
> where append pid s = treeStr tree children' (level + 1) newstr
> where PsInfo _ children' = mapLookup pid tree
> pad = take (4 * level) [' ', ' ' ..]
> newstr = s ++ "\n" ++ pad ++ pid
>
> printTree :: PsTree -> Pid -> IO ()
> printTree tree pid = putStrLn (treeStr tree children 1 pid)
> where PsInfo _ children = mapLookup pid tree
>
> -- Process killing
>
> tryToKill :: PsTree -> KillFunction -> Pid -> Uid -> Uid -> Bool ->
> IO Bool
> tryToKill tree killFun pid "0" allowedUid killed = do
> -- The process may be starting, give it a second chance.
> sleep 10 -- conservative value.
> let PsInfo psData _ = mapLookup pid tree
> if allowedUid /= mapLookup "EUid" psData
> then killFun tree pid >> return True
> else return (killed || False)
> tryToKill tree killFun pid _ _ _ = killFun tree pid >> return True
>
> buildWhitelist :: String -> Whitelist -> Whitelist
> buildWhitelist line whitelist = do
> case line =~~ "^[ \t]*([^: \t]+)[ \t]*:[ \t]*([^ \t]+)[ \t]*$" of
> Nothing -> error "Invalid configuration file"
> Just ([_, exe, user]:_) -> Map.insert exe user whitelist
>
> readWhiteList :: FilePath -> IO Whitelist
> readWhiteList file = do
> contents <- readFile file
> return $ foldr buildWhitelist Map.empty (lines contents)
>
> allowedUidForExecutable :: Whitelist -> FilePath -> IO Uid
> allowedUidForExecutable whitelist exe = do
> case Map.lookup exe whitelist of
> Nothing -> return "0"
> Just user -> do
> entry <- getUserEntryForName user
> return $ show (userID entry)
>
> processBastard :: PsTree -> Whitelist -> KillFunction -> Bool ->
> Pid -> IO Bool
> processBastard tree whitelist killFun killed pid = do
> let PsInfo psData _ = mapLookup pid tree
> euid = mapLookup "EUid" psData
> exe = mapLookup "Exe" psData
> allowedUid <- allowedUidForExecutable whitelist exe
> if euid /= allowedUid
> then tryToKill tree killFun pid euid allowedUid killed
> else return killed
>
> withEachBastard :: PsTree -> Whitelist -> KillFunction -> IO Bool
> withEachBastard tree whitelist killFun =
> foldM (processBastard tree whitelist killFun) False children
> where (PsInfo _ children) = mapLookup "1" tree
>
> printWarnings :: Pid -> PsData -> IO ()
> printWarnings pid psData = do
> let exe = mapLookup "Exe" psData
> let cmd = mapLookup "Cmd" psData
> let cwd = mapLookup "Cwd" psData
> let ppid = mapLookup "PPid" psData
> let euid = mapLookup "EUid" psData
> let ruid = mapLookup "RUid" psData
> let egid = mapLookup "EGid" psData
> if ruid /= euid
> then hPrintf stderr "PID %s: RUID=%s, EUID=%s\n" pid ruid euid
> else return ()
> hPrintf stderr "Killing proc %s (%s, UID=%s, GID=%s), child of %s\n"
> pid exe euid egid ppid
> hPrintf stderr " Process command line: %s\n" cmd
> hPrintf stderr " Process working directory: %s\n" cwd
>
> killTree :: KillFunction
> killTree tree pid = do
> let PsInfo psData children = mapLookup pid tree
> printWarnings pid psData
> signalProcess sigKILL (read pid)
> mapM_ (killTree tree) children
>
> killBastards :: PsTree -> Whitelist -> Int -> IO ()
> killBastards tree whitelist n = do
> runAgain <- withEachBastard tree whitelist killTree
> if runAgain && n > 1
> then do
> sleep 2
> killBastards tree whitelist (n - 1)
> else
> return ()
>
> -- Configuration
>
> printConfig :: KillFunction
> printConfig tree pid = do
> let PsInfo psData _ = mapLookup pid tree
> exe = mapLookup "Exe" psData
> euid = mapLookup "EUid" psData
> entry <- getUserEntryForID (read euid)
> putStrLn $ exe ++ ": " ++ (userName entry)
I wouldn't call this a KillFunction; in fact, I would probably just
inline the definition of KillFunction throughout. An expression that
has to be decoded is better than a name that is misleading.
>
> -- main helpers
>
> config :: PsTree -> IO ()
> config tree = do
> withEachBastard tree Map.empty printConfig
> return ()
>
> pstree :: PsTree -> IO ()
> pstree tree = do
> printTree tree "1"
> -- In newer kernels, process 2 is kthreadd, which is not a child
> -- of init.
> let info = mapLookup "2" tree
> if parentPid info /= "1" then printTree tree "2" else return ()
>
> kill :: PsTree -> IO ()
> kill tree = do
> whitelist <- readWhiteList "killbastards.conf"
> killBastards tree whitelist 5
>
> main :: IO ()
> main = do
> args <- getArgs
> tree <- createTree
> case args of
> ["config"] -> config tree
> ["pstree"] -> pstree tree
> [] -> kill tree
>
More information about the Haskell-Cafe
mailing list