[Haskell-beginners] Noobie attempt to process log output into dependency graph

Wed Dec 14 23:28:07 UTC 2016

(Or you could find it here: https://github.com/JohnL4/DependencyGraph)

On Wed, Dec 14, 2016 at 6:26 PM, John Lusk <johnlusk4 at gmail.com> wrote:

> Hi, all,
>
> Here's my question:
>
> I thought, for grins, I'd try to turn some log output into a dependency
> graph (using GraphViz's dot(1)). I'm having difficulty forcing my
> stateful paradigm into a functional one, so I need some help.
>
> If I was to do this with an imperative (stateful) language, I'd build a
> set of edges (or a map to a frequency count, really, since I'll use freq
> > 1 to add some output text noting the repeated occurrences), and then
> dump out the set elements to a text file that would look something like
> this fragment:
>
> a -> q
> q -> d
> d -> e [color=red]
> d -> f [color=red
>
> My big problem now is that if I process a subtree that looks like:
>
> a
>   b
>     c
>     d
>   b
>     d
>     e
>
> my current plan is to proces the first b-c-d subtree and then process the
> b-d-e subtree, *BUT* I need to pass the updated edge set to the second
> processing call, which is pretty stateful.
>
> Do I need to just bite the bullet and find some succinct way to do that,
> or is my entire approach just wrong, stuck in my stateful mindset?
>
> My (awful) code looks like this:
>
> -- Emit to stdout a series of dot(1) edges specifying dependencies.-- "A -> B" means "A depends on B".---- Build with 'ghc dependency-graph.hs'-- -- Input is a text file containing lines as follows:--      (some indentation) (some extraneous text) (file-A) in (some directory)--          (some extra indentation) (some extraneous text) (file-B) in (some directory)--      (some indentation matching the first line above) (some extraneous text) (file-C) in (some directory)---- This means that file-A depends on file-B, but neither file-A nor file-B depend on file-C.---- Sample:--    Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Processing SXA.Compass.Config.ViewModel.dll  in C:\Program Files (x86)\Allscripts Sunrise\Clinical Manager Client\7.2.5575.0\--    Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Adding C:\Program Files (x86)\Allscripts Sunrise\Clinical Manager Client\7.2.5575.0\SXA.Compass.Config.ViewModel.dll (IsPresent=true)        to assemblyList at beginning of GetAssemblyListEx()--      Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Processing SXA.Compass.Config.Utils.dll    in C:\Program Files (x86)\Allscripts Sunrise\Clinical Manager Client\7.2.5575.0\---- (Need to skip the line containing "Adding", and only process the ones containing "Processing".)-- -- Algorithm:--      Read first line, parse, remember indentation--      Repeat for other lines, but if indentation increases, store pair A -> B in hashset.--      At end, dump out hashset.
> -- import Debug.Trace-- import System.Environment-- import System.Console.GetOpt-- import Data.Maybe (fromMaybe)-- import Data.List.Splitimport Prelude -- hiding (readFile) -- Because we want the System.IO.Strict version-- import System.IO (hPutStr, hPutStrLn, stderr)-- import System.IO.Strict-- import Control.Monad-- import System.Directory-- import System.FilePathimport Text.Regex.TDFA-- import Text.Regex.TDFA.String-- import Text.Printf
> -- import qualified Data.Map.Lazy as Mapimport qualified Data.Map.Strict as Map
> ---------------------------------------------------------------- Test Datal1 = "    Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Processing SXA.Compass.Config.ViewModel.dll\tin C:\\Program Files (x86)\\Allscripts Sunrise\\Clinical Manager Client\\7.2.5575.0\\"l2 = "    Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Adding C:\\Program Files (x86)\\Allscripts Sunrise\\Clinical Manager Client\\7.2.5575.0\\SXA.Compass.Config.ViewModel.dll\t(IsPresent=true)\tto assemblyList at beginning of GetAssemblyListEx()"l3 = "      Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList() Information: 0 : Processing SXA.Compass.Config.Utils.dll\tin C:\\Program Files (x86)\\Allscripts Sunrise\\Clinical Manager Client\\7.2.5575.0\\"---------------------------------------------------------------- Test Data Ends-- See http://stackoverflow.com/q/32149354/370611-- toRegex = makeRegexOpts defaultCompOpt{multiline=False} defaultExecOpt
> -- Escape parens?-- initialFillerRegex :: String-- initialFillerRegex = "Helios.MigrationTool.Common.AssemblyUtils.GetAssemblyList\\(\\) Information: 0 : Processing"
> -- Regex matching (marking) a line to be processed-- valuableLineRegex :: String-- valuableLineRegex = "\\bProcessing\\b"
> -- |Regex matching line to be parsedparseLineRegex :: StringparseLineRegex = "^(.* Information: 0 : Processing )([^ ]*)[ \t]+in (.*)" -- 3subexpressions
> main :: IO()main = do
>   logContents <- getContents
>   putStrLn $ unlines $ fst $ edges (parseIndent $ lines logContents) Map.empty
> ------------------------------------------------------------------ |Parses out the leading indentation of the given String into a string of spaces and the rest of the lineparseIndent :: String -> (String,String)parseIndent s = ((fourth $ (s =~ "^( *)(.*)" :: (String,String,String,[String]))) !! 0,
>                  (fourth $ (s =~ "^( *)(.*)" :: (String,String,String,[String]))) !! 1)
> ------------------------------------------------------------------ |Returns a list of strings describing edges in the form "a -> b /* comment */"edges ::
>   [(String,String)]             -- ^ Input tuples: (indent, restOfString)
>   -> Map.Map String Int -- ^ Map of edges in form "a -> b" with a count of the number of times that edge occurs
>   -> [String]           -- ^ Output list of edge descriptions in form "a -> b optionalExtraText"
> edges [] edgeSet =
>   (edgeDump $ Map.assocs edgeSet, 0)
> edges (lastLine:[]) edgeSet =
>   (edgeDump $ Map.assocs edgeSet, 1)
> edges (fstLogLine:sndLogLine:[]) edgeSet =
>   let fstFields = (snd fstLogLine) =~ parseLineRegex :: (String,String,String,[String])
>       sndFields = (snd sndLogLine) =~ parseLineRegex :: (String,String,String,[String])
>   in
>     if length (fourth fstFields) == 0
>     then error ("Unmatched: " ++ (first fstFields)) -- First line must always match
>     else if length (fourth sndFields) == 0 -- "Adding", not "Processing"
>     then edges (fstLogLine:[]) edgeSet -- Skip useless line
>     else if indentLength fstLogLine >= indentLength sndLogLine
>     then edges (sndLogLine:[]) edgeSet -- Can't be an edge from first to second line; drop first line and keep going.
>     else edges (sndLogLine:[])
>          (Map.insertWith (+) ((fullName fstFields) ++ (fullName sndFields)) 1)
> edges (fstLogLine:sndLogLine:thdLogLine:logLines) edgeSet =
>   let fstFields = (snd fstLogLine) =~ parseLineRegex :: (String,String,String,[String])
>       sndFields = (snd sndLogLine) =~ parseLineRegex :: (String,String,String,[String])
>       thdFields = (snd thdLogLine) =~ parseLineRegex :: (String,String,String,[String])
>   in
>     if length (fourth fstFields) == 0
>     then error ("Unmatched: " ++ (first fstFields)) -- First line must always match
>
>     else if length (fourth sndFields) == 0 -- "Adding", not "Processing"
>     then edges (fstLogLine:thdLogLine:logLines) edgeSet -- Skip useless line
>
>     else if indentLength fstLogLine >= indentLength sndLogLine
>     then []                     -- Stop processing at outdent
>
>     else
>       -- Looking one of:
>       --       1
>       --          2 -- process 1 -> 2, then process 2.. as subtree
>       --             3 -- Need to process as subtree rooted at 2, then drop subtree (zero or more lines at same level as 3)
>       -- or
>       --       1
>       --          2 -- processs, then drop this line (process 2.. as empty subtree?)
>       --          3
>       -- or
>       --       1
>       --          2 -- process, then drop this line (drop entire subtree rooted at 1) (same as above, drop empty subtree? (2))
>       --       3
>       -- or
>       --       1
>       --          2 -- same as above? Drop empty subtree rooted at 2
>       --    3
>       edges (sndLogLine:thdLogLine:logLines) (Map.insertWith (+) ((fullName fstFields) ++ (fullName sndFields)) 1) -- now what? I need to pass the UPDATED edgeSet on to the next call, after the subtree rooted at 2 is dropped.
>
>
>
>     then edges (sndLogLine:logLines) edgeSet -- Can't be an edge from first to second line; drop first line and keep going.
>     else edges (sndLogLine:(takeWhile (increasingIndent $ length $ fst fstLogLine) logLines))
>          (Map.insertWith (+) ((fullName fstFields) ++ (fullName sndFields)) 1)
>     else ((fst $ edges (sndLogLine:logLines) edgeSet)
>            ++ (fst $ edges (fstLogLine:(drop
>                                         (snd $ edges (sndLogLine:logLines) edgeSet) -- # of lines processed
>                                         logLines)) edgeSet),
>           (snd $ edges (sndLogLine:logLines) edgeSet)
>           + (snd $ edges (fstLogLine:(drop
>                                       (snd $ edges (sndLogLine:logLines) edgeSet) -- # of lines processed
>                                       logLines)) edgeSet)
>          )
> ----------------------------------------------------------------fullname :: (String,String,String,[String]) -> Stringfullname (_,_,_,[_,fileName,directoryName]) = directoryName ++ fileName
> ------------------------------------------------------------------ |Edges from the first line to all following linesedgesFrom :: String             -- ^ First line
>   -> [String]                   -- ^ Following lines
>   -> Map.Map String Int         -- ^ Set of edges built so far
>   -> [String]edgesFrom a b c = []
> ------------------------------------------------------------------ |Return length of indent or errorindentLength :: (String,String,String,[String]) -- ^ Regex match context
>   -> Int                                        -- ^ Length of indentindentLength (prefix,_,_,[]) = error $ "Not matched: " ++ prefixindentLength (_,_,_,subexprs) =
>   length $ subexprs !! 0
> ------------------------------------------------------------------ |Returns a list of edges, possibly with comments indicating occurrence counts > 1edgeDump :: [(String,Int)]     -- ^ List of (edge,count) tuples
>   -> [String]                  -- ^ List of edges, possibly w/commentsedgeDump [] = []edgeDump ((edge,count):rest)
>   | count <= 1  = edge:(edgeDump rest)
>   | otherwise   = (edge ++ " /* " ++ (show count) ++ " occurrences */"):(edgeDump rest)
> ----------------------------------------------------------------first :: (a,b,c,d) -> afirst (x,_,_,_) = x
> fourth :: (a,b,c,d) -> dfourth (_,_,_,x) = x
>
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/beginners/attachments/20161214/d71c1195/attachment-0001.html>