[Haskell-beginners] Defining custom parser using Parsec
Jimmy Wylie
jwylie at uno.edu
Sun Oct 17 17:59:22 EDT 2010
Hi everyone,
I'm working on a digital forensics application that will take a file
with lines of the following format:
"MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime"
This string represents the metadata associated with a particular file in
the filesystem.
I created a data type to represent the information that I will need to
perform my analysis:
data Event = Event {
    fn     :: B.ByteString,
    mftNum :: B.ByteString,
    ft     :: B.ByteString,
    fs     :: Integer,
    time   :: Integer,
    at     :: AccessType,
    mt     :: AccessType,
    ct     :: AccessType,
    crt    :: AccessType
} deriving (Show)

data AccessType = ATime | MTime | CTime | CrTime
    deriving (Show)
I would like to create a function that takes the ByteString representing
the file and returns a list of Events:
createEvents :: B.ByteString -> [Event]
(For now I'm creating a list, but depending on the type of analysis I
decide to do, I may change this data structure.)
I understand that I can use the Parsec library to do this. I read RWH
and noticed the endBy and sepBy combinators, but my issue with them is
that using these functions performs too many transformations on the
data.
endBy will split the input into lines, and sepBy will split each line
into fields, so together they return a [[ByteString]] which I will then
have to iterate through to create the final [Event].
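For reference, the sepBy/endBy approach described above looks roughly
like the following sketch. It is adapted from the CSV example in RWH,
with '|' as the separator and String input for brevity; the names field,
record, records and parseRecords are only illustrative.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- One field: any run of characters up to the next '|' or newline.
field :: Parser String
field = many (noneOf "|\n")

-- One line: fields separated by '|'.
record :: Parser [String]
record = field `sepBy` char '|'

-- Whole input: records, each terminated by a newline.
records :: Parser [[String]]
records = record `endBy` char '\n'

-- Run the parser; the result is the intermediate list of lists
-- that would still have to be converted to [Event] afterwards.
parseRecords :: String -> Either ParseError [[String]]
parseRecords = parse records "(input)"
```

The nested list in the result type is the intermediate structure I would
like to avoid.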
What I would like to do is define a custom parser that will go from the
ByteString to the [Event] without the overhead of those intermediate
steps. This function needs to be as fast as possible, as these files can
be rather large, and I will be performing many different tests and
analyses on the data. I don't want the parsing to be a bottleneck.
I'm under the impression that the Parsec library will allow me to define
a custom parser to do this, but I'm having problems understanding the
library, and the documentation for it.
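To make the question concrete, this is the general shape I have in mind:
a parser that builds one record per line directly. It is only a sketch
under assumptions. Entry is a hypothetical record that mirrors part of
the line format (it is not my Event type), the timestamps and size are
assumed to be plain decimal integers, and every name below is made up
for illustration. It uses Text.Parsec.ByteString from parsec 3.

```haskell
import qualified Data.ByteString.Char8 as B
import Text.Parsec
import Text.Parsec.ByteString (Parser)

-- Hypothetical record mirroring the line format (not the Event type).
data Entry = Entry
  { entryName   :: B.ByteString
  , entryInode  :: B.ByteString
  , entrySize   :: Integer
  , entryATime  :: Integer
  , entryMTime  :: Integer
  , entryCTime  :: Integer
  , entryCrTime :: Integer
  } deriving (Show, Eq)

-- A text field: everything up to the next '|' or newline.
textField :: Parser B.ByteString
textField = B.pack <$> many (noneOf "|\n")

-- A non-negative decimal field (assumed format).
intField :: Parser Integer
intField = read <$> many1 digit

-- The field separator.
bar :: Parser Char
bar = char '|'

-- One line, parsed straight into an Entry.
entry :: Parser Entry
entry = do
  _     <- textField <* bar   -- MD5 (ignored here)
  name  <- textField <* bar
  inode <- textField <* bar
  _     <- textField <* bar   -- mode_as_string (ignored here)
  _     <- textField <* bar   -- UID (ignored here)
  _     <- textField <* bar   -- GID (ignored here)
  size  <- intField  <* bar
  a     <- intField  <* bar
  m     <- intField  <* bar
  c     <- intField  <* bar
  cr    <- intField
  return (Entry name inode size a m c cr)

-- Whole input: newline-terminated entries, then end of input.
entries :: Parser [Entry]
entries = (entry `endBy` newline) <* eof

parseEntries :: B.ByteString -> Either ParseError [Entry]
parseEntries = parse entries "(input)"
```

This avoids the [[ByteString]] step, but many (noneOf ...) still builds a
String for each field before packing it, so I don't know whether it
would be fast enough without measuring.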
A gentle shove in the right direction would be greatly appreciated.
Thanks for your help,
Jimmy