[Haskell-beginners] Defining custom parser using Parsec
Magnus Therning
magnus at therning.org
Mon Oct 18 02:56:33 EDT 2010
On Sun, Oct 17, 2010 at 22:59, Jimmy Wylie <jwylie at uno.edu> wrote:
> Hi everyone,
>
> I'm working on a digital forensics application that will take a file with
> lines of the following format:
>
> "MD5|name|inode|mode_as_string|UID|GID|size|atime|mtime|ctime|crtime"
>
> This string represents the metadata associated with a particular file in the
> filesystem.
>
> I created a data type to represent the information that I will need to
> perform my analysis:
>
> data Event = Event {
> fn :: B.ByteString,
> mftNum :: B.ByteString,
> ft :: B.ByteString,
> fs :: Integer,
> time :: Integer,
> at :: AccessType,
> mt :: AccessType,
> ct :: AccessType,
> crt :: AccessType
> } deriving (Show)
>
> data AccessType = ATime | MTime | CTime | CrTime
> deriving (Show)
>
> I would like to create a function that takes the Bytestring representing the
> file and returns a list of Events:
> createEvents :: ByteString -> [Event]
> (For now I'm creating a list, but depending on the type of analysis I decide
> to do, I may change this data structure)
>
> I understand that I can use the Parsec Library to do this. I read RWH, and
> noticed they have the endBy and sepBy combinators, but my issue with these
> is that using these functions performs too many transformations on the data.
> endBy will return a list of strings, which then will be used by sepBy which
> will then return a [[ByteString]] which I will then have to iterate through
> to create the final [Event].
>
> What I would like to do is define a custom parser, that will go from the
> ByteString to the [Event] without the overhead of those intermediate steps.
> This function needs to be as fast as possible, as these files can be rather
> large, and I will be performing many different tests and analyses on the
> data. I don't want the parsing to be a bottleneck.
This sounds an awful lot like premature optimisation, which, as we all
know, is the root of all evil :-)
Why do you think that using Parsec will result in fewer
transformations? (It will most likely result in fewer transformations
*that you see*, but that doesn't mean much.)
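To make the point concrete, here is a minimal baseline sketch that uses only
the bytestring package: split into lines, split each line on '|', and build
the record directly. The Entry type, its field names, and the helper names
are hypothetical (they are not the Event type from the question), and the
sketch assumes no field ever contains a '|'. Something like this is probably
worth measuring before reaching for anything more elaborate.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.Maybe (mapMaybe)

-- Hypothetical record: only the columns this sketch keeps.
data Entry = Entry
  { entryName  :: B.ByteString
  , entryInode :: B.ByteString
  , entrySize  :: Integer
  , entryMTime :: Integer
  } deriving (Show, Eq)

-- Read a whole field as an integer; reject trailing junk.
readInt :: B.ByteString -> Maybe Integer
readInt s = case B.readInteger s of
  Just (n, rest) | B.null rest -> Just n
  _                            -> Nothing

-- Turn one "MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime"
-- line into an Entry. Assumes exactly 11 '|'-separated columns.
parseLine :: B.ByteString -> Maybe Entry
parseLine line = case B.split '|' line of
  [_md5, name, inode, _mode, _uid, _gid, size, _atime, mtime, _ctime, _crtime] ->
    Entry name inode <$> readInt size <*> readInt mtime
  _ -> Nothing

-- Malformed lines are silently dropped here; a real tool may want to
-- report them instead.
parseEntries :: B.ByteString -> [Entry]
parseEntries = mapMaybe parseLine . B.lines
```

Because lists are lazy, the intermediate list of fields for a line only needs
to exist while that line is being converted, and the fields themselves are
slices of the original ByteString rather than copies.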
> I'm under the impression that the Parsec library will allow me to define a
> custom parser to do this, but I'm having problems understanding the library,
> and the documentation for it.
>
> A gentle shove in the right direction would be greatly appreciated.
AFAIK Parsec deals with String, not ByteString; have a look at the
attoparsec library[1] instead.
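As a rough sketch of the shape such a parser could take, the following goes
from a ByteString straight to a list of records. The Entry type and the
names field, number, entry and entries are hypothetical, and the module name
Data.Attoparsec.ByteString.Char8 is the one used by recent attoparsec
releases; it may differ in the 0.8 series linked here, so check the haddock.
It also assumes that every line, including the last, ends with a newline.

```haskell
import Control.Applicative (many)
import qualified Data.Attoparsec.ByteString.Char8 as A
import qualified Data.ByteString.Char8 as B

-- Hypothetical record: only the columns this sketch keeps.
data Entry = Entry
  { entryName  :: B.ByteString
  , entryInode :: B.ByteString
  , entrySize  :: Integer
  , entryMTime :: Integer
  } deriving (Show, Eq)

-- One text field followed by its '|' terminator.
field :: A.Parser B.ByteString
field = A.takeWhile (/= '|') <* A.char '|'

-- One decimal field followed by its '|' terminator.
number :: A.Parser Integer
number = A.decimal <* A.char '|'

-- One line: MD5|name|inode|mode|UID|GID|size|atime|mtime|ctime|crtime
entry :: A.Parser Entry
entry = do
  _md5    <- field
  name    <- field
  inode   <- field
  _mode   <- field
  _uid    <- field
  _gid    <- field
  size    <- number
  _atime  <- number
  mtime   <- number
  _ctime  <- number
  _crtime <- A.decimal :: A.Parser Integer  -- last column has no trailing '|'
  return (Entry name inode size mtime)

-- Zero or more newline-terminated entries, then end of input.
entries :: A.Parser [Entry]
entries = many (entry <* A.endOfLine) <* A.endOfInput

parseEntries :: B.ByteString -> Either String [Entry]
parseEntries = A.parseOnly entries
```

Unlike a version that quietly skips bad lines, this one returns Left for the
whole input as soon as one line does not match, which may or may not be what
you want for forensic data.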
There are numerous explanations of using parser combinators out there.
Personally I've found the Parsec documentation fairly easy to
understand. A while ago I wrote a few posts myself on it, and I think
they should translate well to attoparsec (you will probably have to
keep the haddock doc at hand though):
http://therning.org/magnus/archives/289
http://therning.org/magnus/archives/290
http://therning.org/magnus/archives/295
http://therning.org/magnus/archives/296
/M
[1]: http://hackage.haskell.org/package/attoparsec-0.8.1.1
--
Magnus Therning (OpenPGP: 0xAB4DFBA4)
magnus@therning.org Jabber: magnus@therning.org
http://therning.org/magnus identi.ca|twitter: magthe