[Haskell-cafe] Attoparsec.ByteString.Char8 or Attoparsec.ByteString for diff output?

Viktor Dukhovni ietf-dane at dukhovni.org
Fri Feb 17 18:08:31 UTC 2023


On Fri, Feb 17, 2023 at 01:32:48PM -0400, Pedro B. wrote:

> I am developing a program to parse diff output taken from stdin (as in 
> diff file1 file2 | myApp) or from a file. I am reading the input as 
> ByteString in either case and parsing it with Attoparsec. My question 
> is: should I use Data.Attoparsec.ByteString.Char8 or 
> Data.Attoparsec.ByteString?
> 
> So far, I've been using Data.Attoparsec.ByteString.Char8 and it works 
> for my sample files, which are in UTF-8, Latin-1, or the default 
> Windows encoding.
> 
> What do you suggest?

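Because the underlying ByteString data type is the same:

    Data.ByteString.ByteString ~ Data.ByteString.Char8.ByteString

you can use either or both sets of combinators as you see fit.  The
Char8 combinators match the parsed ByteStrings against Char predicates,
while the base ByteString combinators match against Word8 predicates.
The Char8 combinators do no decoding: each byte is simply seen as a
Char in the range '\0'..'\255', so a multi-byte UTF-8 character shows
up as several Chars.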
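To make the Char-versus-Word8 distinction concrete, here is a small
sketch in which the same "ASCII digit" test is written once as a Char
predicate (for the Char8 combinators) and once as a Word8 predicate (for
the base combinators).  The names isDigitChar, isDigitByte, digitsC and
digitsW are hypothetical, chosen only for illustration:

```haskell
import qualified Data.Attoparsec.ByteString       as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import qualified Data.ByteString.Char8            as BC
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Hypothetical Char predicate, for use with the Char8 combinators.
isDigitChar :: Char -> Bool
isDigitChar c = c >= '0' && c <= '9'

-- Hypothetical Word8 predicate: the same test on raw byte values 0x30..0x39.
isDigitByte :: Word8 -> Bool
isDigitByte w = w >= 0x30 && w <= 0x39

-- Both parsers consume a non-empty run of ASCII digits and return the
-- matched bytes; they have the same type because Parser is shared.
digitsC, digitsW :: A8.Parser ByteString
digitsC = AC.takeWhile1 isDigitChar
digitsW = A8.takeWhile1 isDigitByte

main :: IO ()
main = do
  let input = BC.pack "123abc"
  print (A8.parseOnly digitsC input)  -- Right "123"
  print (A8.parseOnly digitsW input)  -- Right "123"
```

Which family to use is then largely a matter of convenience: Char
literals such as '<' usually read better than byte values such as 0x3C.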
The following is valid:

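    import qualified Data.Attoparsec.ByteString       as A8
    import qualified Data.Attoparsec.ByteString.Char8 as AC
    import Data.Word (Word8)

    -- e.g. A8.parseOnly myParser (Data.ByteString.Char8.pack "Az")
    --        == Right (65, 'z')
    myParser :: A8.Parser (Word8, Char)
    myParser = do
        -- parse a Word8 byte followed by an 8-bit Char; both
        -- combinators run in the same Parser type
        w <- A8.anyWord8
        c <- AC.anyChar
        return (w, c)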
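Applied to the original question, one possible approach is to match
only the ASCII structure of the diff with the Char8 combinators and keep
each line's body as an undecoded ByteString, so input in ASCII-compatible
encodings (UTF-8, Latin-1, a Windows code page) passes through byte for
byte.  The sketch below assumes diff's default "normal" output format
(hunk headers like 5,7c5, then "< ", "---", "> " lines), not unified
(-u) output.  The types Range and DiffLine and the functions range,
diffLine and parseDiff are hypothetical:

```haskell
import qualified Data.Attoparsec.ByteString.Char8 as AC
import qualified Data.ByteString.Char8            as BC
import Control.Applicative ((<|>))
import Data.ByteString (ByteString)

-- Hypothetical types for one line of "normal" (non-unified) diff output.
data Range = Range Int Int deriving (Show, Eq)

data DiffLine
  = Hunk Range Char Range  -- e.g. "5,7c5": old range, command (a/c/d), new range
  | Removed ByteString     -- "< text": text kept as raw, undecoded bytes
  | Added ByteString       -- "> text"
  | Separator              -- "---"
  | Other ByteString       -- anything else, e.g. "\ No newline at end of file"
  deriving (Show, Eq)

-- "N" or "N,M"; a single line number N is treated as the range N..N.
range :: AC.Parser Range
range = do
  a <- AC.decimal
  b <- (AC.char ',' *> AC.decimal) <|> pure a
  pure (Range a b)

-- Only ASCII markers are matched as Chars; the line body is never decoded.
-- Attoparsec's <|> backtracks, so the alternatives are simply tried in order.
diffLine :: AC.Parser DiffLine
diffLine =
      Removed   <$> (AC.char '<' *> AC.char ' ' *> AC.takeByteString)
  <|> Added     <$> (AC.char '>' *> AC.char ' ' *> AC.takeByteString)
  <|> Separator <$  (AC.string (BC.pack "---") <* AC.endOfInput)
  <|> (Hunk <$> range <*> AC.satisfy (AC.inClass "acd") <*> range <* AC.endOfInput)
  <|> Other     <$> AC.takeByteString

-- Split on '\n', drop a trailing '\r' (assumed CRLF output), and parse
-- each line on its own.
parseDiff :: ByteString -> [DiffLine]
parseDiff = map parseLine . BC.lines
  where
    parseLine l = either (const (Other l)) id (AC.parseOnly diffLine (stripCR l))
    stripCR l
      | not (BC.null l) && BC.last l == '\r' = BC.init l
      | otherwise                            = l

main :: IO ()
main = mapM_ print (parseDiff (BC.pack "3c3\n< old line\n---\n> new line\n"))
-- Hunk (Range 3 3) 'c' (Range 3 3)
-- Removed "old line"
-- Separator
-- Added "new line"
```

To read from stdin as in "diff file1 file2 | myApp", main could instead
be BC.getContents >>= mapM_ print . parseDiff.  Splitting into lines
first keeps each parse small and avoids having to handle a missing final
newline inside the grammar.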

-- 
    Viktor.

