[Haskell-cafe] Attoparsec.ByteString.Char8 or Attoparsec.ByteString for diff output?
Pedro B.
pedroborg at gmail.com
Mon Feb 20 14:35:32 UTC 2023
On 17/2/2023 at 2:08 p.m., Viktor Dukhovni wrote:
> On Fri, Feb 17, 2023 at 01:32:48PM -0400, Pedro B. wrote:
>
>> I am developing a program to parse diff output taken from stdin (as in
>> diff file1 file2 | myApp) or from a file. I am reading the input as a
>> ByteString in either case and parsing it with Attoparsec. My question
>> is: should I use Data.Attoparsec.ByteString.Char8 or
>> Data.Attoparsec.ByteString?
>>
>> So far, I've been using Data.Attoparsec.ByteString.Char8 and it works
>> for my sample files, which are in UTF-8, Latin-1, or the default
>> Windows encoding.
>>
>> What do you suggest?
>
> Because the underlying ByteString data type is the same:
>
> Data.ByteString ~ Data.ByteString.Char8
>
> you can use either or both sets of combinators as you see fit. The
> Char8 combinators match the parsed ByteStrings against Char predicates,
> while the base ByteString combinators match against Word8 predicates.
> The following is valid:
>
> import qualified Data.Attoparsec.ByteString as A8
> import qualified Data.Attoparsec.ByteString.Char8 as AC
>
> ...
>
> myParser :: ...
> myParser ... = do
>     ...
>     -- parse a Word8 byte followed by an 8-bit Char
>     w <- A8.anyWord8
>     c <- AC.anyChar
>     ...
>
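To illustrate the point that the base combinators take Word8 predicates while the Char8 combinators take Char predicates over the same ByteString input, here is a minimal sketch. It assumes the attoparsec and bytestring packages are installed; digitsW, digitsC and wordThenChar are hypothetical names chosen for this example.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: Word8-based and Char-based attoparsec combinators working on
-- the same ByteString input. Assumes the attoparsec and bytestring packages.
-- OverloadedStrings lets ByteString literals be written directly.
import qualified Data.Attoparsec.ByteString as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import Data.ByteString (ByteString)
import Data.Word (Word8)

-- Leading ASCII digits, tested as Word8 byte values ('0' = 48, '9' = 57).
digitsW :: A8.Parser ByteString
digitsW = A8.takeWhile (\w -> w >= 48 && w <= 57)

-- The same bytes, tested as 8-bit Chars instead of Word8 values.
digitsC :: A8.Parser ByteString
digitsC = AC.takeWhile (\c -> c >= '0' && c <= '9')

-- One byte as a Word8, then the next byte as a Char, in a single parser:
-- both combinators share the same Parser type.
wordThenChar :: A8.Parser (Word8, Char)
wordThenChar = do
  w <- A8.anyWord8
  c <- AC.anyChar
  return (w, c)
```

For example, `A8.parseOnly digitsW "123abc"` and `A8.parseOnly digitsC "123abc"` both give `Right "123"`, and `A8.parseOnly wordThenChar "ab"` gives `Right (97,'b')`.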
Thanks for your answer, Viktor.
I am now using the base ByteString combinators by default, and the Char8
combinators only when needed, for example when I have to use AC.char or
AC.string.
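As a sketch of that split, assuming the input is diff's default "normal" output format, the following parses a change command such as "5,7c5" or "3a4,6": the Parser type and parseOnly come from the base module, while AC.char, AC.decimal and AC.satisfy come from Char8. The names range, Command and command are hypothetical, and the attoparsec package is assumed.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: a parser for change commands in diff's normal output format,
-- e.g. "5,7c5" or "3a4,6". range, Command and command are hypothetical
-- names. Assumes the attoparsec package.
import Control.Applicative (optional)
import Data.Maybe (fromMaybe)
import qualified Data.Attoparsec.ByteString as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC

-- A line range: either "n" (a single line) or "n,m".
range :: A8.Parser (Int, Int)
range = do
  start <- AC.decimal
  end <- optional (AC.char ',' *> AC.decimal)
  return (start, fromMaybe start end)

-- Range in the first file, command letter (a = add, c = change,
-- d = delete), range in the second file.
data Command = Command (Int, Int) Char (Int, Int)
  deriving (Eq, Show)

command :: A8.Parser Command
command =
  Command
    <$> range
    <*> AC.satisfy (\c -> c == 'a' || c == 'c' || c == 'd')
    <*> range
```

Only the characters that need Char semantics go through Char8; everything else stays in the Word8 world.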
I was confused when I wanted to parse lines coming from the diffed files
using "AC.takeTill AC.isEndOfLine". This does not type-check because
AC.takeTill expects a predicate on Char, but AC.isEndOfLine is a
predicate on Word8, even though it is defined in the Char8 module. Why?
Now I am using A8.takeTill AC.isEndOfLine.
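A minimal sketch of both options that do type-check: pairing the Word8 predicate AC.isEndOfLine with A8.takeTill, or giving AC.takeTill a hand-written Char predicate. It assumes the attoparsec package; restOfLineW, restOfLineC, deletedLine and deletedLines are hypothetical names, and deletedLine assumes the "< text" form that normal diff output uses for lines from the first file.

```haskell
{-# LANGUAGE OverloadedStrings #-}
-- Sketch: two ways to take the rest of a line. Assumes the attoparsec
-- package; all parser names here are hypothetical.
import Control.Applicative (many)
import qualified Data.Attoparsec.ByteString as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import Data.ByteString (ByteString)

-- AC.isEndOfLine has type Word8 -> Bool, so it fits the Word8-based
-- A8.takeTill. It stops before '\n' or '\r'.
restOfLineW :: A8.Parser ByteString
restOfLineW = A8.takeTill AC.isEndOfLine

-- Char-based alternative: AC.takeTill needs a Char -> Bool predicate.
restOfLineC :: A8.Parser ByteString
restOfLineC = AC.takeTill (\c -> c == '\n' || c == '\r')

-- One "< text" line, terminated by "\n" or "\r\n" (AC.endOfLine).
deletedLine :: A8.Parser ByteString
deletedLine = AC.string "< " *> restOfLineW <* AC.endOfLine

-- Zero or more such lines.
deletedLines :: A8.Parser [ByteString]
deletedLines = many deletedLine
```

Both restOfLine variants return the same bytes; the choice is only which predicate type is more convenient at the call site.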
I was also worried about the warning in the Char8 module documentation
about truncated characters. The text generated by diff itself should not
cause any problems, but the lines coming from the diffed files could be
in any encoding. I assumed that AC.takeTill would not cause problems,
since it does not examine the ByteString except through the argument
predicate. Anyway, I am now using A8.takeTill, as I mentioned.
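A sketch of why byte-level line splitting is safe here, under one assumption: the diffed files use an ASCII-compatible encoding such as UTF-8, Latin-1 or Windows-1252. In those encodings '\n' and '\r' are single bytes, and in UTF-8 bytes below 0x80 never occur inside a multibyte sequence, so A8.takeTill returns the line's bytes untouched without decoding anything. This would not hold for, say, UTF-16 input. The names restOfLine, latin1Line, utf8Line and roundTrips are hypothetical; the attoparsec and bytestring packages are assumed.

```haskell
-- Sketch: taking a line as raw bytes leaves non-ASCII content intact.
-- Assumes the attoparsec and bytestring packages; names are hypothetical.
import qualified Data.Attoparsec.ByteString as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import Data.ByteString (ByteString)
import qualified Data.ByteString as BS

-- The rest of the current line as raw bytes; nothing is decoded.
restOfLine :: A8.Parser ByteString
restOfLine = A8.takeTill AC.isEndOfLine

-- "café\n" in Latin-1 / Windows-1252, where 'é' is the byte 0xE9.
latin1Line :: ByteString
latin1Line = BS.pack [0x63, 0x61, 0x66, 0xE9, 0x0A]

-- "café\n" in UTF-8, where 'é' is the two bytes 0xC3 0xA9.
utf8Line :: ByteString
utf8Line = BS.pack [0x63, 0x61, 0x66, 0xC3, 0xA9, 0x0A]

-- The parsed line should equal the input minus its trailing newline byte.
roundTrips :: ByteString -> Bool
roundTrips input = A8.parseOnly restOfLine input == Right (BS.init input)
```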
Regards,
Pedro Borges