[Haskell-cafe] Attoparsec.ByteString.Char8 or Attoparsec.ByteString for diff output?

Pedro B. pedroborg at gmail.com
Mon Feb 20 14:35:32 UTC 2023


On 17/2/2023 at 2:08 p.m., Viktor Dukhovni wrote:
> On Fri, Feb 17, 2023 at 01:32:48PM -0400, Pedro B. wrote:
> 
>> I am developing a program to parse diff output taken from stdin (as in
>> diff file1 file2 | myApp) or from a file. I am reading the input as a
>> ByteString in either case and parsing it with Attoparsec. My question
>> is: should I use Data.Attoparsec.ByteString.Char8 or
>> Data.Attoparsec.ByteString?
>>
>> So far, I've been using Data.Attoparsec.ByteString.Char8 and it works
>> for my sample files, which are in UTF-8, Latin-1, or the default
>> Windows encoding.
>>
>> What do you suggest?
> 
> Because the underlying ByteString data type is the same:
> 
>      Data.ByteString ~ Data.ByteString.Char8
> 
> you can use either or both sets of combinators as you see fit.  The
> Char8 combinators match the parsed ByteStrings against Char predicates,
> while the base ByteString combinators match against Word8 predicates.
> The below is valid:
> 
>      import qualified Data.Attoparsec.ByteString       as A8
>      import qualified Data.Attoparsec.ByteString.Char8 as AC
> 
>      ...
> 
>      myParser :: ...
>      myParser ... = do
>          ...
>          -- parse a Word8 byte followed by an 8-bit Char
>          w <- A8.anyWord8
>          c <- AC.anyChar
>          ...
> 

Thanks for your answer, Viktor.

I am now using the base ByteString combinators by default, and the Char8
combinators only when needed, for example AC.char or AC.string.

I was confused when I wanted to parse lines coming from the diffed files
using "AC.takeTill AC.isEndOfLine". This does not type-check because
AC.takeTill expects a predicate on Char, but AC.isEndOfLine is a
predicate on Word8, even though it is defined in the Char8 module. Why?
Now I am using A8.takeTill AC.isEndOfLine.
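
To make the mismatch concrete, here is a minimal sketch, assuming the
attoparsec and bytestring packages are available. The names lineBytes,
lineChars and line are only illustrative and are not part of my actual
program. It shows the Word8 version I am using now, plus one way to stay
with AC.takeTill by writing the predicate on Char by hand.

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main (main) where

import qualified Data.Attoparsec.ByteString       as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import           Data.ByteString                  (ByteString)

-- Bytes up to (not including) the next '\n' or '\r'.
-- A8.takeTill wants a (Word8 -> Bool) predicate, which is the type
-- AC.isEndOfLine has.
lineBytes :: A8.Parser ByteString
lineBytes = A8.takeTill AC.isEndOfLine

-- The same idea with the Char8 combinator, which wants (Char -> Bool),
-- so the predicate is written out on Char (illustrative only).
lineChars :: A8.Parser ByteString
lineChars = AC.takeTill (\c -> c == '\n' || c == '\r')

-- One line followed by its terminator ("\n" or "\r\n").
line :: A8.Parser ByteString
line = lineBytes <* AC.endOfLine

main :: IO ()
main = do
  print (A8.parseOnly line "+added line\nrest")
  print (A8.parseOnly (lineChars <* AC.endOfLine) "-removed line\r\nrest")
```

Both parsers return the line contents without the terminator; only the
type of the predicate differs.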

I was also worried about the warning in the Char8 module documentation
about truncation to 8 bits. The output generated by diff itself should
not be a problem, but the lines coming from the diffed files could be in
any encoding. I assumed that AC.takeTill would not cause problems, since
it only examines the ByteString through the predicate passed to it.
Anyway, I am now using A8.takeTill, as I mentioned.
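
A small check of that assumption, as a sketch only: the sample bytes
below are "café" followed by a newline, spelled out by hand in UTF-8 and
in Latin-1, and the helper names viaWord8 and viaChar8 are hypothetical.
As far as I can tell, both takeTill variants hand back the input bytes
unchanged, whatever the encoding.

```haskell
module Main (main) where

import qualified Data.Attoparsec.ByteString       as A8
import qualified Data.Attoparsec.ByteString.Char8 as AC
import qualified Data.ByteString                  as B

-- "café\n" encoded two ways, with the byte values written out.
utf8Line, latin1Line :: B.ByteString
utf8Line   = B.pack [0x63, 0x61, 0x66, 0xC3, 0xA9, 0x0A]
latin1Line = B.pack [0x63, 0x61, 0x66, 0xE9, 0x0A]

-- Take everything up to the line terminator, once with the Word8
-- combinator and once with the Char8 one.
viaWord8, viaChar8 :: B.ByteString -> Either String B.ByteString
viaWord8 = A8.parseOnly (A8.takeTill AC.isEndOfLine)
viaChar8 = A8.parseOnly (AC.takeTill (\c -> c == '\n' || c == '\r'))

main :: IO ()
main = do
  -- Print the raw byte values so the terminal encoding does not matter.
  print (fmap B.unpack (viaWord8 utf8Line))
  print (fmap B.unpack (viaChar8 utf8Line))
  print (fmap B.unpack (viaWord8 latin1Line))
  print (fmap B.unpack (viaChar8 latin1Line))
```

In each case the result is the original bytes minus the trailing newline,
so the choice between the two combinators here is about the predicate
type, not about what happens to the data.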

Regards,

Pedro Borges


