[Haskell-cafe] Attoparsec.ByteString.Char8 or Attoparsec.ByteString for diff output?

Pedro B. pedroborg at gmail.com
Mon Feb 20 19:58:10 UTC 2023


On 20/2/2023 at 1:43 p.m., Viktor Dukhovni wrote:
> On Mon, Feb 20, 2023 at 10:46:38AM -0400, Pedro B. wrote:
> 
>> Thanks, Li-yao. As I mentioned in my answer to Viktor, I am now using
>> the ByteString functions except when I want to parse Char8's, for
>> example to parse an 'a' with Data.Attoparsec.ByteString.Char8.char 'a'.
> 
> FWIW, you can often avoid the Char8 combinators, e.g. for matching a
> specific 8-bit (ASCII) character, at a modest loss of readability,
> you can just match its Word8 code point:
> 
>      0x0a <--- '\n'
>      0x0d <--- '\r'
>      0x20 <--- ' '
>      0x30 <--- '0'
>      0x41 <--- 'A'
>      0x61 <--- 'a'
>      ...
> 
> I am comfortable with the raw hex values of various "interesting"
> characters, but you can also define aliases:
> 
>      import Data.Char (ord)
>      import Data.Word (Word8)
> 
>      char_nl, char_cr, char_sp, char_0, char_A, char_a :: Word8
>      char_nl = fromIntegral $ ord '\n'
>      char_cr = fromIntegral $ ord '\r'
>      char_sp = fromIntegral $ ord ' '
>      ...
> 


I am using the Data.Word8 module provided by the word8 package, which
defines _lf, _tab, _cr, and so on, and even _a.._z, _0.._9, etc. For
example, I can pass (== _tab) as the predicate argument to
Data.Attoparsec.ByteString.takeTill.
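
A minimal sketch of that takeTill usage, assuming the attoparsec and
word8 packages are installed. The name upToTab and the sample input are
made up for illustration; they are not taken from my actual parser.

```haskell
{-# LANGUAGE OverloadedStrings #-}
import qualified Data.Attoparsec.ByteString as A
import Data.ByteString (ByteString)
import Data.Word8 (_tab)

-- Hypothetical helper: consume bytes up to, but not including,
-- the first tab character. The tab itself is left unconsumed.
upToTab :: A.Parser ByteString
upToTab = A.takeTill (== _tab)

main :: IO ()
main = print (A.parseOnly upToTab "old/file.txt\t2023-02-20")
-- prints: Right "old/file.txt"
```

Note that takeTill also succeeds on empty input or when the very first
byte is a tab, in which case it returns an empty ByteString.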

You made me realize that I can use "word8 _a" instead of "char 'a'" and
have almost no need for the Char8 combinators. I'll probably do that and
use only "decimal" from Char8 to parse integers, which I need for
line ranges such as "2,10".
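
As a rough sketch of that combination (not my real code): the Range
type and the names range and addLine are hypothetical, and the comma is
matched by its raw code point 0x2c, in the spirit of Viktor's
suggestion. It assumes diff's normal output format, where an "add" line
looks like "5a6,7".

```haskell
{-# LANGUAGE OverloadedStrings #-}
import Control.Applicative (optional)
import qualified Data.Attoparsec.ByteString as A
import qualified Data.Attoparsec.ByteString.Char8 as AC
import Data.Maybe (fromMaybe)
import Data.Word8 (_a)

-- Hypothetical type: an inclusive range of line numbers.
data Range = Range Int Int deriving (Eq, Show)

-- Parse "2,10" as Range 2 10, or a single number "5" as Range 5 5.
-- Only 'decimal' comes from the Char8 module; the comma is a raw Word8.
range :: A.Parser Range
range = do
  start <- AC.decimal
  end   <- optional (A.word8 0x2c *> AC.decimal)  -- 0x2c is ','
  pure (Range start (fromMaybe start end))

-- Parse an "add" command line such as "5a6,7".
addLine :: A.Parser (Range, Range)
addLine = (,) <$> range <* A.word8 _a <*> range

main :: IO ()
main = print (A.parseOnly addLine "5a6,7")
-- prints: Right (Range 5 5,Range 6 7)
```

The two modules share the same Parser type, so the Word8 and Char8
combinators can be mixed freely in one parser.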

I still have one question, though: given that I only match specific
characters generated by diff, do I gain anything by avoiding Char8?
Performance, perhaps?

Regards,

Pedro


