[Haskell-cafe] Attoparsec.ByteString.Char8 or Attoparsec.ByteString for diff output?
Pedro B.
pedroborg at gmail.com
Mon Feb 20 19:58:10 UTC 2023
On 20/2/2023 at 1:43 p.m., Viktor Dukhovni wrote:
> On Mon, Feb 20, 2023 at 10:46:38AM -0400, Pedro B. wrote:
>
>> Thanks, Li-yao. As I mentioned in my answer to Viktor, I am now using
>> the ByteString functions except when I want to parse Char8's, for
>> example to parse an 'a' with Data.Attoparsec.ByteString.Char8.char 'a'.
>
> FWIW, you can often avoid the Char8 combinators, e.g. for matching a
> specific 8-bit (ASCII) character, at a modest loss of readability,
> you can just match its Word8 code point:
>
> 0x0a <--- '\n'
> 0x0d <--- '\r'
> 0x20 <--- ' '
> 0x30 <--- '0'
> 0x41 <--- 'A'
> 0x61 <--- 'a'
> ...
>
> I am comfortable with the raw hex values of various "interesting"
> characters, but you can also define aliases:
>
> import Data.Char (ord)
> import Data.Word (Word8)
>
> char_nl, char_cr, char_sp, char_0, char_A, char_a :: Word8
> char_nl = fromIntegral $ ord '\n'
> char_cr = fromIntegral $ ord '\r'
> char_sp = fromIntegral $ ord ' '
> ...
>
I am using the Data.Word8 module provided by the word8 package, which
defines _lf, _tab, _cr, and so on, and even _a.._z, _0.._9, etc. For
example, I may use (==_tab) as the argument for
Data.Attoparsec.ByteString.takeTill.
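
A minimal sketch of that takeTill usage, assuming a unified-diff style
header line in which the file name is followed by a tab and a timestamp.
The name oldFileName and the header layout are only illustrative
assumptions, not part of my actual parser:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString as A
import Data.ByteString (ByteString)
import Data.Word8 (_tab)

-- | Hypothetical helper: grab the file name from a header line such as
-- "--- old.txt\t2023-02-20 10:00:00". The layout is an assumption.
oldFileName :: A.Parser ByteString
oldFileName =
  A.string "--- "            -- literal prefix, matched byte for byte
    *> A.takeTill (== _tab)  -- everything up to (not including) the tab
```

In GHCi, A.parseOnly oldFileName "--- old.txt\t2023-02-20 10:00:00"
should give Right "old.txt". No Char8 import is needed here, since the
predicate works directly on Word8.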
You made me realize that I can use "word8 _a" instead of "char 'a'" and
almost have no need for the Char8 combinators. I'll probably do that and
only use "decimal" from Char8 to parse integers, which I need to parse
line ranges such as "2,10".
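
Roughly what I have in mind, as a sketch only. It assumes normal-format
diff change commands such as "2,10c3,4"; the names Range, Op, range, op,
command and parseCommand are made up for this example:

```haskell
{-# LANGUAGE OverloadedStrings #-}

import qualified Data.Attoparsec.ByteString as A
import qualified Data.Attoparsec.ByteString.Char8 as AC (decimal)
import Data.ByteString (ByteString)
import Data.Word8 (_a, _c, _comma, _d)

-- | A line range as printed by diff: "2,10", or a single line "7".
data Range = Range Int Int deriving (Eq, Show)

-- | The operation letter of a change command (hypothetical type).
data Op = Add | Change | Delete deriving (Eq, Show)

-- | "2,10" becomes Range 2 10; a lone "7" becomes Range 7 7.
range :: A.Parser Range
range = do
  start <- AC.decimal                                   -- only Char8 import
  end   <- A.option start (A.word8 _comma *> AC.decimal)
  pure (Range start end)

-- | One of the letters a, c, d, matched as Word8 values.
op :: A.Parser Op
op = A.choice
  [ Add    <$ A.word8 _a
  , Change <$ A.word8 _c
  , Delete <$ A.word8 _d
  ]

-- | A whole command such as "2,10c3,4".
command :: A.Parser (Range, Op, Range)
command = (,,) <$> range <*> op <*> range

-- | Run the parser and require that all input is consumed.
parseCommand :: ByteString -> Either String (Range, Op, Range)
parseCommand = A.parseOnly (command <* A.endOfInput)
```

With this, parseCommand "2,10c3,4" should give
Right (Range 2 10,Change,Range 3 4), and "word8 _c" plays the role that
"char 'c'" would have played with the Char8 combinators.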
I still have a question though: given that I only match specific characters
generated by diff, do I gain something by not using Char8? Performance,
perhaps?
Regards,
Pedro