<p dir="ltr">Purely as an experiment, I've written a function that uses ByteString to simply elemIndex it's way across a string here. Look for <, then look for >. Repeat until done.</p>
<p dir="ltr"><a href="https://github.com/chrisdone/xeno">https://github.com/chrisdone/xeno</a> (under src/Xeno.hs)</p>
<p dir="ltr">But if you scroll down the README to the 182kb file example, you see that hexml takes 33us and xeno takes 111us. That's surprising to me because I'm doing just a walk across a string and hexml is doing a full parse. It's written in C, but still, 3x faster AND doing allocations and more work. </p>
<p dir="ltr">I tried replacing the ByteString with a raw Ptr Word8 and it didn't make a difference, actually increased time a little bit.</p>
<p dir="ltr">My weigh results indicate that it's not doing any allocations during the process, at least nothing linear or above. So, I haven't looked at the core or asm yet, but I'm guessing it's simply doing more instructions and/or indirections than necessary.</p>
<p dir="ltr">You can reproduce this with stack build --bench xeno</p>
<p dir="ltr">Can anyone make an improvement to the speed? I already nerd sniped myself enough with this, so I'm spreading the bug elsewhere. I think it's a pretty good "raw" performance exercise and possibly something that could serve as a tutorial on haskell performance.</p>
<p dir="ltr">Ciao!</p>