[Haskell-cafe] Re: String vs ByteString

Mon Aug 16 19:55:32 EDT 2010

On 16.08.10 14:44, Daniel Fischer wrote:
> Hi Bulat,
> On Monday 16 August 2010 07:35:44, Bulat Ziganshin wrote:
>> Hello Daniel,
>>
>> Sunday, August 15, 2010, 10:39:24 PM, you wrote:
>>> That's great. If that performance difference is a show stopper, one
>>> shouldn't go higher-level than C anyway :)
>>
>> *all* speed measurements that find Haskell is as fast as C, was
>> broken.
>
> That's a pretty bold claim, considering that you probably don't know all
> such measurements ;)
>
> [...]
> If you are claiming that his test was flawed (and since the numbers clearly
> showed Haskell slower than C, just not much, I suspect you do, otherwise I
> don't see the point of your post), could you please elaborate why you think
> it's flawed?

Hi Daniel,
you are right: the throughput of 'cat' (as proposed by Bulat) is not a
fair comparison, and 'all speed measurements favoring Haskell are
broken' is hardly a reasonable argument. However, 'wc -m' is indeed a
rather slow way to count the number of UTF-8 characters. Python, for
example, is quite a bit faster (1.60s for 'wc -m' vs 0.93s for Python,
for 70M) on my machine[1,2]. Despite all this, I think the performance
of the text package is very promising, and I hope it will improve
further!

cheers, benedikt

[1] A special-purpose C implementation (such as the one presented here:
http://canonical.org/~kragen/strlen-utf8.html) is even faster (0.50s),
but that's not a fair comparison either.
[2] I do not know Python, so maybe there is an even faster way than
  import sys
  print len(sys.stdin.readline().decode('utf-8'))
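
For comparison with the one-liner in [2], here is a minimal sketch of two ways to count UTF-8 characters, written for Python 3 rather than the Python 2 used above. The function names are mine and purely illustrative. The first decodes the input, as in [2]; the second relies on the general fact that UTF-8 continuation bytes have the bit pattern 10xxxxxx, so counting every other byte gives the number of code points. It assumes well-formed UTF-8 and does no validation, and I have not timed either version.

```python
def count_by_decoding(data: bytes) -> int:
    """Decode the bytes as UTF-8 and return the number of code points."""
    return len(data.decode("utf-8"))


def count_utf8_chars(data: bytes) -> int:
    """Count code points without decoding.

    Every code point contributes exactly one byte that is NOT a
    continuation byte (continuation bytes match 10xxxxxx), so it is
    enough to count those. Assumes the input is well-formed UTF-8.
    """
    return sum(1 for b in data if (b & 0xC0) != 0x80)
```

To run either function over standard input, read the raw bytes first, e.g. with sys.stdin.buffer.read(), and pass the result in.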