[Haskell] ANNOUNCE: FPS - FastPackedStrings 0.2

Einar Karttunen ekarttun at cs.helsinki.fi
Thu Apr 20 09:23:49 EDT 2006


On 20.04 10:52, Bulat Ziganshin wrote:
> this lib should be slower than yours on small strings due to ForeignPtr
> inefficiency

Actually, having O(1) substrings is very nice and can improve performance
quite a lot. Another benefit is easy integration with low-level
libraries that want a Ptr for input/output.
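Both points can be sketched with the `bytestring` package, which is what FPS later became. The module and function names below are from `bytestring`, not from FPS 0.2, whose API may differ. `take`/`drop` return a substring that shares the original buffer, and `unsafeUseAsCStringLen` exposes that buffer as a `Ptr` plus length, which is the shape a C library expects. `middleWord` and `sumBytes` are hypothetical names, and `sumBytes` only stands in for a real foreign call.

```haskell
import qualified Data.ByteString.Char8 as B
import Data.ByteString.Unsafe (unsafeUseAsCStringLen)
import Data.Word (Word8)
import Foreign.Ptr (Ptr, castPtr)
import Foreign.Storable (peekByteOff)

-- O(1) substring: take and drop only adjust the view's offset and length,
-- so the result shares the original buffer and no bytes are copied.
middleWord :: B.ByteString -> B.ByteString
middleWord = B.take 5 . B.drop 6

-- Hypothetical stand-in for a low-level (e.g. C) routine that wants a Ptr
-- and a length. unsafeUseAsCStringLen passes the buffer without copying;
-- the callback must not write through the pointer or keep it afterwards.
sumBytes :: B.ByteString -> IO Int
sumBytes bs = unsafeUseAsCStringLen bs $ \(p, len) -> go (castPtr p) len 0 0
  where
    go :: Ptr Word8 -> Int -> Int -> Int -> IO Int
    go ptr len i acc
      | i >= len  = return acc
      | otherwise = do
          w <- peekByteOff ptr i :: IO Word8
          let acc' = acc + fromIntegral w
          acc' `seq` go ptr len (i + 1) acc'
```

For example, `middleWord (B.pack "hello world")` yields `"world"` as a view into the original buffer rather than a fresh copy.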

> And I think that Donald should mention in doc/announcement that his
> lib is Latin-1 only. It's not good that each of us should scan his
> sources to rediscover this fact.

Actually, I am using fps with UTF-8 without any problems. The trick is
that I care about substrings rather than individual characters. Usually
one ends up doing all the splitting etc. on ASCII characters, and
the rest is handled as substrings where character boundaries
do not matter.
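A minimal sketch of this style follows. It again uses the `bytestring` modules in place of FPS, which is an assumption. The idea is to split a UTF-8 encoded `key=value` line on the ASCII `=` and keep the value as opaque bytes. This is safe because every byte of a multi-byte UTF-8 sequence is 0x80 or above, so an ASCII byte such as `=` (0x3D) or `,` (0x2C) can never appear inside another character. `splitKV` and `fields` are hypothetical helper names.

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.Char8 as B

-- Split a UTF-8 encoded "key=value" line at the first ASCII '='.
-- Both halves are O(1) slices of the input; the value is never decoded.
splitKV :: BS.ByteString -> Maybe (BS.ByteString, BS.ByteString)
splitKV line = case B.elemIndex '=' line of
  Nothing -> Nothing
  Just i  -> Just (BS.take i line, BS.drop (i + 1) line)

-- Split a record into fields on ASCII ','; also safe on UTF-8 input.
fields :: BS.ByteString -> [BS.ByteString]
fields = B.split ','
```

Applied to the UTF-8 bytes of `name=é`, `splitKV` returns the key `name` and the two-byte value `0xC3 0xA9` untouched.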

We can use UTF-8 strings on multiple levels:

1) just bytes plus ASCII character matching
2) matching physical Unicode characters one by one
3) matching Unicode substrings

I would argue that in many cases either 1) or 3) is what is really wanted.
Composite characters and combining marks make 2) problematic, since one
visible character may consist of several code points. FPS does 1) quite
well, and it should be feasible to build separate modules providing 2)
or 3) on top of it.
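To make levels 2) and 3) concrete, here is a sketch built on top of a byte string. It uses `bytestring` in place of FPS, which is an assumption. `decodeUtf8` is a hypothetical and deliberately minimal decoder that assumes well-formed input. It also shows why 2) is awkward: `e` followed by U+0301 COMBINING ACUTE ACCENT decodes to two `Char`s, although it displays as one `é`. Level 3) needs no decoding. UTF-8 is self-synchronizing, so a well-formed needle can only match a well-formed haystack at character boundaries, and a plain byte search is enough. Note that it still treats precomposed and decomposed `é` as different strings. `containsUtf8` is likewise a hypothetical name.

```haskell
import qualified Data.ByteString as BS
import Data.Bits ((.&.), (.|.), shiftL)
import Data.Char (chr)

-- Level 2) (hypothetical helper): decode UTF-8 bytes into Chars one at a
-- time. Assumes well-formed input; overlong forms, surrogates and stray
-- continuation bytes are not rejected.
decodeUtf8 :: BS.ByteString -> String
decodeUtf8 bs = case BS.uncons bs of
  Nothing -> []
  Just (w, rest)
    | w < 0x80  -> chr (fromIntegral w) : decodeUtf8 rest  -- ASCII byte
    | w < 0xE0  -> multi 1 (w .&. 0x1F) rest               -- 2-byte sequence
    | w < 0xF0  -> multi 2 (w .&. 0x0F) rest               -- 3-byte sequence
    | otherwise -> multi 3 (w .&. 0x07) rest               -- 4-byte sequence
  where
    -- Combine the lead byte's payload with n continuation bytes (6 bits each).
    multi n lead rest =
      let (cont, rest') = BS.splitAt n rest
          code = BS.foldl' (\acc b -> (acc `shiftL` 6) .|. fromIntegral (b .&. 0x3F))
                           (fromIntegral lead :: Int) cont
      in chr code : decodeUtf8 rest'

-- Level 3): substring search directly on the raw bytes.
containsUtf8 :: BS.ByteString -> BS.ByteString -> Bool
containsUtf8 needle haystack = BS.isInfixOf needle haystack
```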

Haskell does not support the full range of Unicode characters for
meaningful operations: one cannot do IO with the standard libraries
using Chars outside the Latin-1 range.
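One workaround in this setting is to encode Strings to UTF-8 yourself and write the resulting bytes with a byte-string `hPut`, which passes bytes through without any character translation. The sketch below assumes the `bytestring` package, and `encodeCharUtf8`, `encodeUtf8` and `putStrUtf8` are hypothetical names. Later GHC releases gave Handles locale-aware text encodings, so this mainly reflects the situation described above.

```haskell
import qualified Data.ByteString as BS
import Data.Bits ((.&.), (.|.), shiftR)
import Data.Char (ord)
import Data.Word (Word8)
import System.IO (stdout)

-- Hypothetical helper: encode one Char as 1 to 4 UTF-8 bytes.
encodeCharUtf8 :: Char -> [Word8]
encodeCharUtf8 c
  | n < 0x80    = [fromIntegral n]
  | n < 0x800   = [0xC0 .|. lead 6, cont 0]
  | n < 0x10000 = [0xE0 .|. lead 12, cont 6, cont 0]
  | otherwise   = [0xF0 .|. lead 18, cont 12, cont 6, cont 0]
  where
    n = ord c
    -- high bits of the code point, placed in the leading byte
    lead s = fromIntegral (n `shiftR` s)
    -- 10xxxxxx continuation byte carrying 6 bits of the code point
    cont s = 0x80 .|. (fromIntegral (n `shiftR` s) .&. 0x3F)

encodeUtf8 :: String -> BS.ByteString
encodeUtf8 = BS.pack . concatMap encodeCharUtf8

-- Write a String as raw UTF-8 bytes, bypassing any Char-level Handle encoding.
putStrUtf8 :: String -> IO ()
putStrUtf8 = BS.hPut stdout . encodeUtf8
```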

- Einar Karttunen


More information about the Libraries mailing list