ANN: unicode-properties 3.2.0.0, unicode-names 3.2.0.0
Ashley Yakeley
ashley at semantic.org
Tue Sep 2 00:54:38 EDT 2008
unicode-properties 3.2.0.0, unicode-names 3.2.0.0
These two packages are representations in Haskell of various data in the
Unicode 3.2.0 Character Database. Unicode 3.2.0 was the latest version
of the Unicode standard at the time I wrote most of the code; later I
may move the packages to the latest version (currently 5.1.0).
The unicode-properties package contains functions to determine general
category, case, and a wide range of other properties, as well as to do
decomposition and case-folding.
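
For readers unfamiliar with these properties, here is a minimal sketch of the kind of query involved. It uses Data.Char from base as a stand-in and is not the unicode-properties API; the helper name "describe" is made up for illustration. Note that base's tables follow whatever Unicode version your GHC ships with, not necessarily 3.2.0.

```haskell
-- Illustration only: this uses Data.Char from base,
-- NOT the unicode-properties API.
import Data.Char (GeneralCategory (..), generalCategory, isUpper, toLower)

-- | Hypothetical helper: general category, upper-case flag and
-- lower-case mapping of a character, taken from base's own tables.
describe :: Char -> (GeneralCategory, Bool, Char)
describe c = (generalCategory c, isUpper c, toLower c)

main :: IO ()
main = mapM_ (print . describe) "A1 "
```

For 'A' this reports UppercaseLetter, for '1' DecimalNumber, and for a space Space.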
The unicode-names package contains just one function, getCharacterName,
for getting the name of a character. It's a separate package because the
name data makes up a large proportion of the total data.
Both packages use the type "Char" to represent Unicode characters (more
pedantically, code points). In GHC, Char has the range
['\x0'..'\x10FFFF'], matching the Unicode standard. The packages won't
work with compilers that restrict Char to a smaller range.
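
If you are unsure about your compiler, a quick check along these lines should tell you whether its Char covers the full range. This is only a sketch; the names "unicodeMax" and "charCoversUnicode" are mine and not part of either package.

```haskell
-- Sanity check: does this compiler's Char cover every Unicode code point?
import Data.Char (ord)

-- | Highest code point defined by the Unicode standard.
unicodeMax :: Int
unicodeMax = 0x10FFFF

-- | True when Char spans the full range U+0000..U+10FFFF.
charCoversUnicode :: Bool
charCoversUnicode =
  ord (minBound :: Char) == 0 && ord (maxBound :: Char) == unicodeMax

main :: IO ()
main
  | charCoversUnicode = putStrLn "Char covers the full Unicode range"
  | otherwise = putStrLn "Char is restricted; these packages would not work"
```

Under GHC this takes the first branch.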
Hackage:
<http://hackage.haskell.org/cgi-bin/hackage-scripts/package/unicode-properties>
<http://hackage.haskell.org/cgi-bin/hackage-scripts/package/unicode-names>
Source for both packages: <http://code.haskell.org/unicode-properties/>
Most of the data is auto-generated at build time from files downloadable
from the Unicode web-site.
I expect Don will have them both in Arch Linux within the hour.
--
Ashley Yakeley