[Haskell-cafe] Grapheme length?

Viktor Dukhovni ietf-dane at dukhovni.org
Sat Feb 20 02:03:44 UTC 2021


On Fri, Feb 19, 2021 at 06:05:12PM -0700, amindfv--- via Haskell-Cafe wrote:

> Does there exist a Haskell library or function for getting grapheme
> lengths of String/Text values?

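That depends on your definition of "grapheme length" :-)
If you're OK with counting code points after NFC normalization, then
the answer is yes, via the "text-icu" package.  Note that this matches
the grapheme count only for combining sequences that have a precomposed
form.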

    $ cabal repl -z -v0 \
      --repl-options "-package=text-icu" \
      --repl-options "-package=text" \
      --repl-options -XOverloadedStrings
    λ> import qualified Data.Text as T
    λ> import Data.Text.ICU.Normalize
    λ> T.length $ normalize NFC "ä"
    1
    λ> T.length $ normalize NFD "ä"
    2
    λ> T.length $ normalize NFC $ normalize NFD "ä"
    1

With the "Data.Text.ICU.Char" module, it may be possible to determine
grapheme boundaries:

    https://hackage.haskell.org/package/text-icu-0.7.0.1/docs/Data-Text-ICU-Char.html#g:5
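
As a rough illustration of the property-based approach, here is a minimal
sketch that uses only "Data.Char" from base rather than text-icu.  The name
approxGraphemeLength is hypothetical, and the rule it applies (every code
point that is not a combining mark starts a new grapheme) is only an
approximation of the real segmentation rules: it will miscount emoji ZWJ
sequences, regional-indicator flags and conjoining Hangul jamo, among others.

```haskell
import Data.Char (isMark)

-- | Approximate number of user-perceived characters in a String.
-- Assumption: a grapheme is one code point followed by zero or more
-- combining marks (general categories Mn, Mc, Me).  This is NOT the full
-- Unicode grapheme cluster algorithm, just a sketch.
-- The first code point always starts a cluster, even if it is a mark.
approxGraphemeLength :: String -> Int
approxGraphemeLength []       = 0
approxGraphemeLength (_ : cs) = 1 + length (filter (not . isMark) cs)
```

Under that assumption both the precomposed "\xe4" and the decomposed
"a\x0308" count as 1, with no normalization step needed.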

-- 
    Viktor.

