[Haskell-cafe] Grapheme length?

amindfv at mailbox.org
Sat Feb 20 03:03:54 UTC 2021


On Fri, Feb 19, 2021 at 09:03:44PM -0500, Viktor Dukhovni wrote:
> On Fri, Feb 19, 2021 at 06:05:12PM -0700, amindfv--- via Haskell-Cafe wrote:
> 
> > Does there exist a Haskell library or function for getting grapheme
> > lengths of String/Text values?
> 
> Depends on your definition of "grapheme length" :-)
> If you're OK with counting NFC code points, then the answer is yes,
> via the "text-icu" package.
> 
>     $ cabal repl -z -v0 \
>       --repl-options "-package=text-icu" \
>       --repl-options "-package=text" \
>       --repl-options -XOverloadedStrings
>     λ> import qualified Data.Text as T
>     λ> import Data.Text.ICU.Normalize
>     λ> length $ T.unpack $ normalize NFC "ä"
>     1
>     λ> length $ T.unpack $ normalize NFD "ä"
>     2
>     λ> length $ T.unpack $ normalize NFC $ normalize NFD "ä"
>     1
> 

Thanks. Unfortunately, this doesn't work for graphemes that have no
single-code-point (precomposed) form, for example:

    length $ T.unpack $ normalize NFC $ normalize NFD "❤️"
    == 2
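
The count is 2 presumably because this emoji is stored as two code points,
U+2764 followed by U+FE0F (a variation selector), and no normalization form
merges them. As a rough illustration using only "base", here is a minimal
sketch that counts code points which are not combining marks. This is an
approximation I am assuming for illustration, not the UAX #29 grapheme
cluster algorithm: it ignores ZWJ sequences, regional-indicator flags,
Hangul jamo and CRLF. The names isMark and approxGraphemeLength are
hypothetical.

```haskell
import Data.Char (GeneralCategory (..), generalCategory)

-- Approximation only: a code point counts as "part of the previous
-- grapheme" if its general category is one of the three mark categories.
-- This is NOT the full UAX #29 grapheme cluster algorithm.
isMark :: Char -> Bool
isMark c =
  generalCategory c `elem` [NonSpacingMark, SpacingCombiningMark, EnclosingMark]

-- Hypothetical helper: number of code points that are not marks.
approxGraphemeLength :: String -> Int
approxGraphemeLength = length . filter (not . isMark)

main :: IO ()
main = do
  let heart = "\x2764\xFE0F" -- HEAVY BLACK HEART + VARIATION SELECTOR-16
      aNFD  = "a\x0308"      -- 'a' + COMBINING DIAERESIS (the NFD form of "ä")
  print (length heart, approxGraphemeLength heart) -- prints (2,1)
  print (length aNFD, approxGraphemeLength aNFD)   -- prints (2,1)
```

This handles the two examples in this thread, but anything beyond simple
base-plus-marks sequences needs a real grapheme boundary implementation.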

> With the "Data.Text.ICU.Char" module, it may be possible to determine
> grapheme boundaries:
> 
>     https://hackage.haskell.org/package/text-icu-0.7.0.1/docs/Data-Text-ICU-Char.html#g:5

I'll look into this and report back.
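
In the meantime, a possible starting point, offered as an untested sketch:
rather than the per-character properties in Data.Text.ICU.Char, the
"text-icu" package also exposes ICU's boundary analysis from Data.Text.ICU
(breakCharacter and breaks). Counting the pieces produced by a character
breaker should give the number of grapheme clusters as ICU defines them.
The name graphemeLength is hypothetical, and using the Root locale is an
assumption on my part.

```haskell
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.ICU (LocaleName (Root), breakCharacter, breaks)

-- Hypothetical helper: split the text at ICU character (grapheme cluster)
-- boundaries and count the resulting pieces.
graphemeLength :: Text -> Int
graphemeLength = length . breaks (breakCharacter Root)

main :: IO ()
main = do
  let heart = T.pack "\x2764\xFE0F" -- HEAVY BLACK HEART + VARIATION SELECTOR-16
  print (T.length heart)       -- number of code points
  print (graphemeLength heart) -- number of grapheme clusters according to ICU
```

The result depends on the Unicode version of the ICU library that text-icu
is linked against, so newer emoji sequences may be split differently on
older systems.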

Tom

