Summary and call for discussion on text proposal

Sun Nov 7 10:16:19 EST 2010

On Sun, Nov 07, 2010 at 02:36:35PM +0000, Duncan Coutts wrote:
> 
> It seems clear that we all want the package accepted, the disagreement
> is over details of the API. The problem here is not the amount of work
> to make the changes some people have been suggesting, the problem is
> disagreement over whether change is necessary and if so what change.

Right.

> There are two axes in which Text functions are generalised:
>   * character predicate  (e.g. searching for first char matching a predicate)
>   * substring            (e.g. searching for a substring)

Not necessarily character /predicate/, e.g.:
    count :: Char -> Text -> Int
vs
    count :: Text -> Text -> Int
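
To make the two shapes concrete, here is a minimal sketch. The names
countChar and countSub are hypothetical, chosen only so both forms can
coexist in one module; the sketch assumes a Data.Text that exports
count :: Text -> Text -> Int, length and filter.

```haskell
import qualified Data.Text as T

-- Character form (hypothetical name): how many times a single Char occurs.
countChar :: Char -> T.Text -> Int
countChar c = T.length . T.filter (== c)

-- Substring form (hypothetical name): non-overlapping occurrences of a
-- needle. T.count raises an error if the needle is empty.
countSub :: T.Text -> T.Text -> Int
countSub = T.count
```

With the haystack "banana", countChar 'a' gives 3 and countSub "an"
gives 2.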

> The fact that there are these two forms of most functions is different
> from the List library which only has the element predicate direction,
> not the sub-sequence direction.

This is true at the moment, but there is no reason one couldn't want or
have the sub-sequence versions for lists. I think I've occasionally
wanted this, but I can't recall concrete examples off the top of my head.
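
As a rough illustration of what a sub-sequence version for lists could
look like, here is a sketch of a break that splits at the first
occurrence of a sublist. breakOnSub is a hypothetical name, not an
existing Data.List function; only isPrefixOf comes from the library.

```haskell
import Data.List (isPrefixOf)

-- Hypothetical sub-sequence analogue of 'break' for lists: split the
-- haystack at the first position where the needle occurs as a prefix.
-- If the needle never occurs, the second component is empty.
breakOnSub :: Eq a => [a] -> [a] -> ([a], [a])
breakOnSub needle = go
  where
    go [] = ([], [])
    go rest@(x:xs)
      | needle `isPrefixOf` rest = ([], rest)
      | otherwise =
          let (before, after) = go xs
          in  (x : before, after)
```

For example, breakOnSub "::" "a::b" gives ("a", "::b"). This naive
version rescans the needle at every position, so it is a sketch of the
interface rather than an efficient search.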

> The design of the Text library encourages the use of substring
> operations because these are expected to be more commonly used and
> because correct handling of Unicode often requires substring
> operations (due to issues with combining characters).

Can you give an example of such an operation please, which doesn't go
wrong when the argument is "c", the input contains "cx" and 'x' is a
combining character such that there is no composed codepoint for "cx"?
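
The case I have in mind can be sketched as follows. It assumes that 'q'
followed by U+0301 (combining acute accent) has no precomposed code
point, and that Data.Text exports breakOn (substring) and break
(predicate) with the types used here. The names sample, bySubstring and
byPredicate are made up for the example.

```haskell
import qualified Data.Text as T

-- 'q' followed by U+0301 COMBINING ACUTE ACCENT, then 'z'.
-- Assumption: there is no precomposed code point for this pair.
sample :: T.Text
sample = T.pack "aq\x0301z"

-- Substring search for the one-character needle "q".
bySubstring :: (T.Text, T.Text)
bySubstring = T.breakOn (T.pack "q") sample

-- Character-predicate search for the same base character.
byPredicate :: (T.Text, T.Text)
byPredicate = T.break (== 'q') sample
```

Both searches work on code points, so both split just before the base
'q' and leave the combining accent attached to the second component.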

> People with concerns should restate those concerns

I think you've covered my concerns (consistency with list/bytestring).

> and if necessary
> questions should be asked to clarify the concerns.

Asked above  :-)

> Option 3
> --------
> 
> breakStr :: Text           -> Text -> (Text, Text)
> breakChr :: (Char -> Bool) -> Text -> (Text, Text)
> 
> This gives neither version the short name 'break', but gives both
> reasonably short names with a suffix to indicate the character
> predicate vs substring.

Speaking as someone familiar with the list and bytestring APIs, I think
this is better than option 1, because break doesn't do something
unexpected, but worse than option 2, because break doesn't do what is
expected either: you have to go and look up the name of the function you
want. I think it's closer to option 1 than option 2, though.
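
For reference, the two Option 3 names could be sketched as thin
wrappers. breakStr and breakChr are the proposed names from the option
above, not existing exports; defining them via breakOn and break assumes
a Data.Text that provides those two functions with these types.

```haskell
import qualified Data.Text as T

-- Substring form from Option 3 (proposed name). The needle must be
-- non-empty, since T.breakOn raises an error on an empty needle.
breakStr :: T.Text -> T.Text -> (T.Text, T.Text)
breakStr = T.breakOn

-- Character-predicate form from Option 3 (proposed name).
breakChr :: (Char -> Bool) -> T.Text -> (T.Text, T.Text)
breakChr = T.break
```

With these, breakStr "::" "a::b" gives ("a", "::b") and
breakChr (== ':') "a:b" gives ("a", ":b").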

Thanks
Ian