Adding split/split' to Data.List, and redefining words/lines with it; also, adding replace/replaceBy

Gwern Branwen gwern0 at gmail.com
Thu Jul 10 23:15:30 EDT 2008


On 2008.07.11 00:11:15 +0100, Neil Mitchell <ndmitchell at gmail.com> scribbled 1.1K characters:
> Hi
>
> >  What do people think of adding these?
>
> split is sorely lacking, and definitely needs to be included. However,
> my version is different to yours:
>
> split :: Eq a => a -> [a] -> [[a]]
> split x [] = []
> split x xs = if null b then [a] else a : split x (tail b)
>     where (a,b) = break (== x) xs
>
> split '*' "hello*neil" = ["hello","neil"]
>
> While with yours:
>
> split '*' "hello*neil" = ["hello","*","neil"]
>
> I much prefer mine.

Well, your version of split is entirely respectable.

And according to QuickCheck, it is identical to my split' (I knew when sending this that there would be bikeshed issues, and I was hoping to avoid them):

 splitNeil :: Eq a => a -> [a] -> [[a]]
 splitNeil x [] = []
 splitNeil x xs = if null b then [a] else a : splitNeil x (tail b)
     where (a,b) = break (== x) xs

 splitNeilProp x y = splitNeil x y == split' x y

*Foo> quickCheck splitNeilProp
+++ OK, passed 100 tests.

(Still, I think I'll hold onto this definition. To my eyes it doesn't obviously work the same way, so it could provide a useful sanity check for split'.)
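A minimal sketch of that kind of sanity check, using only base so it runs without QuickCheck. Note that splitRef is a hypothetical stand-in written for this illustration (the real split' is not reproduced in this message), and stringsUpTo and agreeOnSmallInputs are likewise made-up helper names. The element type is pinned to Char and every short string over a two-letter alphabet is tried:

```haskell
import Data.List (unfoldr)

-- Neil's definition, as quoted above (unused argument written as _).
splitNeil :: Eq a => a -> [a] -> [[a]]
splitNeil _ [] = []
splitNeil x xs = if null b then [a] else a : splitNeil x (tail b)
    where (a,b) = break (== x) xs

-- Hypothetical stand-in for split' (an assumption, not the proposal's code):
-- peel off one chunk at a time and drop the delimiter that ended it.
splitRef :: Eq a => a -> [a] -> [[a]]
splitRef x = unfoldr step
  where
    step [] = Nothing
    step ys = case break (== x) ys of
                (a, [])     -> Just (a, [])
                (a, _:rest) -> Just (a, rest)

-- Every list over the given alphabet with length 0..n.
stringsUpTo :: Int -> [a] -> [[a]]
stringsUpTo n alphabet =
    concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- True when the two definitions agree on all small inputs.
agreeOnSmallInputs :: Bool
agreeOnSmallInputs =
    and [ splitNeil 'a' s == splitRef 'a' s | s <- stringsUpTo 6 "aX" ]
```

An exhaustive check like this only covers short inputs, but unlike a random test it cannot miss a small counterexample.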

> Didn't the bytestring people add it, under some guise, to their
> library? It should be consistent with that.

The bytestring split is apparently different from anything discussed here. That is, <http://hackage.haskell.org/packages/archive/bytestring/0.9.1.0/doc/html/Data-ByteString.html#v%3Asplit> says that:

 split 'a'  "aXaXaXa"    == ["","X","X","X",""]

while my split:

 split (=='a')  "aXaXaXa" == ["","a","X","a","X","a","X","a"]

and split'/splitNeil:

 split' 'a'  "aXaXaXa" == ["","X","X","X"]

I'm not sure it's all that important, though: ByteString.split and my split are both invertible (intercalate [c] . ByteString.split c == id; concat (split (==x) y) == y), and split' isn't.
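To make the invertibility point concrete, here is a rough sketch of all three behaviours on plain lists. All three definitions are assumptions reconstructed from the example outputs above: splitDrop only imitates the documented ByteString.split example (it is not the library's code and may differ on empty input), splitKeep is a guess at a delimiter-keeping split, and splitConsume is Neil's definition under another name.

```haskell
import Data.List (intercalate)

-- List analogue of the ByteString.split example: delimiters are dropped,
-- but empty chunks at both ends are kept.
splitDrop :: Eq a => a -> [a] -> [[a]]
splitDrop x = foldr step [[]]
  where
    step c acc@(cur:rest)
      | c == x    = [] : acc          -- delimiter: start a new chunk
      | otherwise = (c : cur) : rest  -- ordinary element: extend the chunk
    step _ []     = [[]]              -- unreachable: acc is never empty

-- Delimiter-keeping split: each delimiter becomes its own element.
splitKeep :: (a -> Bool) -> [a] -> [[a]]
splitKeep _ [] = []
splitKeep p xs = case break p xs of
    (a, [])     -> [a]
    (a, d:rest) -> a : [d] : splitKeep p rest

-- Delimiter-consuming split (same behaviour as Neil's version).
splitConsume :: Eq a => a -> [a] -> [[a]]
splitConsume _ [] = []
splitConsume x xs = case break (== x) xs of
    (a, [])     -> [a]
    (a, _:rest) -> a : splitConsume x rest

-- Round trips for the two invertible variants.
roundTripDrop, roundTripKeep :: String -> Bool
roundTripDrop s = intercalate "a" (splitDrop 'a' s) == s
roundTripKeep s = concat (splitKeep (== 'a') s) == s

-- Two different inputs give the same output, so no inverse can exist.
collision :: Bool
collision = splitConsume 'a' "Xa" == splitConsume 'a' "X"   -- both ["X"]
```

The consuming version loses exactly one piece of information: whether the input ended with a delimiter.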

---

More generally, I feel the library should have both. Even though split' is simple to define in terms of split, this discussion shows that people sometimes want the delimiters consumed, and having both makes it possible to define both lines and words using them.
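As an illustration of the lines/words point, one possible sketch follows. The names splitConsumeBy, linesVia and wordsVia are hypothetical, and these are not necessarily the definitions from the original proposal; the idea is only that a predicate-based consuming split is enough for both.

```haskell
import Data.Char (isSpace)

-- Predicate-based consuming split (assumed helper): delimiters are dropped,
-- and a delimiter at the very end produces no trailing empty chunk.
splitConsumeBy :: (a -> Bool) -> [a] -> [[a]]
splitConsumeBy _ [] = []
splitConsumeBy p xs = case break p xs of
    (a, [])     -> [a]
    (a, _:rest) -> a : splitConsumeBy p rest

-- lines: split on newlines; empty lines in the middle are kept.
linesVia :: String -> [String]
linesVia = splitConsumeBy (== '\n')

-- words: split on any whitespace, then discard the empty chunks
-- produced by leading or repeated whitespace.
wordsVia :: String -> [String]
wordsVia = filter (not . null) . splitConsumeBy isSpace
```

For example, linesVia "a\n\nb\n" gives ["a","","b"] and wordsVia "  hello  world " gives ["hello","world"], the same results as the Prelude's lines and words on those inputs.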

> > and perhaps a better name for split'
>
> A better name is essential. split' should be for the strict version of
> split, not something quite different.

Yes, that's true. I wasn't thinking of strictness but of the prime notation, i.e. "here's another, specialized version of split".

So what would you suggest? splitConsume? splitLossy? splitAndShrink?

> >  On a secondary note, but less important than the foregoing, I'd like to add two functions: 'replace' and 'replaceBy'. They do basically what they sound like: given two items, change every occurrence in a given list of one item to another.
>
> I commonly define:
>
> rep :: Eq a => a -> a -> a -> a
> rep from to x = if x == from then to else x
>
> Now you can do replace with map (rep from to).
>
> Still, replace and replaceBy might be useful to have.
>
> Thanks
>
> Neil
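
For reference, a small sketch of how replace could be built on rep. The signature of replaceBy here is an assumption (a predicate selecting the elements to overwrite); the proposal's actual replaceBy may take different arguments.

```haskell
-- Neil's helper: swap a single element if it matches.
rep :: Eq a => a -> a -> a -> a
rep from to x = if x == from then to else x

-- replace: change every occurrence of one item to another.
replace :: Eq a => a -> a -> [a] -> [a]
replace from to = map (rep from to)

-- replaceBy (assumed signature): overwrite every element that satisfies
-- the predicate, so no Eq constraint is needed.
replaceBy :: (a -> Bool) -> a -> [a] -> [a]
replaceBy p to = map (\x -> if p x then to else x)
```

With these definitions, replace 'l' 'L' "hello" gives "heLLo", and replace from to is the same as replaceBy (== from) to.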

--
gwern