Adding split/split' to Data.List, and redefining words/lines with
it; also, adding replace/replaceBy
Gwern Branwen
gwern0 at gmail.com
Thu Jul 10 10:55:23 EDT 2008
Hi everyone. Recently, while doing some shell scripting, I found myself redefining a 'split' function (take an item and a list, and break the list into a list of lists everywhere the item appears) *yet again*, and I got annoyed enough to resolve to fix the situation. While I was at it, I decided that since 'lines' and 'words' are conceptually specializations of a general split function, I would come up with rewrites for them too. This is much more aesthetically satisfying to me, as it's clearer in the code now that lines and words are essentially specializations of split, but with pragmatic edge cases (which mean we don't get nice identities like 'unlines . lines == id', but which make them more useful with, say, getContents). Code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> lines' :: String -> [String]
> lines' s = removeTrailingNull (split' '\n' s)
>     where
>       removeTrailingNull :: [String] -> [String]
>       removeTrailingNull y = case y of
>                                [] -> []
>                                [""] -> []
>                                (x:xs) -> x : removeTrailingNull xs
>
> linesProp :: String -> Bool
> linesProp x = (Prelude.lines x == lines' x)
>
> words' :: String -> [String]
> words' = filter (not . and . map isSpace) . split isSpace
>
> wordsProp :: String -> Bool
> wordsProp x = (Prelude.words x == words' x)
>
> split :: (a -> Bool) -> [a] -> [[a]]
> split _ [] = []
> split p s = let (l,s') = break p s in l : case s' of
>                                            [] -> []
>                                            (r:s'') -> [r] : split p s''
>
> splitUndoProp, splitUndoIdemProp, splitPreserveDelimsProp
>     :: (Eq a) => a -> [a] -> Bool
> splitUndoProp x y = (concat $ split (==x) y) == y
> splitUndoIdemProp x y = (concat $ concat $ split (==[x]) $ split (==x) y) == y
> splitPreserveDelimsProp x y = (length $ elemIndices [x] $ split (==x) y) ==
>                               (length $ elemIndices x y)
>
> split' :: (Eq a) => a -> [a] -> [[a]]
> split' a b = filter (/= [a]) $ split (\x -> x==a) b
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
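To make the difference between the two functions concrete, here is a small sketch that repeats the definitions of split and split' from the code above and applies them to one sample string. The sample input and the names keptDelims and droppedDelims are only illustrations and are not part of the proposal; the results in the comments come from stepping through the definitions by hand.

```haskell
-- Definitions repeated from the message so this file loads on its own.
split :: (a -> Bool) -> [a] -> [[a]]
split _ [] = []
split p s =
  let (l, s') = break p s
  in l : case s' of
           []        -> []
           (r : s'') -> [r] : split p s''

split' :: (Eq a) => a -> [a] -> [[a]]
split' a b = filter (/= [a]) $ split (== a) b

-- split keeps every delimiter as its own singleton list, so concat
-- rebuilds the input.
keptDelims :: [String]
keptDelims = split (== ',') "a,b,,c"
-- ["a", ",", "b", ",", "", ",", "c"]

-- split' filters those singleton lists out again.
droppedDelims :: [String]
droppedDelims = split' ',' "a,b,,c"
-- ["a", "b", "", "c"]
```

Note that the empty field between the two commas survives in both results; only the delimiter entries themselves are removed by split'.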
I've run many, many QuickCheck tests against the Prelude lines and words, and the definitions seem to be correct.
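Random testing can under-sample short strings that are dense in delimiters, so a complementary check is to enumerate every string up to a small length over a tiny alphabet. The following is a minimal sketch of that idea using only base; it is not what was run for this message, and stringsUpTo, disagreements and wordsCounterexamples are hypothetical helper names introduced here.

```haskell
import Data.Char (isSpace)

-- split and words' repeated from the message so this file loads on its own.
split :: (a -> Bool) -> [a] -> [[a]]
split _ [] = []
split p s =
  let (l, s') = break p s
  in l : case s' of
           []        -> []
           (r : s'') -> [r] : split p s''

words' :: String -> [String]
words' = filter (not . and . map isSpace) . split isSpace

-- Every string of length 0..n drawn from the given alphabet.
stringsUpTo :: Int -> [Char] -> [String]
stringsUpTo n alphabet =
  concatMap (\k -> sequence (replicate k alphabet)) [0 .. n]

-- The inputs on which two functions give different results.
disagreements :: Eq b => (String -> b) -> (String -> b) -> [String] -> [String]
disagreements f g = filter (\s -> f s /= g s)

-- Compare words' with Prelude.words on all 1365 strings of length <= 5
-- over one letter and three kinds of whitespace.
wordsCounterexamples :: [String]
wordsCounterexamples = disagreements words words' (stringsUpTo 5 "a \n\t")
```

An empty wordsCounterexamples means the two agree on that whole input set. The same helper takes any pair of functions, so it can also compare a candidate for lines against Prelude.lines; keeping '\n' in the alphabet ensures that inputs such as "\n" and "a\n\n" are covered, for which Prelude.lines returns [""] and ["a",""] respectively.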
What do people think of adding these? I know I'm not the only one who has wanted split or split' on more than one occasion, and they are not the funnest functions to rewrite every time you want them.
(About all they're missing are Haddocks; and perhaps a better name for split' which reflects how it is lossy and can't be undone while split can be.)
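As a rough idea of what the missing Haddocks might look like, here is a sketch. The wording of the comments is only a suggestion, and the examples were worked out by hand from the definitions; the last two show why split' is lossy, since two different inputs give the same result.

```haskell
-- | Break a list at every element satisfying the predicate. Each matching
-- element is kept as its own singleton list, so that
-- @concat (split p xs) == xs@.
--
-- >>> split (== ',') "a,b"
-- ["a",",","b"]
split :: (a -> Bool) -> [a] -> [[a]]
split _ [] = []
split p s =
  let (l, s') = break p s
  in l : case s' of
           []        -> []
           (r : s'') -> [r] : split p s''

-- | Like 'split' with an equality test, but with the delimiters dropped.
-- This is lossy: the input cannot in general be rebuilt from the result.
--
-- >>> split' ',' "a,b"
-- ["a","b"]
-- >>> split' ',' "a,b,"
-- ["a","b"]
split' :: (Eq a) => a -> [a] -> [[a]]
split' a b = filter (/= [a]) $ split (== a) b
```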
------
On a secondary and less important note, I'd like to add two functions: 'replace' and 'replaceBy'. They do basically what they sound like: 'replace' takes two items and changes every occurrence of the first in a given list to the second, and 'replaceBy' does the same for every element satisfying a predicate. These are two more functions I often have to redefine, which still surprises me: Data.List has a surfeit of obscure functions I've never used and which are kind of odd, but a basic search-and-replace function isn't there? I'm not saying let's add enough functions to Data.List to turn it into a mini-Perl, but it strikes me as a real gap. (As before, I've defined some sensible QC properties and checked them, although the definitions look obviously right to me.) Code:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
> replaceBy :: (a -> Bool) -> a -> [a] -> [a]
> replaceBy a b = map (\x -> if a x then b else x)
>
> replace :: (Eq a) => a -> a -> [a] -> [a]
> replace a = replaceBy (==a)
>
> replaceLengthProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceLengthProp x y z = (length $ replace x y z) == (length z)
> replaceUndoableProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceUndoableProp x y z = if not (y `elem` z)
>                             then z == (replace y x $ replace x y z)
>                             else True
> replaceIdempotentProp :: (Eq a) => a -> a -> [a] -> Bool
> replaceIdempotentProp x y z = (replace x y $ replace x y z) == (replace x y z)
>
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
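For a feel of how the two functions read in use, here is a short sketch that repeats the definitions above and applies them to a few made-up inputs. The names spaced, masked and clamped and their sample data are illustrations only.

```haskell
import Data.Char (isDigit)

-- Definitions repeated from the message so this file loads on its own.
replaceBy :: (a -> Bool) -> a -> [a] -> [a]
replaceBy a b = map (\x -> if a x then b else x)

replace :: (Eq a) => a -> a -> [a] -> [a]
replace a = replaceBy (== a)

-- Swap one element for another throughout a list.
spaced :: String
spaced = replace '/' ' ' "usr/local/bin"
-- "usr local bin"

-- Swap every element matching a predicate.
masked :: String
masked = replaceBy isDigit '#' "tel 555-0100"
-- "tel ###-####"

-- Nothing here is specific to strings.
clamped :: [Int]
clamped = replaceBy (< 0) 0 [3, -1, 4, -1, 5]
-- [3, 0, 4, 0, 5]
```

Both functions work element by element, so they never change the length of the list; replacing one sublist with another (a substring with a substring, say) is a different operation and is not covered here.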
--
gwern