[Haskell-cafe] How to split this string.
Steve Horne
sh006d3592 at blueyonder.co.uk
Mon Jan 2 11:36:26 CET 2012
On 02/01/2012 09:44, max wrote:
> I want to write a function whose behavior is as follows:
>
> foo "string1\nstring2\r\nstring3\nstring4" = ["string1",
> "string2\r\nstring3", "string4"]
>
> Note the sequence "\r\n", which is ignored. How can I do this?
Doing it probably the hard way (and getting it wrong) looks like the
following...
-- Function to accept (normally) a single character. Special-cases
-- \r\n. Refuses to accept \n. Result is either an empty list, or
-- an (accepted, remaining) pair.
parseTok :: String -> [(String, String)]
parseTok "" = []
parseTok (c1:c2:cs) | ((c1 == '\r') && (c2 == '\n')) = [(c1:c2:[], cs)]
parseTok (c:cs) | (c /= '\n') = [(c:[], cs)]
                | True        = []

-- Accept a sequence of those (mostly single) characters
parseItem :: String -> [(String, String)]
parseItem "" = [("","")]
parseItem cs = [(j1s ++ j2s, k2s)
               | (j1s,k1s) <- parseTok cs
               , (j2s,k2s) <- parseItem k1s
               ]

-- Accept a whole list of strings
parseAll :: String -> [([String], String)]
parseAll [] = [([],"")]
parseAll cs = [(j1s:j2s,k2s)
              | (j1s,k1s) <- parseItem cs
              , (j2s,k2s) <- parseAll k1s
              ]

-- Get the first valid result, which should have consumed the
-- whole string but this isn't checked. No check for existence either.
parse :: String -> [String]
parse cs = fst (head (parseAll cs))
I got it wrong in that this never consumes the \n between items, so
it'll all go horribly wrong. There's a good chance there's a typo or two
as well. The basic idea should be clear, though - maybe I should fix it
but I've got some other things to do at the moment. Think of the \n as a
separator, or as a prefix to every "item" but the first. Alternatively,
treat it as a prefix to *every* item, and artificially add an initial
one to the string in the top-level parse function. Then use tail etc. to
remove that from the first item.
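A minimal sketch of the "\n as separator" idea, kept in the same list-of-results style. It is one possible repair, not a definitive one. It assumes parseItem should stop (rather than fail) when it reaches a bare \n, and it introduces a hypothetical helper, parseRest, that consumes the separator. Returning [] when no complete parse exists is also an assumption.

```haskell
-- Accept one token: normally a single character, but "\r\n" counts as
-- one token and a bare '\n' is refused.
parseTok :: String -> [(String, String)]
parseTok ('\r':'\n':cs) = [("\r\n", cs)]
parseTok (c:cs)
  | c /= '\n' = [([c], cs)]
parseTok _ = []

-- Accept the longest run of tokens. Stops without failing at a bare
-- '\n' or at the end of the input.
parseItem :: String -> [(String, String)]
parseItem cs = case parseTok cs of
  []   -> [("", cs)]
  toks -> [ (j1s ++ j2s, k2s)
          | (j1s, k1s) <- toks
          , (j2s, k2s) <- parseItem k1s
          ]

-- Accept one item followed by zero or more ('\n', item) pairs.
parseAll :: String -> [([String], String)]
parseAll cs = [ (j1s : j2s, k2s)
              | (j1s, k1s) <- parseItem cs
              , (j2s, k2s) <- parseRest k1s
              ]

-- Hypothetical helper (not in the original): consume a separating
-- '\n' and carry on, or stop if there is none.
parseRest :: String -> [([String], String)]
parseRest ('\n':cs) = parseAll cs
parseRest cs        = [([], cs)]

-- Take the first result that consumed the whole input; return []
-- if there is none (an assumption, the original did not check).
parse :: String -> [String]
parse cs = case [ items | (items, rest) <- parseAll cs, null rest ] of
  (items : _) -> items
  []          -> []
```

With these definitions, parse "string1\nstring2\r\nstring3\nstring4" gives ["string1", "string2\r\nstring3", "string4"]. Note that an empty input gives [""] rather than [].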
See http://channel9.msdn.com/Tags/haskell - there's a series of 13
videos by Dr. Erik Meijer. The eighth in the series covers this basic
technique. It calls the parsers monadic and uses do notation, which
confused me slightly at first. It's the *list* type which is monadic in
this case, and (as you can see) I prefer to use list comprehensions
rather than do notation.
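To make the comparison concrete, here is a small sketch of the same two-step parser written both ways. The names item, twoComp and twoDo are made up for illustration. The only point is that a list comprehension and a do block in the list monad express the same thing.

```haskell
-- Accept any single character, in the same (accepted, remaining) style.
item :: String -> [(Char, String)]
item (c:cs) = [(c, cs)]
item []     = []

-- Accept two characters, written as a list comprehension.
twoComp :: String -> [(String, String)]
twoComp cs = [ ([a, b], k2s)
             | (a, k1s) <- item cs
             , (b, k2s) <- item k1s
             ]

-- The same parser in do notation, using the list monad.
twoDo :: String -> [(String, String)]
twoDo cs = do
  (a, k1s) <- item cs
  (b, k2s) <- item k1s
  return ([a, b], k2s)
```

Both give [("ab", "c")] for the input "abc", and both give [] when fewer than two characters are available.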
There may be a simpler way, though - there's still a fair bit of Haskell
and its ecosystem I need to figure out. There's a tool called alex, for
instance, but I've not used it.
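One candidate for a simpler way, sketched without any parser machinery: a direct recursion with an accumulator. The name splitOnBareLF is made up, and this is only one possible approach, assuming "\r\n" should always stay inside the current item.

```haskell
-- Split on '\n', except where it is immediately preceded by '\r'.
splitOnBareLF :: String -> [String]
splitOnBareLF = go ""
  where
    -- acc holds the current item in reverse order.
    go acc ('\r':'\n':cs) = go ('\n' : '\r' : acc) cs  -- keep "\r\n" in the item
    go acc ('\n':cs)      = reverse acc : go "" cs     -- bare '\n' ends the item
    go acc (c:cs)         = go (c : acc) cs            -- ordinary character
    go acc []             = [reverse acc]              -- end of input ends the last item
```

For the string in the question this gives ["string1", "string2\r\nstring3", "string4"].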