[Haskell-cafe] Splitting a string into chunks

Sebastian Sylvan sebastian.sylvan at gmail.com
Fri Jan 13 16:22:47 EST 2006


On 1/13/06, Adam Turoff <adam.turoff at gmail.com> wrote:
> Hi,
>
> I'm trying to split a string into a list of substrings, where substrings
> are delimited by blank lines.
>
> This feels like it *should* be a primitive operation, but I can't seem
> to find one that works.  It's neither a fold nor a partition, since each
> chunk is separated by a 2-character sequence.  It's also not a grouping
> operation, since ghc's Data.List.groupBy compares the first element of a
> group with each candidate member of that group, as
> demonstrated by:
>
>     Prelude> :module + Data.List
>     Prelude Data.List> let t = "asdfjkl;"
>     Prelude Data.List> groupBy (\a _ -> a == 's') t
>     ["a","sdfjkl;"]
>
> As a result, I've wound up with this:
>
>     -- Convert a file into blocks separated by blank lines (two
>     -- consecutive \n characters.) NB: Requires UNIX linefeeds
>
>     blocks :: String -> [String]
>     blocks s = f "" s
>       where
>         f "" [] = []
>         f s [] = [s]
>         f s ('\n':'\n':rest) = (s:f "" rest)
>         f s (a:rest) = f (s ++ [a]) rest
>
> Which somehow feels ugly.  This feels like it should be a fold, a group
> or something, where the test is something like:
>
>     (\a b -> (a /= '\n') && (b /= '\n'))

Off the top of my head:

blocks = map concat . groupBy (const (not . null)) . lines

The lines function splits the input into lines. groupBy then groups
that list into a list of lists, starting a new group whenever the second
of the two elements it compares is null (which is what lines gives you
for an empty line). Finally, a concat on each group "undoes" the
redundant line splitting that lines performed. Note that concat joins
the lines of a block without putting the newlines back.
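
For what it's worth, here is a slightly fuller sketch of the
lines/groupBy approach described above. It rests on two assumptions
that the original question does not spell out: that the newlines inside
a block should be kept, and that a run of several blank lines counts as
a single separator. The name blocksSketch is made up for illustration.

```haskell
import Data.List (groupBy, intercalate)

-- Hypothetical helper (the name is not from the original post).
-- Assumes UNIX line endings, as in the question.
blocksSketch :: String -> [String]
blocksSketch =
      filter (not . null)                            -- drop blocks left empty by runs of blank lines
    . map (intercalate "\n" . filter (not . null))   -- rejoin each group's lines, minus the blank one
    . groupBy (\_ line -> not (null line))           -- a blank line starts a new group
    . lines
```

With this definition, blocksSketch "a\nb\n\nc\n" evaluates to
["a\nb","c"]. The predicate ignores its first argument, so it does not
matter that groupBy compares against the first element of each group
rather than the immediately preceding one.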

/S
--
Sebastian Sylvan
+46(0)736-818655
UIN: 44640862

