[Haskell-cafe] Splitting a string into chunks

Jared Updike jupdike at gmail.com
Fri Jan 13 16:32:46 EST 2006


That works, except that it drops the single newline characters inside each block:

Prelude> let s = "1234\n5678\n\nabcdefghijklmnopq\n\n,,.,.,."
Prelude> blocks s
["12345678","abcdefghijklmnopq",",,.,.,."]
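
One possible way to keep them, as a sketch only: group the lines so that
every blank line ends up in a group of its own, drop those groups, and rejoin
each remaining group with "\n" instead of concat. The name
blocksKeepingNewlines is made up for illustration. This version treats a run
of several blank lines as a single separator and ignores a trailing newline,
so it may not match Adam's original function on such inputs.

```haskell
import Data.List (groupBy, intercalate)

-- Hypothetical name, not from the thread.
-- Splits on blank lines but keeps the single newlines inside each block.
blocksKeepingNewlines :: String -> [String]
blocksKeepingNewlines =
    map (intercalate "\n")      -- rejoin each block's lines with single newlines
  . filter (not . all null)     -- drop the groups that hold only a blank line
  . groupBy bothNonBlank        -- runs of non-blank lines; a blank line stands alone
  . lines
  where
    -- groupBy tests the first line of a group against each candidate,
    -- so a blank first line never absorbs anything after it.
    bothNonBlank a b = not (null a) && not (null b)
```

With the same s as above:

Prelude Data.List> blocksKeepingNewlines s
["1234\n5678","abcdefghijklmnopq",",,.,.,."]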

  Jared.

On 1/13/06, Sebastian Sylvan <sebastian.sylvan at gmail.com> wrote:
> On 1/13/06, Sebastian Sylvan <sebastian.sylvan at gmail.com> wrote:
> > On 1/13/06, Adam Turoff <adam.turoff at gmail.com> wrote:
> > > Hi,
> > >
> > > I'm trying to split a string into a list of substrings, where substrings
> > > are delimited by blank lines.
> > >
> > > This feels like it *should* be a primitive operation, but I can't seem
> > > to find one that works.  It's neither a fold nor a partition, since each
> > > chunk is separated by a 2-character sequence.  It's also not a grouping
> > > operation, since ghc's Data.List.groupBy compares the first element of a
> > > group with each candidate member of that group, as
> > > demonstrated by:
> > >
> > >     Prelude> :module + Data.List
> > >     Prelude Data.List> let t = "asdfjkl;"
> > >     Prelude Data.List> groupBy (\a _ -> a == 's') t
> > >     ["a","sdfjkl;"]
> > >
> > > As a result, I've wound up with this:
> > >
> > >     -- Convert a file into blocks separated by blank lines (two
> > >     -- consecutive \n characters.) NB: Requires UNIX linefeeds
> > >
> > >     blocks :: String -> [String]
> > >     blocks s = f "" s
> > >       where
> > >         f "" [] = []
> > >         f s [] = [s]
> > >         f s ('\n':'\n':rest) = (s:f "" rest)
> > >         f s (a:rest) = f (s ++ [a]) rest
> > >
> > > Which somehow feels ugly.  This feels like it should be a fold, a group
> > > or something, where the test is something like:
> > >
> > >     (\a b -> (a /= '\n') && (b /= '\n'))
> >
> > Off the top of my head:
> >
> > blocks = map concat . groupBy (const null) . lines
> >
> > The lines function splits it into lines, the groupBy will group the
> > list into lists of lists and split when the second of two adjacent
> > elements is null (which is what an empty line passed to lines will
> > give you) and then a concat on each of the elements of this list will
> > "undo" the redundant lines-splitting that lines performed...
> >
>
> Sorry, I got the meaning of groupBy mixed up; it should be
>
> blocks = map concat . groupBy (const (not . null)) . lines
>
> /S
>
> --
> Sebastian Sylvan
> +46(0)736-818655
> UIN: 44640862
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe
>


--
jupdike at gmail.com
http://www.updike.org/~jared/
reverse ")-:"

