[Haskell-beginners] wrapping text in a multiline string
rico.moorman at gmail.com
Wed Jun 6 08:52:44 CEST 2012
Thank you very much for this suggestion. I just tried the character class
you mentioned and it works.
The stackoverflow post you mentioned was a nice read and I surely agree
that regular expressions are normally not the way to go for most HTML
munging needs. But luckily the generated HTML from pandoc is very specific
and the <table> tag I wanted to match (for line-numbered code listings)
does not contain any further tables so I thought it should be safe to
approach it like this.
The resulting code is now:
-- Wraps numbered code listings within the page body with a div
-- in order to be able to apply some more specific styling.
wrapNumberedCodelistings (Page meta body) =
Page meta newBody
newBody = regexReplace
"<table\\s+class=\"sourceCode[^>]+>[\\s\\S]*?</table>" wrap body
wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"
-- Replaces the whole match for the given regex using the given function
regexReplace :: String -> (String -> String) -> String -> String
regexReplace regex replace text = go text
go text = case text =~~ regex of
Just (before, match, after) ->
before ++ replace match ++ go after
_ -> text
Don't know though if it could be cleaned up further or even if this is by
any means good style (being still fairly new to haskell).
Furthermore I would still be very interested in the right approach to
manipulating the HTML structure as a whole and I too hope that another
Haskeller could name a more suitable solution for manipulating HTML.
Or even how to pass the 's' modifier to Text.Regex.PCRE.
On Wed, Jun 6, 2012 at 7:11 AM, Arlen Cuss <a at unnali.com> wrote:
> I'd be more inclined to look at a solution involving manipulating the HTML
> structure, rather than trying a regexp-based approach, which will probably
> end up disappointing. (See this: http://stackoverflow.com/a/1732454/499609
> I hope another Haskeller can speak to a library that would be good for
> this kind of purpose.
> To suit what you're doing now, though; if you change .*? to [\s\S]*?, it
> should work on multiline strings. If you can work out how to pass the 's'
> modifier to Text.Regexp.PCRE, that should also do it.
> On Wednesday, 6 June 2012 at 3:05 PM, Rico Moorman wrote:
> > Hello,
> > I have a given piece of multiline HTML (which is generated using pandoc
> btw.) and I am trying to wrap certain elements (tags with a given class)
> with a <div>.
> > I already took a look at the Text.Regex.PCRE module which seemed a
> reasonable choice because I am already familiar with similar regex
> implementations in other languages.
> > I came up with the following function which takes a regex and replaces
> all matches within the given string using the provided function (which I
> would use to wrap the element)
> > import Text.Regex.PCRE ((=~~))
> > -- Replaces the whole match for the given regex using the given function
> > regexReplace :: String -> (String -> String) -> String -> String
> > regexReplace regex replace text = go text
> > where
> > go text = case text =~~ regex of
> > Just (before, match, after) ->
> > before ++ replace match ++ go after
> > _ -> text
> > The problem with this function is, that it will not work on multiline
> strings. I would like to call it like this:
> > newBody = regexReplace "<table class=\"sourceCode\".*?table>" wrap body
> > wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"
> > Is there any way to easily pass some kind of multiline modifier to the
> regex in question?
> > Or is this approach completely off and would something else be more
> appropriate/haskelly for the problem at hand?
> > Thank you very much in advance.
> > _______________________________________________
> > Beginners mailing list
> > Beginners at haskell.org (mailto:Beginners at haskell.org)
> > http://www.haskell.org/mailman/listinfo/beginners
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Beginners