[Haskell-beginners] wrapping text in a multiline string

Arlen Cuss a at unnali.com
Wed Jun 6 13:25:44 CEST 2012


Since I can't seem to get the PCRE package installed (not sure what's going on here: http://hpaste.org/69589), I gave it a go with plain Text.Regex, which should (?) use Text.Regex.Posix by default.

λ> let regex = mkRegexWithOpts "abc(.*)def" False False in matchRegex regex "abc content\nmore def"
Just [" content\nmore "]
λ>  


See: http://hackage.haskell.org/packages/archive/regex-compat/0.92/doc/html/Text-Regex.html#v:mkRegexWithOpts

That said, Text.Regex.Posix is a bit broken, and you might want to use regex-tdfa. See also:

* http://www.haskell.org/haskellwiki/Regex_Posix
* http://www.haskell.org/haskellwiki/Regular_expressions

Cheers,

A  


On Wednesday, 6 June 2012 at 6:39 PM, Rico Moorman wrote:

> Thank you again!
>  
> Looking at the docs it seems that this could do the trick. A pity that
> you cannot install the package.
>  
> Now I am wondering how I would integrate this in the replacement
> function or how to rewrite it properly.
>  
> Looking at the type signature of =~~ (with my limited knowledge) it
> seems that I would have to "use" RegexMaker adding up the CompOptions
> needed?
>  
> (=~~) :: (RegexMaker Regex CompOption ExecOption source, RegexContext
> Regex source1 target, Monad m) => source1 -> source -> m target
>  
>  
> On Wed, Jun 6, 2012 at 9:46 AM, Arlen Cuss <a at unnali.com (mailto:a at unnali.com)> wrote:
> >  
> > Exploring the documentation for Text.Regex.PCRE, I've found "CompOption":
> >  
> > http://hackage.haskell.org/packages/archive/regex-pcre/0.94.4/doc/html/Text-Regex-PCRE-Wrap.html#t:CompOption
> >  
> > The constants are listed below; the one you want is probably compDotAll, to make "." match newlines as well. I'm not 100% sure if this is the module you want, though, and I can't seem to get regex-pcre installed, so I can't test. Apologies!
> >  
> >  
> > On Wednesday, 6 June 2012 at 4:52 PM, Rico Moorman wrote:
> >  
> > > Thank you very much for this suggestion. I just tried the character class you mentioned and it works.
> > >  
> > > The stackoverflow post you mentioned was a nice read and I surely agree that regular expressions are normally not the way to go for most HTML munging needs. But luckily the generated HTML from pandoc is very specific and the <table> tag I wanted to match (for line-numbered code listings) does not contain any further tables so I thought it should be safe to approach it like this.
> > >  
> > > The resulting code is now:
> > >  
> > > -- Wraps numbered code listings within the page body with a div
> > > -- in order to be able to apply some more specific styling.
> > > wrapNumberedCodelistings (Page meta body) =
> > > Page meta newBody
> > > where
> > > newBody = regexReplace "<table\\s+class=\"sourceCode[^>]+>[\\s\\S]*?</table>" wrap body
> > > wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"
> > >  
> > > -- Replaces the whole match for the given regex using the given function
> > > regexReplace :: String -> (String -> String) -> String -> String
> > > regexReplace regex replace text = go text
> > > where
> > > go text = case text =~~ regex of
> > > Just (before, match, after) ->
> > > before ++ replace match ++ go after
> > > _ -> text
> > >  
> > >  
> > > Don't know though if it could be cleaned up further or even if this is by any means good style (being still fairly new to haskell).
> > >  
> > > Furthermore I would still be very interested in the right approach to manipulating the HTML structure as a whole and I too hope that another Haskeller could name a more suitable solution for manipulating HTML.
> > > Or even how to pass the 's' modifier to Text.Regex.PCRE.
> > >  
> > > Best regards,
> > >  
> > > rico
> > >  
> > > On Wed, Jun 6, 2012 at 7:11 AM, Arlen Cuss <a at unnali.com (mailto:a at unnali.com)> wrote:
> > > > I'd be more inclined to look at a solution involving manipulating the HTML structure, rather than trying a regexp-based approach, which will probably end up disappointing. (See this: http://stackoverflow.com/a/1732454/499609)
> > > >  
> > > > I hope another Haskeller can speak to a library that would be good for this kind of purpose.
> > > >  
> > > > To suit what you're doing now, though; if you change .*? to [\s\S]*?, it should work on multiline strings. If you can work out how to pass the 's' modifier to Text.Regexp.PCRE, that should also do it.
> > > >  
> > > > —Arlen
> > > >  
> > > >  
> > > > On Wednesday, 6 June 2012 at 3:05 PM, Rico Moorman wrote:
> > > >  
> > > > > Hello,
> > > > >  
> > > > > I have a given piece of multiline HTML (which is generated using pandoc btw.) and I am trying to wrap certain elements (tags with a given class) with a <div>.
> > > > >  
> > > > > I already took a look at the Text.Regex.PCRE module which seemed a reasonable choice because I am already familiar with similar regex implementations in other languages.
> > > > >  
> > > > > I came up with the following function which takes a regex and replaces all matches within the given string using the provided function (which I would use to wrap the element)
> > > > >  
> > > > > import Text.Regex.PCRE ((=~~))
> > > > >  
> > > > > -- Replaces the whole match for the given regex using the given function
> > > > > regexReplace :: String -> (String -> String) -> String -> String
> > > > > regexReplace regex replace text = go text
> > > > > where
> > > > > go text = case text =~~ regex of
> > > > > Just (before, match, after) ->
> > > > > before ++ replace match ++ go after
> > > > > _ -> text
> > > > >  
> > > > > The problem with this function is, that it will not work on multiline strings. I would like to call it like this:
> > > > >  
> > > > > newBody = regexReplace "<table class=\"sourceCode\".*?table>" wrap body
> > > > > wrap x = "<div class=\"sourceCodeWrap\">" ++ x ++ "</div>"
> > > > >  
> > > > > Is there any way to easily pass some kind of multiline modifier to the regex in question?
> > > > >  
> > > > > Or is this approach completely off and would something else be more appropriate/haskelly for the problem at hand?
> > > > >  
> > > > > Thank you very much in advance.
> > > > > _______________________________________________
> > > > > Beginners mailing list
> > > > > Beginners at haskell.org (mailto:Beginners at haskell.org) (mailto:Beginners at haskell.org)
> > > > > http://www.haskell.org/mailman/listinfo/beginners
> > > >  
> > >  
> >  
>  






More information about the Beginners mailing list