[web-devel] [Newbie][Parsec] Skipping to desired phrase

David McBride dmcbride at neondsl.com
Fri Jul 29 14:58:06 CEST 2011

skipMany skips a lot of characters; when it gets to "nicelinks" it
skips that too, then keeps skipping characters and any further
"nicelinks" until it hits eof. At that point it needs either
"nicelinks" or some more characters to continue skipping, or at least
one character for many1 anyChar to capture, but there aren't any, so it
fails. Stop at the phrase with manyTill instead:

p_rest = do
  manyTill anyChar (try (string "nicelinks")) <?> "fdsa"
  text <- many1 anyChar <?> "asdf"
  return [text]
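
For anyone who wants to try this directly, here is a minimal
self-contained sketch of the same manyTill idea. The name afterPhrase
and the sample inputs are only illustrative (they are not from the
original post), and it returns the text itself rather than a
one-element list.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical helper: skip everything up to and including the marker,
-- then return whatever follows it (at least one character).
afterPhrase :: String -> Parser String
afterPhrase marker = do
  -- manyTill tries the end parser first at every position; try makes a
  -- partial match of the marker backtrack instead of consuming input.
  _ <- manyTill anyChar (try (string marker))
  many1 anyChar

main :: IO ()
main = do
  -- Marker present: prints Right ":123</head>"
  print (parse (afterPhrase "nicelinks") "" "<head>nicelinks:123</head>")
  -- Marker absent: manyTill runs out of input, so this prints a Left
  print (parse (afterPhrase "nicelinks") "" "<head></head>")
```

The try around the marker matters: without it, an input that merely
starts like the marker (say "nicel...") would be half consumed and the
whole parse would fail at that point.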

This works. However, I have a funny feeling you want the anyChar to be
something more complex than a single character, which is why you went
down this route.  I had the same problem, and someone on Stack Overflow
helped me with a solution.  This is a case where you pretty much have
to use recursion to get what you want.

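import Text.Parsec

html :: String
html = "<head>nicelinks:123</head>"

-- Consume one token ("nicelinks" or a head tag), then either recurse
-- or collect characters up to the next head tag.
p_rest :: Parsec String () String
p_rest = do
  _ <- string "nicelinks" <|> anyHeadString <?> "fdsa"
  p_rest <|> manyTill anyChar (try anyHeadString) <?> "asdf"

-- Matches an opening or a closing head tag.
anyHeadString :: Parsec String () String
anyHeadString = try (string "<head>") <|> string "</head>"

main :: IO ()
main = print $ parse p_rest "" html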
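
The recursive idea can also be packaged as a small reusable combinator.
The following is only a sketch under my own naming: skipTo and linkId
are hypothetical helpers, not part of Parsec, and the assumption that
the phrase is followed by a colon and digits comes from the sample
input only. skipTo tries the target parser at the current position and,
if that fails, drops one character and recurses.

```haskell
import Text.Parsec
import Text.Parsec.String (Parser)

-- Hypothetical combinator: run p at the first position where it
-- succeeds, discarding every character before that position.
skipTo :: Parser a -> Parser a
skipTo p = try p <|> (anyChar *> skipTo p)

-- Hypothetical target that is more complex than a single phrase:
-- "nicelinks", a colon, then one or more digits (which are returned).
linkId :: Parser String
linkId = string "nicelinks" *> char ':' *> many1 digit

main :: IO ()
main =
  -- prints Right "123"
  print (parse (skipTo linkId) "" "<head>nicelinks:123</head>")
```

Unlike manyTill, this returns the result of the target parser itself,
so the thing being searched for can be as structured as needed. The
recursion ends with a parse error if the input runs out first.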

On Fri, Jul 29, 2011 at 4:27 AM, Kamil Ciemniewski
<ciemniewski.kamil at gmail.com> wrote:
> Hi all,
> I've got a String containing html and I'd like
> to extract some information from it.
> Specifically, this information starts at the point
> "after" some phrase (let's say "nicelinks").
> How do I skip all the html up to the point
> of this phrase?
> I've done that much already:
> p_rest = do
>   skipMany ((try (string "nicelinks")) <|> anyHeadString)
>   text <- many1 anyChar
>   return [text]
> anyHeadString = do
>   c <- anyChar
>   return [c]
> But after doing:
> parse p_rest [] html
> I get:
> Left (line 112, column 15):
> unexpected end of input
> expecting "nicelinks"
> What am I doing wrong?
> Best regards
