[Haskell-cafe] Parsec line number off-by-one

Roman Cheplyaka roma at ro-che.info
Wed Sep 21 08:37:40 CEST 2011


Hi Ben,

This is indeed a bug in parsec.

I have written a patch that fixes this. Currently Antoine Latter (current
parsec's maintainer) and I are working on getting these patches into the
next parsec release.

As a workaround until then, you can apply the attached patch manually.

  darcs get http://code.haskell.org/parsec3
  cd parsec3
  darcs apply parsec.dpatch
  cabal install

With this patch, the error message is:

  Left "(unknown)" (line 18, column 1):
  expecting space or atom name


* Ben Gamari <bgamari.foss at gmail.com> [2011-09-20 23:32:34-0400]
> Recently I've been playing around with Parsec for a simple parsing
> project. While I was able to quickly construct my grammar (simplified
> version attached), getting it working has been a bit tricky. In
> particular, I am now stuck trying to figure out why Parsec is
> mis-reporting line numbers. Parsec seems convinced that line 12 of my
> input (also attached) has a "%" character,
> 
>   $ runghc Test.hs
>   Left "(unknown)" (line 12, column 1):
>   unexpected "%"
>   expecting space or atom name
> 
> while my file clearly disagrees,
> 
>   10  %FLAG ATOM_NAME                                                                 
>   11  %FORMAT(20a4)                                                                   
>   12  C1  H1  C2  H2  C3  H3  C4  H4  C5  C6  C7  C8  N1  C9  H9  C10 H10 C11 H11 C12 
>   13  H12 C13 H13 C14 C15 N2  C16 C17 C29 H18 C19 H19 C20 H20 C21 H21 C22 H221H222H223
>   ...
>   18  %FLAG CHARGE
>   19  %FORMAT(5E16.8)                                                                 
> 
> The task here is to identify the block of data lines (lines 12-17),
> ending at the beginning of the next block (starting with "%"). It seems
> likely that my problem stems from the fact that I use "try" to
> accomplish this but this is as far as I can reason.
> 
> Any ideas what might cause this sort of off-by-one? Does anyone see a
> better (i.e. working) way to formulate my grammar? Any and all help
> would be greatly appreciated. Thanks.
> 
> Cheers,
> 
> - Ben
> 
> 



> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://www.haskell.org/mailman/listinfo/haskell-cafe


-- 
Roman I. Cheplyaka :: http://ro-che.info/
-------------- next part --------------
4 patches for repository http://code.haskell.org/parsec3:

Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka <roma at ro-che.info>
  * Choose the longest match when merging error messages

Sun Feb 20 18:24:49 EET 2011  Roman Cheplyaka <roma at ro-che.info>
  * try: do not reset the error position

Sun Feb 20 18:29:20 EET 2011  Roman Cheplyaka <roma at ro-che.info>
  * lookAhead: do not consume input on success; update documentation

Sun Feb 20 19:30:26 EET 2011  Roman Cheplyaka <roma at ro-che.info>
  * Improve <?>

New patches:

[Choose the longest match when merging error messages
Roman Cheplyaka <roma at ro-che.info>**20110220162422
 Ignore-this: 54e2733159a1574abb229e09ff6935c1
] hunk ./Text/Parsec/Error.hs 137
     = ParseError pos (msg : filter (msg /=) msgs)
 
 mergeError :: ParseError -> ParseError -> ParseError
-mergeError (ParseError pos msgs1) (ParseError _ msgs2)
-    = ParseError pos (msgs1 ++ msgs2)
+mergeError (ParseError pos1 msgs1) (ParseError pos2 msgs2)
+    = case pos1 `compare` pos2 of
+        -- select the longest match
+        EQ -> ParseError pos1 (msgs1 ++ msgs2)
+        GT -> ParseError pos1 msgs1
+        LT -> ParseError pos2 msgs2
 
 instance Show ParseError where
     show err
[try: do not reset the error position
Roman Cheplyaka <roma at ro-che.info>**20110220162449
 Ignore-this: 8508bc41fc6dcd9b7c06aac762f12c71
] hunk ./Text/Parsec/Prim.hs 435
 
 try :: ParsecT s u m a -> ParsecT s u m a
 try p =
-    ParsecT $ \s@(State _ pos _) cok _ eok eerr ->
-    let pcerr parseError = eerr $ setErrorPos pos parseError 
-    in unParser p s cok pcerr eok eerr
+    ParsecT $ \s cok _ eok eerr ->
+    unParser p s cok eerr eok eerr
 
 -- | The parser @tokenPrim showTok posFromTok testTok@ accepts a token @t@
 -- with result @x@ when the function @testTok t@ returns @'Just' x at . The
[lookAhead: do not consume input on success; update documentation
Roman Cheplyaka <roma at ro-che.info>**20110220162920
 Ignore-this: e884771490209b93e9fec044543a18ef
] {
hunk ./Text/Parsec/Combinator.hs 279
                             <|>
                               do{ x <- p; xs <- scan; return (x:xs) }
 
--- | @lookAhead p@ parses @p@ without consuming any input.
-
-lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
-lookAhead p         = do{ state <- getParserState
-                        ; x <- p
-                        ; setParserState state
-                        ; return x
-                        }
-
hunk ./Text/Parsec/Prim.hs 40
     , (<|>)
     , label
     , labels
+    , lookAhead
     , Stream(..)
     , tokens
     , try
hunk ./Text/Parsec/Prim.hs 439
     ParsecT $ \s cok _ eok eerr ->
     unParser p s cok eerr eok eerr
 
+-- | @lookAhead p@ parses @p@ without consuming any input.
+--
+-- If @p@ fails and consumes some input, so does @lookAhead at . Combine with 'try'
+-- if this is undesirable.
+
+lookAhead :: (Stream s m t) => ParsecT s u m a -> ParsecT s u m a
+lookAhead p         = do{ state <- getParserState
+                        ; x <- p'
+                        ; setParserState state
+                        ; return x
+                        }
+    where
+    p' = ParsecT $ \s cok cerr eok eerr ->
+         unParser p s eok cerr eok eerr
+
 -- | The parser @tokenPrim showTok posFromTok testTok@ accepts a token @t@
 -- with result @x@ when the function @testTok t@ returns @'Just' x at . The
 -- source position of the @t@ should be returned by @posFromTok t@ and
}
[Improve <?>
Roman Cheplyaka <roma at ro-che.info>**20110220173026
 Ignore-this: b147dad8729f0a8f22b7149559d6b243
] hunk ./Text/Parsec/Prim.hs 315
 -- letter', which is less friendly.
 
 (<?>) :: (ParsecT s u m a) -> String -> (ParsecT s u m a)
-p <?> msg = label p msg
+p <?> msg = ParsecT $ \s cok cerr eok eerr ->
+    let parseError = newErrorMessage (Expect msg) (statePos s)
+        eok' x s' _ = eok x s' parseError
+        eerr' _ = eerr parseError
+    in unParser p s cok cerr eok' eerr'
 
 -- | This combinator implements choice. The parser @p \<|> q@ first
 -- applies @p at . If it succeeds, the value of @p@ is returned. If @p@

Context:

[TAG 3.1.1
Antoine Latter <aslatter at gmail.com>**20110129160030
 Ignore-this: 42ddc9e7316d68945c2c1260c2acd403
] 
Patch bundle hash:
52077d12002c2820c33c39701e01ead8de62c61e


More information about the Haskell-Cafe mailing list