[Haskell-cafe] Troubles understanding Parsec Error Handling

Roman Cheplyaka roma at ro-che.info
Thu May 31 00:47:08 CEST 2012


* Matthias Hörmann <mhoermann at gmail.com> [2012-05-30 21:36:13+0200]
> And my question about this has two parts:
> 
> 1. Why doesn't it print my "unexpected" message, but instead says "unknown
> parse error"?
> 2. Why is the location in the text off? I would expect it to fail at column
> 6 (the first character beyond the result it could return) or 7 (the first
> character that makes the string no prefix of any acceptable string).

Thanks for reporting. This is a regression introduced by me in this patch:

Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka <roma at ro-che.info>
  * Choose the longest match when merging error messages

The source of the regression is that Parsec sometimes generates dummy (aka
"unknown") error messages when no actual error has occurred. In your
case the dummy error has a "bigger" (i.e. later) position, because it was
generated by anyChar inside lookAhead, so it wins over your real message.

So, when merging two errors, we should first check whether one of them
is a dummy and, if so, ignore it; only otherwise should we compare the
positions. The patch is attached.
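
As a rough illustration of that rule, here is a minimal self-contained
sketch. It is a simplified model, not the actual Parsec source: Pos, Err
and merge are made-up names standing in for Parsec's SourcePos,
ParseError and mergeError, and messages are plain strings.

```haskell
-- Simplified model of the merge rule; not the real Parsec types.
-- Pos, Err and merge are illustrative names only.

-- (line, column); tuples compare lexicographically.
type Pos = (Int, Int)

-- An error is a position plus its messages. An empty message list
-- plays the role of a dummy ("unknown") error.
data Err = Err Pos [String]
    deriving (Eq, Show)

merge :: Err -> Err -> Err
merge e1@(Err pos1 msgs1) e2@(Err pos2 msgs2)
    -- a dummy error never wins over a meaningful one
    | null msgs2 && not (null msgs1) = e1
    | null msgs1 && not (null msgs2) = e2
    | otherwise =
        case pos1 `compare` pos2 of
            -- same position: keep the messages of both errors
            EQ -> Err pos1 (msgs1 ++ msgs2)
            -- otherwise the error further into the input wins
            GT -> e1
            LT -> e2
```

With invented positions, merging Err (1,7) ["unexpected x"] with a dummy
Err (1,8) [] gives back the former, whereas comparing positions alone
would have picked the dummy.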

With this patch your code prints:

    parse error at (line 1, column 7):
    unexpected "Hallofb", expecting one of ["Hello","Hallo","Foo","HallofFame"]

This is probably still somewhat confusing to a user of your code
(there's no "Hallofb" starting at column 7), but it is correct from
Parsec's point of view, because you generated this message while looking
at the 7th character.

-- 
Roman I. Cheplyaka :: http://ro-che.info/
-------------- next part --------------
1 patch for repository http://code.haskell.org/parsec3:

Thu May 31 01:38:09 EEST 2012  Roman Cheplyaka <roma at ro-che.info>
  * When merging error messages, prefer known messages to unknown ones
  
  This fixes a regression introduced by:
  
  Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka <roma at ro-che.info>
    * Choose the longest match when merging error messages
  
  The source of the regression is that parsec sometimes generates dummy (aka
  "unknown") error messages when no actual error has occurred.
  
  So, when merging errors, before simply looking at the positions we should check
  if one of them is unknown and just ignore it.
  
  Reported by Matthias Hörmann.

New patches:

[When merging error messages, prefer known messages to unknown ones
Roman Cheplyaka <roma at ro-che.info>**20120530223809
 Ignore-this: 1cfcc0a8d1cbfd183a3897e79c320c22
 
 This fixes a regression introduced by:
 
 Sun Feb 20 18:24:22 EET 2011  Roman Cheplyaka <roma at ro-che.info>
   * Choose the longest match when merging error messages
 
 The source of the regression is that parsec sometimes generates dummy (aka
 "unknown") error messages when no actual error has occurred.
 
 So, when merging errors, before simply looking at the positions we should check
 if one of them is unknown and just ignore it.
 
 Reported by Matthias Hörmann.
] {
hunk ./Text/Parsec/Error.hs 137
     = ParseError pos (msg : filter (msg /=) msgs)
 
 mergeError :: ParseError -> ParseError -> ParseError
-mergeError (ParseError pos1 msgs1) (ParseError pos2 msgs2)
+mergeError e1@(ParseError pos1 msgs1) e2@(ParseError pos2 msgs2)
+    -- prefer meaningful errors
+    | null msgs2 && not (null msgs1) = e1
+    | null msgs1 && not (null msgs2) = e2
+    | otherwise
     = case pos1 `compare` pos2 of
         -- select the longest match
         EQ -> ParseError pos1 (msgs1 ++ msgs2)
hunk ./Text/Parsec/Error.hs 145
-        GT -> ParseError pos1 msgs1
-        LT -> ParseError pos2 msgs2
+        GT -> e1
+        LT -> e2
 
 instance Show ParseError where
     show err
}

Context:

[TAG 3.1.2
Antoine Latter <aslatter at gmail.com>**20111008182138
 Ignore-this: 96361fd74cad3d51b4213e0bcd91cdf3
] 
[version bump for release
Antoine Latter <aslatter at gmail.com>**20111008181844
 Ignore-this: 9c28994644744eaf375d9c5d75d2b201
] 
[add Stream Text instances
Antoine Latter <aslatter at gmail.com>**20111008181718
 Ignore-this: fcf1bc6a54bae9936669e28047c4f736
] 
[Fix reserved name recognition for case-insensitive languages.
Antoine Latter <aslatter at gmail.com>**20111008180454
 Ignore-this: aed4027b1f273913f7586208e5a6f82c
] 
[Documentation fix
Roman Cheplyaka <roma at ro-che.info>**20111228222953
 Ignore-this: 2d226ed7cde7a8322be04f5188957eb2
] 
[lookAhead: do not consume input on success; update documentation
Roman Cheplyaka <roma at ro-che.info>**20110220162920
 Ignore-this: e884771490209b93e9fec044543a18ef
] 
[try: do not reset the error position
Roman Cheplyaka <roma at ro-che.info>**20110220162449
 Ignore-this: 8508bc41fc6dcd9b7c06aac762f12c71
] 
[Choose the longest match when merging error messages
Roman Cheplyaka <roma at ro-che.info>**20110220162422
 Ignore-this: 54e2733159a1574abb229e09ff6935c1
] 
[TAG 3.1.1
Antoine Latter <aslatter at gmail.com>**20110129160030
 Ignore-this: 42ddc9e7316d68945c2c1260c2acd403
] 
Patch bundle hash:
09fc71cdc9e86dc672f19bc8fb939103cae782bb

