[Haskell-cafe] Problems with attoparsec (or maybe URI.ByteString)

Wed Feb 24 19:17:12 UTC 2016

It's an issue with uri-bytestring, it's being overly strict/annoying.

Cf. 
https://github.com/Soostone/uri-bytestring/blob/master/src/URI/ByteString/Internal.hs#L207-L209

Particularly: endOfInput

Otherwise you could many1/some/try that bugger to do what you want.

On 02/24/2016 01:11 PM, Brian Hurt wrote:
>
> What I'm trying to do is write a function with the signature:
>
>     data Urls =
>         Txt Text.Text
>         | Url Text.Text URI.URI
>         deriving (Show)
>
>     parseUrls :: Text.Text -> Either String [ Urls ]
>     parseUrls text = ...
>
> Given a text block, it finds all the URLs, and breaks things into 
> either URLs, or blocks of text which are not URLs.  The full text is 
> attached, for those who are interested.  But the problem I'm hitting 
> is using the Attoparsec parser URI.ByteString exports.  When I do:
>
> *Base.DetectURL AP> AP.parseOnly (URI.uriParser 
> URI.laxURIParserOptions) "http://foo/bar"
> Right (...)
>
> So, that works.  But when I add a single space on the end of the string:
>
> *Base.DetectURL AP> AP.parseOnly (URI.uriParser 
> URI.laxURIParserOptions) "http://foo/bar "
> Left "Failed reading: MalformedPath"
>
> It fails.  Note that this isn't a problem with parseOnly- the real 
> code looks like:
>
>     parseAllUris :: AP.Parser (Bldr.Builder, [ Urls ])
>     parseAllUris = msum [ aUri, noUri, finished ]
>         where
>             finished = return (mempty, [])
>             aUri = do
>                 (txt, url) <- AP.match $
>             URI.uriParser URI.laxURIParserOptions
>                 (bldr, us) <- msum [ noUri, finished ]
>                 return $ (mempty, (Url (E.decodeUtf8 txt) url
>                 : prependText bldr us))
>             noUri = do
>                 c <- AP.anyChar
>                 (bldr, us) <- parseAllUris
>                 return $ ((Bldr.charUtf8 c) `mappend` bldr, us)
>
> And this has the problem as well- parsing a URL with anything 
> following it fails, and it doesn't detect any URLs. The parseOnly is 
> just the easy way to demonstrate it.
>
> So, my question is, is there some way in attoparsec to tell it to just 
> parse as much as makes sense, and leave the rest?  Alternatively, is 
> this a problem with the way URI.ByteString module constructed it's 
> parser, and a different parser could work?  Or, worst of all, is this 
> a problem with the way that URIs are defined and no conforming parser 
> will work?
>
> Thanks.
>
> Brian
>
>
>
> _______________________________________________
> Haskell-Cafe mailing list
> Haskell-Cafe at haskell.org
> http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.haskell.org/pipermail/haskell-cafe/attachments/20160224/780a5c68/attachment.html>