<html>

  <head>

    <meta content="text/html; charset=windows-1252"

      http-equiv="Content-Type">

  </head>

  <body text="#000000" bgcolor="#FFFFFF">

    It's an issue with uri-bytestring; its parser is being overly

    strict/annoying.<br>

    <br>

    Cf.

<a class="moz-txt-link-freetext" href="https://github.com/Soostone/uri-bytestring/blob/master/src/URI/ByteString/Internal.hs#L207-L209">https://github.com/Soostone/uri-bytestring/blob/master/src/URI/ByteString/Internal.hs#L207-L209</a><br>

    <br>

    In particular, the parser there requires endOfInput, so it fails when anything at all follows the URI.<br>

    <br>

    Without that endOfInput, you could wrap the parser in many1/some/try to do what you want.<br>

    <br>
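    A minimal sketch of one possible workaround, assuming the URLs in your text are delimited by whitespace: take the run of non-space bytes first, then hand only that token to URI.parseURI, so the endOfInput check sees just the token. The name uriToken is made up for this example and is not part of uri-bytestring. Note that trailing punctuation (a full stop after the URL, say) would end up inside the token.<br>

<pre>
```haskell
import qualified Data.Attoparsec.ByteString.Char8 as AP
import qualified Data.ByteString.Char8 as BS
import qualified URI.ByteString as URI

-- Hypothetical helper (not part of uri-bytestring): take a maximal run of
-- non-space bytes, then accept it only if the whole run parses as a URI.
-- A failure here backtracks like any other attoparsec failure, so the
-- parser can be used as one alternative inside msum.
uriToken :: AP.Parser (BS.ByteString, URI.URI)
uriToken = AP.takeWhile1 (not . AP.isSpace) >>= check
  where
    check tok =
      case URI.parseURI URI.laxURIParserOptions tok of
        Right uri -> return (tok, uri)
        Left err  -> fail ("not a URI: " ++ show err)
```
</pre>

    In the quoted parseAllUris, something like this could stand in for the AP.match step in aUri, since it already returns both the matched bytes and the parsed URI.<br>

    <br>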

    <div class="moz-cite-prefix">On 02/24/2016 01:11 PM, Brian Hurt

      wrote:<br>

    </div>

    <blockquote

cite="mid:CAMSTv-O4SJhzijXZXfqOQ=dBcENoF1Zus9v5as=_EiMZTkUieQ@mail.gmail.com"

      type="cite">

      <div dir="ltr"><br>

        What I'm trying to do is write a function with the signature:<br>

        <br>

        <font face="monospace, monospace">    data Urls =<br>

                  Txt Text.Text<br>

                  | Url Text.Text URI.URI<br>

                  deriving (Show)<br>

          <br>

              parseUrls :: Text.Text -> Either String [ Urls ]<br>

              parseUrls text = ...</font><br>

        <br>

        Given a text block, it finds all the URLs, and breaks things

        into either URLs, or blocks of text which are not URLs.  The

        full text is attached, for those who are interested.  But the

        problem I'm hitting is using the Attoparsec parser

        URI.ByteString exports.  When I do:<br>

        <br>

        <font face="monospace, monospace">*Base.DetectURL AP>

          AP.parseOnly (URI.uriParser URI.laxURIParserOptions) "http://foo/bar"<br>

          Right (...)<br>

        </font><br>

        So, that works.  But when I add a single space on the end of the

        string:<br>

        <br>

        <font face="monospace, monospace">*Base.DetectURL AP>

          AP.parseOnly (URI.uriParser URI.laxURIParserOptions) "http://foo/bar "<br>

          Left "Failed reading: MalformedPath"<br>

        </font>

        <div><br>

        </div>

        <div>It fails.  Note that this isn't a problem with parseOnly;

          the real code looks like:</div>

        <div><br>

        </div>

        <div>

          <div><font face="monospace, monospace">    parseAllUris ::

              AP.Parser (Bldr.Builder, [ Urls ])</font></div>

          <div><font face="monospace, monospace">    parseAllUris = msum

              [ aUri, noUri, finished ]</font></div>

          <div><font face="monospace, monospace">        where</font></div>

          <div><font face="monospace, monospace">            finished =

              return (mempty, [])</font></div>

          <div><font face="monospace, monospace">            aUri = do</font></div>

          <div><font face="monospace, monospace">                (txt,

              url) <- AP.match $</font></div>

          <div><font face="monospace, monospace">                       

                          URI.uriParser URI.laxURIParserOptions</font></div>

          <div><font face="monospace, monospace">                (bldr,

              us) <- msum [ noUri, finished ]</font></div>

          <div><font face="monospace, monospace">                return

              $ (mempty, (Url (E.decodeUtf8 txt) url</font></div>

          <div><font face="monospace, monospace">                       

                              : prependText bldr us))</font></div>

          <div><font face="monospace, monospace">            noUri = do</font></div>

          <div><font face="monospace, monospace">                c <-

              AP.anyChar</font></div>

          <div><font face="monospace, monospace">                (bldr,

              us) <- parseAllUris</font></div>

          <div><font face="monospace, monospace">                return

              $ ((Bldr.charUtf8 c) `mappend` bldr, us)</font></div>

          <div><br>

          </div>

        </div>

        <div>This has the same problem: parsing a URL with

          anything following it fails, so it doesn't detect any URLs. 

          The parseOnly call is just the easiest way to demonstrate it.</div>

        <div><br>

        </div>

        <div>

          <div>So, my question is, is there some way in attoparsec to

            tell it to just parse as much as makes sense, and leave the

            rest?  Alternatively, is this a problem with the way the

            URI.ByteString module constructs its parser, and a

            different parser could work?  Or, worst of all, is this a

            problem with the way that URIs are defined and no conforming

            parser will work?</div>

          <div><br>

          </div>

        </div>

        <div>Thanks.</div>

        <div><br>

        </div>

        <div>Brian</div>

        <div><br>

        </div>

      </div>

      <br>


      <br>

      <pre wrap="">_______________________________________________

Haskell-Cafe mailing list

<a class="moz-txt-link-abbreviated" href="mailto:Haskell-Cafe@haskell.org">Haskell-Cafe@haskell.org</a>

<a class="moz-txt-link-freetext" href="http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe">http://mail.haskell.org/cgi-bin/mailman/listinfo/haskell-cafe</a>

</pre>

    </blockquote>

    <br>

  </body>

</html>