[Haskell-cafe] TagSoup 0.9

Neil Mitchell ndmitchell at gmail.com
Tue May 25 18:51:18 EDT 2010


Hi,

From what I can tell of your example you've managed to get the raw
HTTP response in Unicode, which isn't suitable for sending to tagsoup.
I've not used the Network.HTTP library for downloading much, but when
I did I thought it stripped the headers automatically.
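
If the headers do turn up in front of the body, something like the
sketch below would separate them by hand. It uses only base;
`splitHeaders` is a made-up name, not part of tagsoup or Network.HTTP,
and it does nothing about the chunked framing or the text encoding that
your output also shows.

```haskell
import Data.List (isPrefixOf)

-- | Split a raw HTTP response at the first blank line (CRLF CRLF).
-- Returns (header block, remainder). If no blank line is found the
-- whole input is returned as the header block and the remainder is "".
-- Hypothetical helper for illustration only.
splitHeaders :: String -> (String, String)
splitHeaders = go ""
  where
    go acc rest
      | "\r\n\r\n" `isPrefixOf` rest = (reverse acc, drop 4 rest)
    go acc (c:cs) = go (c : acc) cs
    go acc []     = (reverse acc, "")
```

This is only a stopgap for looking at the data; I would still expect
the HTTP library to do this split for you.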

Can you just print the first few lines of the output you get from the
HTTP library, without passing them through tagsoup? That should show
the problem independently of tagsoup.
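
A rough sketch of what I mean, which I have not run against your page:
`previewLines` is a name invented here, and the usage line in the
comment assumes the `simpleHTTP`, `getRequest` and `getResponseBody`
functions from Network.HTTP.

```haskell
-- | Render the first n lines of a response with 'show', so that
-- carriage returns, NUL bytes and a byte-order mark appear as visible
-- escapes instead of being swallowed by the terminal.
-- Hypothetical helper for illustration only.
previewLines :: Int -> String -> [String]
previewLines n = map show . take n . lines

-- Possible usage, assuming the HTTP package is installed and
-- Network.HTTP is imported (url is whatever page you are fetching):
--
--   simpleHTTP (getRequest url) >>= getResponseBody
--     >>= mapM_ putStrLn . previewLines 5
```

If the first line printed is an HTTP status line, the headers are
reaching you before tagsoup is involved at all.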

Thanks, Neil


On Mon, May 24, 2010 at 3:24 AM, Ralph Hodgson <rhodgson at topquadrant.com> wrote:
> Thanks Neil,
>
> Using Network.HTTP worked.
>
> However something else I have just run into concerns some web pages that
> start with:
>
> <?xml version="1.0" encoding="iso-8859-1"?>
>
> <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
> "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
>
> I get the following bad result:
>
> TagText "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\nLast-Modified: Tue,
> 27 Oct 2009 19:30:40 GMT\r\nETag: \"6f248cf73b57ca1:25e2\"\r\nDate: Sun, 23
> May 2010 22:46:41 GMT\r\nTransfer-Encoding:  chunked\r\nConnection:
> close\r\nConnection:
> Transfer-Encoding\r\n\r\n00004000\r\n\255\254<\NUL?\NULx\NULm\NULl\NUL
> \NULv\NULe\NULr\NULs\NULi\NULo\NULn\NUL=\NUL\"\NUL1\NUL.\NUL0\NUL\"\NUL
> \NULe\NULn\NULc\NULo\NULd\NULi\NULn\NULg\NUL=\NUL\"\NULi\NULs\NULo\NUL-\NUL8\NUL8\NUL5\NUL9\NUL-\NUL1\NUL\"\NUL
>
> etc etc
>
> Is this an easy thing to fix? I've started to look over the code.
>
> -----Original Message-----
> From: Neil Mitchell [mailto:ndmitchell at gmail.com]
> Sent: Wednesday, May 19, 2010 12:19 PM
> To: Ralph Hodgson
> Cc: Daniel Fischer; haskell-cafe at haskell.org; Don Stewart
> Subject: Re: [Haskell-cafe] TagSoup 0.9
>
> Hi Ralph,
>
>> I was using TagSoup 0.8 with great success. On upgrading to 0.9 I have
>> this error:
>>
>> TQ\TagSoup\TagSoupExtensions.lhs:29:17:
>>    `Tag' is not applied to enough type arguments
>>    Expected kind `*', but `Tag' has kind `* -> *'
>>    In the type synonym declaration for `Bundle'
>> Failed, modules loaded: TQ.Common.TextAndListHandling.
>
> My change notes have this being a change between 0.6 and 0.8. As
> Malcolm says, any old uses of "Tag" should become "Tag String". The
> reason is that Tag is now parameterised, and you can use Tag
> ByteString etc. However, I should point out that Tag ByteString won't
> be any faster than Tag String in this version (it's in the future work
> pile).
>
>>> > Forgot to add: I now need to understand the following warnings on this
>>> > line "> import Text.HTML.Download":
>
> Everyone's comments have been right. I previously included
> Text.HTML.Download so that it was easy to test tagsoup against the
> web. Since I first wrote that snippet the HTTP downloading libraries
> have improved substantially, so people should use those in preference
> to the version in tagsoup - you'll be able to connect to more websites
> in more reliable ways, go through proxies etc. I don't intend to remove
> the Download module any time soon, but I will do eventually.
>
> Thanks, Neil
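
For the archive, on the `Tag` kind error quoted above: a minimal sketch
of the "Tag" to "Tag String" change. The `Tag` type below is a cut-down
stand-in so the example needs only base; it is not tagsoup's actual
definition. The `Bundle` synonym is also a guess, since the original
declaration was not posted.

```haskell
-- Stand-in for a parameterised tag type (kind * -> *), trimmed to two
-- constructors. NOT the definition from Text.HTML.TagSoup.
data Tag str
  = TagOpen str [(str, str)]
  | TagText str
  deriving (Show, Eq)

-- A synonym such as
--   type Bundle = [Tag]
-- is rejected because Tag now needs a type argument.
-- Hypothetical Bundle, fixed by applying Tag to String:
type Bundle = [Tag String]

example :: Bundle
example = [TagOpen "a" [("href", "http://haskell.org")], TagText "Haskell"]

-- | Concatenate the text nodes of a bundle.
bundleText :: Bundle -> String
bundleText ts = concat [s | TagText s <- ts]
```

With the real library the same one-word change applies to any
signature or synonym that mentions `Tag` on its own.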