[Haskell-cafe] Confusing behavior in Haskell networking libraries

Alex Rozenshteyn rpglover64 at gmail.com
Thu Feb 18 23:22:43 UTC 2016


I was trying to write a web scraper, so I used scalpel
<https://hackage.haskell.org/package/scalpel>. The website I wanted to
scrape blocks my IP (I run a Tor exit node), so I decided to use proxychains
<http://proxychains.sourceforge.net/> (specifically, version 3.1-6
according to Debian). I ran into the following weird behavior: if I tell
proxychains to resolve DNS through the proxy, things are fine, but if I tell
it to resolve DNS in the clear, or if the URL I'm connecting to already
contains an IP address (e.g. one I resolved manually), I always get timeouts,
and they are reported much sooner than a real timeout should take.

(don't resolve DNS over the proxy)
% proxychains stack exec -- test-scalpel "http://ifconfig.co"
ProxyChains-3.1 (http://proxychains.sf.net)
|R-chain|-<>-201.175.94.245:38746-<><>-188.113.88.193:80-<--timeout

(resolve DNS over the proxy)
% proxychains stack exec -- test-scalpel "http://ifconfig.co"
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| ifconfig.co
|R-chain|-<>-201.175.111.245:10000-<><>-4.2.2.2:53-<><>-OK
|DNS-response| ifconfig.co is 188.113.88.193
|R-chain|-<>-201.175.111.245:10000-<><>-188.113.88.193:80-<><>-OK

(resolve DNS over the proxy, but use an IP address so no lookup is actually needed)
% proxychains stack exec -- test-scalpel "http://188.113.88.193"
ProxyChains-3.1 (http://proxychains.sf.net)
|R-chain|-<>-201.175.111.245:10000-<><>-188.113.88.193:80-<--timeout

curl does not have this behavior:

% proxychains stack exec -- curl "http://188.113.88.193"
ProxyChains-3.1 (http://proxychains.sf.net)
|R-chain|-<>-201.172.16.131:55599-<><>-188.113.88.193:80-<><>-OK

% proxychains stack exec -- curl "http://ifconfig.co"
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| ifconfig.co
|R-chain|-<>-201.172.17.231:10000-<><>-4.2.2.2:53-<><>-OK
|DNS-response| ifconfig.co is 188.113.88.193
|R-chain|-<>-201.172.17.231:10000-<><>-188.113.88.193:80-<><>-OK

% vim proxychains.conf # to turn off DNS resolution over the proxy
% proxychains stack exec -- curl "http://ifconfig.co"
ProxyChains-3.1 (http://proxychains.sf.net)
|R-chain|-<>-201.172.16.131:55599-<><>-188.113.88.193:80-<><>-OK

wget and aria2c also behave like curl.

wreq behaves like scalpel:

% proxychains stack exec -- test-wreq "http://188.113.88.193"
ProxyChains-3.1 (http://proxychains.sf.net)
|R-chain|-<>-201.172.17.231:10000-<><>-188.113.88.193:80-<--timeout

HTTP (the Haskell package) behaves differently than all the rest, failing
to connect even where the rest succeed:

% proxychains stack exec -- test-http "http://ifconfig.co"
ProxyChains-3.1 (http://proxychains.sf.net)
|DNS-request| ifconfig.co
|R-chain|-<>-201.175.94.245:38746-<><>-4.2.2.2:53-<><>-OK
|DNS-response| ifconfig.co is 188.113.88.193
|R-chain|-<>-201.175.94.245:38746-<><>-188.113.88.193:80-<--timeout

Gists of the programs I ran:
https://gist.github.com/rpglover64/f668ed372c63e271cf15

Anyone have any idea what's going on?