[Haskell-cafe] Re: String vs ByteString

Ivan Lazar Miljenovic ivan.miljenovic at gmail.com
Tue Aug 17 22:19:50 EDT 2010


On 18 August 2010 12:12, wren ng thornton <wren at freegeek.org> wrote:
> Johan Tibell wrote:
>>
>> To my knowledge the data we have about the prevalence of encodings on the
>> web is accurate. We crawl all pages we can get our hands on, by starting at
>> some set of seeds and then following all the links. You cannot be sure that
>> you've reached all web sites, as there might be cliques in the web graph,
>> but we try our best to get them all. You're unlikely to get a better
>> estimate anywhere else. I doubt many organizations have the machinery
>> required to crawl most of the web.
>
> There was a study recently on this. They found that there are four main
> parts of the Internet:
>
> * a densely connected core, where from any site you can get to any other
> * an "in cone", from which you can reach the core (but not other in-cone
> members, since then you'd both be in the core)
> * an "out cone", which can be reached from the core (but which cannot reach
> each other)
> * and, unconnected islands
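
As a rough illustration of the four regions listed above, here is a minimal
Python sketch. It assumes the link graph fits in memory as a list of
(source, target) pairs and treats the largest strongly connected component as
the core. The name bow_tie and the dictionary keys are made up for this
example; they do not come from the study mentioned above. Anything that
neither reaches the core nor is reached from it is lumped under "islands".

```python
def reachable(adj, starts):
    """Return every node reachable from `starts` by following edges in `adj`."""
    seen = set(starts)
    stack = list(starts)
    while stack:
        node = stack.pop()
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


def bow_tie(edges):
    """Split the nodes of a directed graph into core, in, out and islands."""
    fwd, rev, nodes = {}, {}, set()
    for src, dst in edges:
        fwd.setdefault(src, set()).add(dst)   # links out of src
        rev.setdefault(dst, set()).add(src)   # links into dst
        nodes.update((src, dst))

    # Assumption: the core is the largest strongly connected component.
    # A node's component is the set of nodes it can reach that can also
    # reach it. This is quadratic in the worst case, fine for a sketch.
    core = set()
    unassigned = set(nodes)
    while unassigned:
        seed = next(iter(unassigned))
        component = reachable(fwd, [seed]) & reachable(rev, [seed])
        unassigned -= component
        if len(component) > len(core):
            core = component

    in_cone = reachable(rev, core) - core    # can reach the core
    out_cone = reachable(fwd, core) - core   # can be reached from the core
    islands = nodes - core - in_cone - out_cone
    return {"core": core, "in": in_cone, "out": out_cone, "islands": islands}
```

A crawler that starts from seeds inside the core and only follows links will
see the "core" and "out" sets, but never the "in" or "islands" sets.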

I'm guessing you're referring to what I've heard called the "hidden web":
databases and the like that require a sign-in, i.e. material that sits
outside the core to differing degrees? Some of these databases are indexed
by Google, for instance, but you can't actually read them without an
account.

-- 
Ivan Lazar Miljenovic
Ivan.Miljenovic at gmail.com
IvanMiljenovic.wordpress.com

