[Haskell-cafe] Re: String vs ByteString

Johan Tibell johan.tibell at gmail.com
Tue Aug 17 08:53:09 EDT 2010


On Tue, Aug 17, 2010 at 2:23 PM, Yitzchak Gale <gale at sefer.org> wrote:

> Michael Snoyman wrote:
> > Regarding the data: you haven't actually quoted any
> > statistics about the prevalence of CJK data
>
> True, I haven't seen any - except for Google, which
> I don't believe is accurate. I would like to see some
> good unbiased data.
>

To my knowledge, the data we have about the prevalence of encodings on the
web is accurate. We crawl all pages we can get our hands on, starting at
some set of seeds and then following all the links. You cannot be sure that
you've reached every web site, as there might be isolated cliques in the web
graph that nothing we've crawled links to, but we try our best to get them
all. You're unlikely to get a better estimate anywhere else. I doubt many
organizations have the machinery required to crawl most of the web.

-- Johan