[Haskell-cafe] Slow IO
Daniel Fischer
daniel.is.fischer at web.de
Tue Sep 12 12:51:11 EDT 2006
Am Montag, 11. September 2006 17:44 schrieben Sie:
> Daniel Fischer wrote:
> > > Try Don Stewart's ByteString library
> >
> > -- and some data that made the programme segfault before now run in a
> > couple of seconds.
>
> But that shouldn't happen! Segfaults aren't performance problems, but
> errors. So your program contained a bug (lazyness where it isn't
> useful, I suspect), and ByString is hiding it.
Maybe I've misused the word segfault.
What happened is:
The programme consumed more and more memory (according to top),
kswapd started to have a higher CPU-percentage than my programme,
programme died, system yelling 'Speicherzugriffsfehler', top displays
'kswapd<defunct>'.
I believe that means my programme demanded more memory than I have available
(only 256MB RAM + 800MB swap). Is that a segfault or what is the correct
term?
That is probably due to (apart from the stupidity of my IO-code) the large
overhead of Haskell lists. So the chunk of the file which easily fits into my
RAM in ByteString form is too large as a list of ordinary Strings.
However the problem might be too little lazyness, because if I explicitly read
the file line by line, memory usage remains low enough -- but ByteString is
*much* faster anyway.
>
> > So even if we just counted newlines, we would have to scan 1,700 million
> > (1.7*10^9) chars per second.
> > Could any ordinary computer of today do that, using whatever language?
>
> That rate should be no problem for a 2GHz machine. However, a 200MHz 64
> bit wide bus won't deliver the data fast enough and it is 50x as much as
> the best hard disks could deliver in optimal circumstances. I guess,
> most of the test cases are a lot smaller than your guesstimate.
>
I suppose so, too, but not knowing the test data, I assumed a bad case
(uniform distribution of line-lengths and test-case-size in the specified
range).
>
> Udo.
Bulat:
> are you mean arithmetic or geometric average? ;)
I meant 'expected value'.
If X_i are independent random variables uniformly distributed on [0 .. k],
Y is a random variable (independent from the X_i) uniformly distributed on
[1 .. n] and Z is the sum of the first Y of the X_i, then the expected value
of Z is (n+1)*k/4.
So we might call that a weighted arithmetic average, I suppose.
Cheers,
Daniel
--
"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
-- Blair P. Houghton
More information about the Haskell-Cafe
mailing list