[Haskell-cafe] Slow IO

Tue Sep 12 12:51:11 EDT 2006

Am Montag, 11. September 2006 17:44 schrieben Sie:
> Daniel Fischer wrote:
> > >   Try Don Stewart's ByteString library
> >
> > -- and some data that made the programme segfault before now run in a
> > couple of seconds.
>
> But that shouldn't happen!  Segfaults aren't performance problems, but
> errors.  So your program contained a bug (lazyness where it isn't
> useful, I suspect), and ByString is hiding it.

Maybe I've misused the word segfault.
What happened is:
The programme consumed more and more memory (according to top),
kswapd started to have a higher CPU-percentage than my programme,
programme died, system yelling 'Speicherzugriffsfehler', top displays 
'kswapd<defunct>'.
I believe that means my programme demanded more memory than I have available 
(only 256MB RAM + 800MB swap). Is that a segfault or what is the correct 
term?

That is probably due to (apart from the stupidity of my IO-code) the large 
overhead of Haskell lists. So the chunk of the file which easily fits into my 
RAM in ByteString form is too large as a list of ordinary Strings.
However the problem might be too little lazyness, because if I explicitly read 
the file line by line, memory usage remains low enough -- but ByteString is 
*much* faster anyway.

>
> > So even if we just counted newlines, we would have to scan 1,700 million
> > (1.7*10^9) chars per second.
> > Could any ordinary computer of today do that, using whatever language?
>
> That rate should be no problem for a 2GHz machine.  However, a 200MHz 64
> bit wide bus won't deliver the data fast enough and it is 50x as much as
> the best hard disks could deliver in optimal circumstances.  I guess,
> most of the test cases are a lot smaller than your guesstimate.
>

I suppose so, too, but not knowing the test data, I assumed a bad case 
(uniform distribution of line-lengths and test-case-size in the specified 
range).

>
> Udo.

Bulat:
> are you mean arithmetic or geometric average? ;)
I meant 'expected value'.
If X_i are independent random variables uniformly distributed on [0 .. k],
Y is a random variable (independent from the X_i) uniformly distributed on
[1 .. n] and Z is the sum of the first Y of the X_i, then the expected value 
of Z is (n+1)*k/4.
So we might call that a weighted arithmetic average, I suppose.

Cheers,
Daniel

-- 

"In My Egotistical Opinion, most people's C programs should be
indented six feet downward and covered with dirt."
	-- Blair P. Houghton