[Haskell-cafe] binary IO
joelr1 at gmail.com
Wed Dec 28 05:17:08 EST 2005
On Dec 27, 2005, at 10:30 PM, Tomasz Zielonka wrote:
> Let's see if I understand correctly. There are 17605 messages in
> trace.dat. On my hardware the average message unpickling time is
> when you only have a single thread. So, it indeed seems that with 1000
> threads it should be possible to process every message in under 4
I'm on a PowerBook G4 1.25GHz with 1Gb of memory and lots of apps
running. I have about 60-70Mb of physical memory free although other
apps are barely consuming CPU time.
This is what I get from the Erlang version. Columns are # of
processes, average time per message, peak observed physical memory
and peak observed virtual memory usage respectively.
1 - 2.34322e-5s,
10 - 3.91232e-5s, 18Mb, 50Mb
100 - 3.26753s, 70Mb, 100Mb, all 100 failed since alarms were set at
I just noticed that I'm unpickling all the packets whereas timeleak
only looks at compressed commands and unpickles server info packets
from those. I also made ~160Mb of physical memory available and
decided to read some web articles while the tests are running
(browser already running). Another run...
1 - 1.00657e-6s
10 - 1.10232e-6s
100 - 3.09583s, 55Mb, 90Mb, all 100 failed
1000 - 25s. All failed rather quickly.
The issue could be that they are all stumbling at the same big
packets at about the same time. So I inserted a random delay of
between 100ms and 3.1s and got an average of 2.96161e-2s with 77
failures out of 100. With 1000 processes, 957 failed at slightly more
than 3s and the rest averaged 1.12748e-6s.
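For anyone who wants to try the same staggering on the Haskell side, here is a minimal sketch of starting bots after a pseudo-random delay in the 100ms to 3.1s range. It is an illustration under assumptions, not the code I ran: nextSeed and delayFor are hypothetical helpers, the seed 42 is arbitrary, and the small generator is there only to keep the sketch within base (a real program would more likely use the random package).

```haskell
import Control.Concurrent (forkIO, threadDelay)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM_, replicateM_)

-- Hypothetical helper: a tiny linear congruential generator,
-- used only so that the sketch depends on nothing outside base.
nextSeed :: Int -> Int
nextSeed s = (s * 1103515245 + 12345) `mod` 2147483648

-- Hypothetical helper: map a seed to a delay in microseconds within [lo, hi].
delayFor :: Int -> Int -> Int -> Int
delayFor lo hi s = lo + s `mod` (hi - lo + 1)

main :: IO ()
main = do
  done <- newEmptyMVar
  let bots  = 100 :: Int
      seeds = take bots (iterate nextSeed (nextSeed 42))
  forM_ seeds $ \s ->
    forkIO $ do
      -- Wait between 100ms and 3.1s before this bot starts.
      threadDelay (delayFor 100000 3100000 s)
      -- The bot would start reading and unpickling the trace here.
      putMVar done ()
  -- Wait until every bot has passed its start delay.
  replicateM_ bots (takeMVar done)
  putStrLn "all bots started"
```

The point is only that the bots no longer reach the big packets at the same moment; the delay bounds are the ones quoted above.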
The comparison is still a bit unfair as Haskell compiles to native
code whereas I was running the above test using the regular bytecode
VM. With native compilation enabled I get
1 - 1.00359e-6s
10 - 1.08691e-6s
100 - 6.19101e-3s with 87 out of 100 failed at about 3.5s.
100 - 1.12210e-6s and 0 failed on another run.
The difference is in the random delays between bot starts,
apparently. You are well off so long as bots don't hit compressed
packets all at once. The big packets decompress into 50k and are a
hassle to unpickle.
Now here's the kicker... Using the read_ahead option when opening the
file gives you a ~64K buffer. Another run...
10 - 1.06194e-6s
100 - 1.05641e-6s
1000 - 1.06799e-6s and 916 failed with time between 3s and 4s
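A comparable buffer can be requested on the Haskell side with hSetBuffering. The sketch below is an assumption about how one might set it up, not a measurement: openBuffered and the file name sample.dat are hypothetical, and the 65536-byte size simply mirrors the ~64K figure above.

```haskell
import System.IO

-- Hypothetical helper: open a file for binary reading with a 64K block
-- buffer, roughly comparable to what read_ahead gives the Erlang version.
openBuffered :: FilePath -> IO Handle
openBuffered path = do
  h <- openBinaryFile path ReadMode
  hSetBuffering h (BlockBuffering (Just 65536))
  return h

main :: IO ()
main = do
  -- Write a small placeholder file so the sketch runs on its own.
  writeFile "sample.dat" "placeholder contents"
  h <- openBuffered "sample.dat"
  mode <- hGetBuffering h
  size <- hFileSize h
  putStrLn ("buffering: " ++ show mode ++ ", file size: " ++ show size)
  hClose h
```

Whether this changes the Haskell timings the way read_ahead changed the Erlang ones is something that would have to be measured.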
Increasing alarm time to 4s, using the native compiler with all
optimizations (erlc +native +o3 +inline *erl) gives me
10 - 1.10848e-6s
100 - 1.24159e-6s, 0 failed
1000 - 1.02611e-6s, 923 failed
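For reference, the alarm idea (a message counts as failed if it is not unpickled within the alarm period) can be sketched in Haskell with System.Timeout. This is a minimal sketch under assumptions: unpickle is a hypothetical stand-in for the real unpickler and withAlarm is a name made up for illustration.

```haskell
import Control.Exception (evaluate)
import System.Timeout (timeout)

-- Hypothetical stand-in for unpickling one message.
unpickle :: Int -> Int
unpickle n = sum [1 .. n]

-- Run an action under an alarm given in seconds.
-- Returns Nothing if the action has not finished when the alarm fires.
withAlarm :: Int -> IO a -> IO (Maybe a)
withAlarm seconds = timeout (seconds * 1000000)  -- timeout takes microseconds

main :: IO ()
main = do
  -- evaluate forces the result inside the timed region.
  r <- withAlarm 4 (evaluate (unpickle 50000))
  case r of
    Just v  -> putStrLn ("finished: " ++ show v)
    Nothing -> putStrLn "failed: alarm fired"
```

Forcing the result with evaluate matters here, since otherwise laziness could move the actual work outside the timed region.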
> Right now I can think of two reasons:
> - 1000 threads need much data in the heap, which increases the cost
> of GC and with frequent context switches results in poor cache
> - the GHC's process scheduler is not as intelligent as Erlang's
It's clear to me by now that the app is not language or compiler-bound.
> One possible solution is to reduce the number of simultaneous
> unpicklings/picklings (I think someone already proposed it, was that
> Bulat?). It would reduce the amount of needed memory and should
> improve cache usage.
> But then, what will be the delay observed by the server?
Right, you can't really do this as it will increase the overall delay.
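For completeness, the kind of limiting being proposed could be sketched with a quantity semaphore from base. This is only an illustration under assumptions: withSlot and fakeUnpickle are hypothetical names, and the limit of 10 is arbitrary.

```haskell
import Control.Concurrent (forkIO)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Concurrent.QSem (QSem, newQSem, signalQSem, waitQSem)
import Control.Exception (bracket_)
import Control.Monad (forM_, replicateM)

-- Run an action while holding one of the semaphore's slots.
-- bracket_ releases the slot even if the action throws.
withSlot :: QSem -> IO a -> IO a
withSlot sem = bracket_ (waitQSem sem) (signalQSem sem)

-- Hypothetical placeholder standing in for the real unpickler.
fakeUnpickle :: Int -> Int
fakeUnpickle n = n * 2

main :: IO ()
main = do
  sem  <- newQSem 10              -- assumed limit: 10 unpicklings at once
  done <- newEmptyMVar
  let bots = 1000 :: Int
  forM_ [1 .. bots] $ \n ->
    forkIO $ do
      r <- withSlot sem (return $! fakeUnpickle n)
      putMVar done r
  -- Collect one result per bot.
  results <- replicateM bots (takeMVar done)
  print (length results)
```

Every thread that waits for a slot adds its waiting time to the response time seen by the server, which is the delay problem mentioned above.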
>> Each bot is given 5, 15 or 35 seconds to respond by the poker server
>> and this is proving to be too little for my Haskell implementation.
> This is per message, right?
Per message, yes. I will code the networking part of my Erlang version
and will report whether I got more/less timeouts than the Haskell
> What if the spec was the data type itself? When I was dealing with a
> proprietary Intel-based binary format, I derived my picklers /
> unpicklers with TH from data type definitions. Of course, there were
> cases where the derived code would be wrong, and then I had to write
> code myself, but it was for about 2-3 record types out of total 30.
Wish I could do that. I don't know TH at all and any TH I got was due
to the folks on #haskell and here providing it (thanks Cale!).