[Haskell-cafe] announce: Glome.hs-0.3 (Haskell raytracer)

Jim Snow jsnow at cs.pdx.edu
Fri Apr 18 17:09:28 EDT 2008

David Roundy wrote:
> On Sat, Apr 19, 2008 at 12:19:19AM +0400, Bulat Ziganshin wrote:
>> Saturday, April 19, 2008, 12:10:23 AM, you wrote:
>>> The other problem I had with concurrency is that I was getting about a
>>> 50% speedup instead of the 99% or so that I'd expect on two cores.  I 
>> 2 cores don't guarantee a 2x speedup. Some programs are limited by
>> memory access speed and you still have just one memory :)
> In fact, this is relatively easily tested (albeit crudely):  just run two
> copies of your single-threaded program at the same time.  If they take
> longer than when run one at a time, you can guess that you're
> memory-limited, and you won't get such good performance from threading your
> code.  But this is only a crude hint, since memory performance is strongly
> dependent on cache behavior, and running one threaded job may either do
> better or worse than two single-threaded jobs.  If you've got two separate CPUs
> with two separate caches, the simultaneous single-threaded jobs should beat the
> threaded job (meaning take less than twice as long), since each job should
> have full access to one cache.  If you've got two cores sharing a single
> cache, the behavior may be the opposite:  the threaded job uses less total
> memory than the two single-threaded jobs, so more of the data may stay in
> cache.
>
> For reference, on a friend's dual quad-core Intel system (i.e. 8 cores
> total), if he runs 8 simultaneous (identical) memory-intensive jobs he only
> gets about five times the throughput of a single job, meaning that each core
> is running at something like 60% of its CPU capacity due to memory
> contention.  It's possible that your system is comparably limited, although
> I'd be surprised; somehow it seems unlikely that your ray tracer is
> stressing the cache all that much.
On a particular scene with one instance of the single-threaded renderer
running, it takes about 19 seconds to render an image.  With two
instances running, they each take about 23 seconds.  This is on an
Athlon-64 3800+ dual core, with 512kB of L2 cache per core.  So, it
seems my memory really is slowing things down noticeably.
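
To put rough numbers on that, here is a minimal sketch of the arithmetic.
The function names are made up for illustration, and it treats the
two-instance test as only a crude proxy for what a threaded render would
do, per the caveats quoted above.

```haskell
module Main where

import Text.Printf (printf)

-- | Factor by which each instance slows down when run alongside others.
--   (Hypothetical helper, not part of Glome.)
slowdown :: Double -> Double -> Double
slowdown tAlone tTogether = tTogether / tAlone

-- | Combined throughput of n simultaneous instances, relative to a
--   single instance running alone.  (Hypothetical helper.)
throughput :: Int -> Double -> Double -> Double
throughput n tAlone tTogether = fromIntegral n * tAlone / tTogether

main :: IO ()
main = do
  let tAlone    = 19 :: Double  -- seconds, one instance running alone
      tTogether = 23 :: Double  -- seconds each, two instances at once
  printf "slowdown per instance: %.2fx\n" (slowdown tAlone tTogether)
  printf "combined throughput:   %.2fx\n" (throughput 2 tAlone tTogether)
```

With these timings that works out to about a 1.21x slowdown per instance
and about 1.65x combined throughput, rather than the ideal 2x.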

