Some initial results with DPH
Roman Leshchinskiy
rl at cse.unsw.edu.au
Tue Sep 23 00:59:50 EDT 2008
Hi Austin,
first of all, thanks a lot for taking the time to report your results!
On 23/09/2008, at 11:48, Austin Seipp wrote:
> * The vectorise pass boosts compilation times *a lot*. I don't think
> this is exactly unwarranted since it seems like a pretty complicated
> transformation, but while making the primitive version using just
> the unlifted interface the compilation takes about 1.5 seconds, for
> the vectorised version it's on the order of 15 seconds. For
> something as trivial as this dot-product thing, that's a bit
> of a compilation time, though.
The problem here is not the vectoriser but rather the subsequent
optimisations. The vectoriser itself is (or should be - I haven't
really timed it, to be honest) quite fast. It generates very complex
code, however, which GHC takes a lot of time to optimise. We'll
improve the output of the vectoriser eventually, but not before it is
complete. For the moment, there is no solution for this, I'm afraid.
> * It's pretty much impossible to use ghc-core to examine the output
> core of the vectorised version - I let it run and before anything
> started showing up in `less` it was already using on the order of
> 100mb of memory. If I just add -ddump-simpl to the command line, the
> reason is obvious: the core generated is absolutely huge.
Yes. Again, this is something we'll try to improve eventually.
> * For the benchmark included, the vectorised ver. spends about 98% of
> its time from what I can see in the GC before it dies from stack
> overflow. I haven't tried something like +RTS -A1G -RTS yet, though.
IIUC, the code is
> dotp :: [:Int:] -> [:Int:] -> Int
> dotp v w = I.sumP [: (I.*) x y | x <- v, y <- w :]
The way the vectoriser works at the moment, it will repeat the array w
(lengthP v) times, i.e., create an array of length (lengthP v *
lengthP w). This is quite unfortunate and needs to be fused away but
isn't at the moment. The only advice I can give is to stay away from
array comprehensions for now. They work but are extremely slow. This
definition should work fine:
dotp v w = I.sumP (zipWithP (I.*) v w)
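To see the size difference, here is a minimal sketch that uses ordinary Haskell lists as a stand-in for parallel arrays. This is an assumption for illustration only: list comprehensions and zipWith mirror the shape of the DPH definitions but are not vectorised code, and the names dotpComp, pairCount and dotpZip are made up.

```haskell
-- Plain-list analogue of the two definitions; this is not DPH code.

-- Nested generators pair every x with every y, so the intermediate list
-- has (length v * length w) elements.
dotpComp :: [Int] -> [Int] -> Int
dotpComp v w = sum [ x * y | x <- v, y <- w ]

-- Number of products the comprehension builds.
pairCount :: [Int] -> [Int] -> Int
pairCount v w = length [ x * y | x <- v, y <- w ]

-- zipWith walks both lists in lockstep: one product per position.
dotpZip :: [Int] -> [Int] -> Int
dotpZip v w = sum (zipWith (*) v w)
```

With v = [1,2,3] and w = [4,5,6], pairCount v w is 9, dotpComp v w is 90 and dotpZip v w is 32. So the nested generators not only build a much larger intermediate array, they also sum over every pair rather than over matching positions.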
> * The vectoriser is really, really touchy. For example, the below code
> sample works (from DotPVect.hs):
>
>> import Data.Array.Parallel.Prelude.Int as I
>>
>> dotp :: [:Int:] -> [:Int:] -> Int
>> dotp v w = I.sumP [: (I.*) x y | x <- v, y <- w :]
>
> This however, does not work:
>
>> dotp :: [:Int:] -> [:Int:] -> Int
>> dotp v w = I.sumP [: (Prelude.*) x y | x <- v, y <- w :]
This is because the vectorised code needs to call the vectorised
version of (*). Internally, the vectoriser has a hardwired mapping
from top-level functions to their vectorised versions. That is, it
knows that it should replace calls to
(Data.Array.Parallel.Prelude.Int.*) by calls to
Data.Array.Parallel.Prelude.Base.Int.multV. There is no vectorised
version of (Prelude.*), however, and there won't be one until we can
vectorise the Prelude. In fact, the vectoriser doesn't even support
classes at the moment. So the rule of thumb is: unless it's in
Data.Array.Parallel.Prelude or you wrote and vectorised it yourself,
it will choke the vectoriser.
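The lookup described above can be pictured roughly like this. It is only a toy sketch, not the vectoriser's actual code: the table entries, the placeholder target strings and the names builtins and vectoriseVar are all hypothetical.

```haskell
-- Toy model of a hardwired table from top-level names to their
-- vectorised counterparts. Entries and target strings are placeholders,
-- not the real DPH internals.
type Name = String

builtins :: [(Name, Name)]
builtins =
  [ ("Data.Array.Parallel.Prelude.Int.*",    "<vectorised Int.*>")
  , ("Data.Array.Parallel.Prelude.Int.sumP", "<vectorised Int.sumP>")
  ]

-- Find the vectorised version of a name, or report that there is none.
vectoriseVar :: Name -> Either String Name
vectoriseVar n = case lookup n builtins of
  Just v  -> Right v
  Nothing -> Left ("no vectorised version of " ++ n)
```

In this model, vectoriseVar "Prelude.*" yields a Left because the name is simply not in the table, which is the situation the failing example runs into.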
> I also ran into a few other errs relating to the vectoriser dying - if
> I can find some I'll reply to this with some results.
Please do! And please keep using DPH and reporting your results, that
is really useful to us!
FWIW, we'll include some DPH documentation in 6.10 but it still has to
be written...
Roman