ideas for compiler project

Eray Ozkural (exa) erayo@cs.bilkent.edu.tr
Fri, 1 Feb 2002 16:39:30 +0200


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Friday 01 February 2002 14:18, Jerzy Karczmarczuk wrote:
>
> My 2 euro-cents:
> Compiler? yes, why not?
> Interpreter? You mean, a virtual machine able to do FAST all those array
> manipulations?

An interpreter is just something that compiles on the fly :) It doesn't have 
to bear a radically different design from its compiler countpart. A design 
could choose some sort of intermediate language as output, and some other 
could generate native code.

> But you will get into the same problem as with a Haskell library...
> The "kernel" with fast matrix multiplication, with really damn fast
> submatrix extraction, with blitzing convolutions, Fourierisms etc. need
> a rather low-level access tools to the memory.
>

That's correct, and I think there are enough numerical codes around to form 
that kernel. My feeling is that the kernel is rather easy. It is the algebra 
and the linguistic interface to this functionality that would be problematic. 
What I have in mind would be a language as fast as FORTRAN, and more capable 
than matlab.

> The same story holds for bitmap processing.
> Look at Smalltalk. Its compiler and a *good part* of the virtual machine
> is written in Smalltalk. But when you have to snap an image from the
> screen, to copy it back, to move it - no use, the PixBlt primitives are
> written in C.
>

Of course, that's architecture dependent ways of performing expensive 
operations.

> So, I presume that a decent strategy would be the following, with points
> A and B below developed synchronously.
>
> A. Design a kernel which is stupid as the Napoleon's hat (concerning the
>    algebra), but which performs fast indexing and block transfer in many
>    typical examples: row extraction from a matrix, all simple iterators
>    (sums, pair-wise products, etc.) - you know what I mean. Such patterns
>    are not very numerous.
>    Make all this primitive.
>
> B. Design a sound, functional, typed layer for matrix algebra, but using
>    those blocks, slices, rows, sub-matrices, Kronecker products etc. as
>    primitives.
>
> Test both together with all the algorithms possible, and when something,
> I don't know, some Householder algorithm, some Lanczos whatever turns out
> to be too slow, analyze critically the performance, and augment the
> kernel layer.
>

You're right. This would be the right path to follow. Point B is harder to 
obtain though, as it also determines what Point A will be. For A you can 
start with the low level data structures and algorithms, and then try to 
extend it. For B, the critical point is the optimizer, which depends on the 
language design... The compiler will have to be smart enough to avoid things 
a primitive translation would entail, like redundant duplication of entire 
matrices. I'd written a parallel array library in C++ that tried to avoid 
that by using 'expression templates'. With a true compiler it's so much 
easier to pull those tricks.

There is a lot of research on compilation of parallel matrix code:
 
http://citeseer.nj.nec.com/133614.html
http://citeseer.nj.nec.com/bau94solving.html
http://citeseer.nj.nec.com/cierniak94unifying.html
http://citeseer.nj.nec.com/13405.html
http://citeseer.nj.nec.com/coelho96state.html

The ideas in these papers would be relevant although they emphasize data/code 
 distribution... 

> ======
>
> Last thing. It is easy to criticize Matlab saying that its replacement
> might be better. Often such statements come from people who don't use it
> actually.
>

Well I used it, but it seems to be awfully inefficient for some tasks. 
Nevertheless, it's a great tool overall. :)

[snip]
(I agree with what you said)

> As a programming language it is worse than Fortran (save for vectorized
> arithmetic). So, linguistically a functional scientific programming tool
> would be really very nice. But the performance is another issue.
>

Well I think it is like that because of the way it started. They surely won't 
rewrite it from scratch ;) It was intended as a mathematics scripting 
language, and they got that right.

Regards,

- -- 
Eray Ozkural (exa) <erayo@cs.bilkent.edu.tr>
Comp. Sci. Dept., Bilkent University, Ankara
www: http://www.cs.bilkent.edu.tr/~erayo
GPG public key fingerprint: 360C 852F 88B0 A745 F31B  EA0F 7C07 AE16 874D 539C
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.0.6 (GNU/Linux)
Comment: For info see http://www.gnupg.org

iD8DBQE8WqikfAeuFodNU5wRAm+EAKCcgLFPNfkqTTKeRX+wLlLEOw9WRwCdE/+A
ThJPOCPP9zFasHrjpSszua4=
=1P2r
-----END PGP SIGNATURE-----