[jhc] what to work on next?

John Meacham john at repetae.net
Mon Jun 29 15:57:56 EDT 2009

On Fri, Jun 26, 2009 at 01:01:37PM -0400, Isaac Dupree wrote:
> Interesting, interesting, interesting!  Your points make more sense to
> me now -- I'll comment on a couple.
> John Meacham wrote:
>> Though, you bring up another interesting issue when it comes to
>> developing the libraries. JHC and GHC actually have a very different
>> philosophy here in that in jhc, I rely on the underlying operating
>> systems services whenever possible. For instance, I don't try to roll my
>> own unicode or buffered file support. I rely on 'iconv' and 'FILE' and
>> friends when they are available. In general, the thinking is: a whole
>> lot of people have worked very hard already at optimizing FILE; the OS
>> may even do things like use zero-copy mmapped buffers to back it. So I
>> use existing resources whenever possible.
> Ah, re-use outside the Haskell world!  A worthy idea; it encourages the
> existence of good C libraries, etc.  I guess the two weaknesses I can
> think of are:
> - often C libraries don't think of features that would be obvious in
> Haskell, like duplicating/backtracking with state, which make a Haskell
> version have to be unnecessarily imperative.  (But the C libraries
> should then be improved where possible! Sometimes they have an awful lot
> of momentum though, like iconv)
> - the Haskell compiler can't optimize as much when large parts of the
> code are in C :-)

Yes indeed. Of course, if the OS C library implements something like
IO-Lite [1] under the hood, then reimplementing plain old buffers on the
assumption that your reimplementation is as good as what the OS does is
clearly counterproductive. I guess my philosophy is more about reducing
the barrier to code reuse between languages, and about not
second-guessing what the OS provides without good reason. Also, it's the
"lazy" thing to do. :)

[1] http://www.usenix.org/events/osdi99/full_papers/pai/pai.pdf
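The "reuse the OS" philosophy above can be sketched with a minimal FFI
binding to libc's buffered FILE interface instead of reimplementing
buffering in Haskell. The foreign functions are the real libc ones; the
Haskell wrapper names and the temp-file path are just illustrative:

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
-- Sketch: lean on libc's FILE buffering rather than rolling our own.
module Main where

import Foreign.C.String (CString, peekCString, withCString)
import Foreign.C.Types (CInt (..))
import Foreign.Marshal.Alloc (allocaBytes)
import Foreign.Ptr (Ptr)

data CFile  -- opaque stand-in for C's `struct FILE`

foreign import ccall unsafe "fopen"  c_fopen  :: CString -> CString -> IO (Ptr CFile)
foreign import ccall unsafe "fputs"  c_fputs  :: CString -> Ptr CFile -> IO CInt
foreign import ccall unsafe "fgets"  c_fgets  :: CString -> CInt -> Ptr CFile -> IO CString
foreign import ccall unsafe "fclose" c_fclose :: Ptr CFile -> IO CInt

main :: IO ()
main =
  withCString "/tmp/jhc-file-demo" $ \path -> do
    -- Write one line through the OS's buffered IO...
    out <- withCString "w" (c_fopen path)
    _   <- withCString "hello via FILE\n" (`c_fputs` out)
    _   <- c_fclose out
    -- ...then read it back the same way.
    inp <- withCString "r" (c_fopen path)
    allocaBytes 64 $ \buf -> do
      _ <- c_fgets buf 64 inp
      putStr =<< peekCString buf
    _ <- c_fclose inp
    return ()
```

All buffering, locking, and any zero-copy tricks come from libc; the
Haskell side is only marshalling.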

There was a computer science professor at Caltech who taught all the
standard material: regular expressions, finite automata, computability
theory, algorithms, and so on. But each week, after he handed out the
homework, he would add something as an afterthought like "oh... and this
week... do the homework in ALGOL 68" or some other random language, a
different one each week. The idea was that, for a good computer
scientist, learning a language shouldn't even register as a challenge,
and, more deeply, that the concepts being taught truly were intrinsic to
computing itself and not to any particular manifestation of it. I forget
where I was going with this, other than to say that I think it is a good
lesson. When thinking about cache coherency issues my mind naturally
thinks in C, whereas I may go through a few iterations of a Datalog
program in my head for an optimization analysis before actually writing
it in Haskell. So reusing code from outside Haskell (when it makes
sense), and lowering the barrier to doing so, is something that is very
natural to me.

> Still, it's an idea.  Like when someone found that replacing GHC's
> complicated (-threaded) thread system with basically just one OS thread
> per Haskell thread was just as efficient as the original on Linux
> [Windows and maybe OS X were slower, though].

This isn't too surprising. I was in the OS kernel group at Sun
Microsystems when a coworker took a look at the pathologically
generalized many-to-many standard threads library, decided to do a
little experiment, and wrote a simple wrapper that mapped exactly one
pthread to one OS thread. It turned out to be superior in pretty much
all practical cases, in addition to being much smaller and having a
scheduling model that didn't give you headaches. It ended up being
shipped with Solaris 8, I believe.

This is interesting, though; I always figured I would implement
concurrency in jhc via a simple wrapper around pthreads with little to
no other magic. It is good to know that GHC has already trodden that
ground and found it stable, though it doesn't surprise me. The
many-to-many thread model was flawed from day one, IMHO (no matter what
the incarnation).
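The one-pthread-per-thread model is directly observable in GHC Haskell:
forkOS creates a *bound* thread backed by exactly one OS thread. A
minimal sketch (it assumes a -threaded runtime, and falls back to a
message when bound threads are unavailable):

```haskell
-- Sketch of the 1:1 model: forkOS binds one Haskell thread to one
-- OS thread, much like the Solaris wrapper described above.
import Control.Concurrent
  (forkOS, isCurrentThreadBound, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)

main :: IO ()
main
  | not rtsSupportsBoundThreads =
      -- Non-threaded RTS: forkOS would simply fail here.
      putStrLn "RTS not threaded; forkOS unavailable"
  | otherwise = do
      done <- newEmptyMVar
      _ <- forkOS (isCurrentThreadBound >>= putMVar done)
      bound <- takeMVar done
      putStrLn (if bound then "bound: 1 pthread per thread"
                         else "green thread")
```

Scheduling of a bound thread is left entirely to the OS scheduler, which
is exactly the headache-free model the Solaris experiment ended up with.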

> I wonder if someday someone'll take
> on a project of minimizing GHC RTS code (GHC devs like the idea in the
> abstract but are in no great hurry to make it happen)...
> garbage-collector seemed to be the biggest feature that has to stay in
> the RTS (unless JHC could automatically detect whether a
> program would benefit from having a garbage collector?)

That is the idea, actually. I don't plan on ever turning off my static
analyses; even if a garbage collector is enabled, it may well be that
some programs end up not needing it. None of my GC ideas are that
heavyweight; however, the one I am playing with now requires libJudy at
run time, which probably won't do in general but is a useful start.

>> Now Cabal's dependency model is not only awkward (you have to specify
>> dependencies without knowing what you are compiling on), but it
>> actively inhibits the one thing build systems should do: adapt to
>> unknown environments. It is so concerned about not attempting to
>> compile on something where there is a slight chance the compile will
>> fail that it pretty much blocks out any ability to adapt to unknown
>> systems. I mean, if a compile was going to fail, then it is going to
>> fail anyway, so really the only thing the strictness does is cut out
>> platforms where it would have worked.
> Well, actually the strictness also allows/helps Cabal to choose the
> package-versions where the compile *will* succeed.  But API breakages
> are usually truly in the interface, rather than silently changing the
> semantics.  JHC could probably automatically search all the APIs for
> compatibility (though I'm not sure what kind of indexing you'd need to
> make that efficient for thousands+ of known package-versions).

My idea for jhc-pkg was to do something similar to yum: have whatever
builds the jhc-pkg site create an SQLite database, containing an index,
that can be downloaded. Certainly a mapping from module names to
packages won't be that strenuous, even for many thousands of entries;
yum handles on the order of 10,000 entries just fine, for instance.
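The core of such an index is just a map from module names to providing
packages. A toy sketch (the real design would ship this as a downloaded
SQLite database; the entries below are illustrative, not jhc-pkg's
actual schema):

```haskell
-- Toy module-name -> package index, the lookup jhc-pkg would serve
-- from its downloaded database.
import qualified Data.Map as Map
import Data.Map (Map)

type ModuleName  = String
type PackageName = String

moduleIndex :: Map ModuleName [PackageName]
moduleIndex = Map.fromList
  [ ("Data.Generics", ["syb"])
  , ("Data.Map",      ["containers"])
  , ("Control.Monad", ["base"])
  ]

-- Which packages provide a given module?
providers :: ModuleName -> [PackageName]
providers m = Map.findWithDefault [] m moduleIndex

main :: IO ()
main = print (providers "Data.Map")  -- ["containers"]
```

Even a naive in-memory map like this handles tens of thousands of
entries without trouble, which is the point of the yum comparison.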

But the idea isn't to get rid of dependencies completely, but rather to
make them smarter, adaptive, and context-independent. For instance,
jhc's build depends are basically "a fixed set of packages, always, plus
syb if it exists". This is quite simple and will probably work for quite
a few versions of GHC and other compilers. Furthermore, it can be
interpreted unambiguously. Cabal flags like 'split-base' or 'foo > 4.0'
don't have meaning on their own: you can't know what 'split-base' means
without knowing the history of GHC's libraries, whereas 'syb if it
exists' is a truly intrinsic property that can be determined locally and
for any compiler. Now, it may not be the _right_ rule for every
compiler, but not every compiler is going to support backwards
compatibility with some odd version of GHC for all time either, just to
compile the specific old (perhaps buggy?) versions of packages needed to
satisfy some strict build-dependencies. You can add new intrinsic rules
as needed, and they have a nicely synergistic effect. Two rules like
'syb if it exists, containers if it exists' catch the cases where
neither, either, or both exist, whereas 'syb if ghc = 6.10' only fixes a
single case for a single iteration of a single compiler. So you end up
with sub-linear growth of build rules in one case, but linear or worse
for the current Cabal way, with guaranteed bitrot to boot.
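Such intrinsic rules are easy to model: each one is evaluated against
the locally observable set of installed packages, so the same rule list
means the same thing on any compiler. A minimal sketch (the Rule type
and package names are hypothetical, not jhc's actual format):

```haskell
-- Sketch of "intrinsic" dependency rules: evaluated purely against
-- what is installed locally, with no reference to compiler history.
import qualified Data.Set as Set
import Data.Set (Set)

data Rule
  = Always   String   -- always depend on this package
  | IfExists String   -- depend on it only when it is installed

resolve :: Set String -> [Rule] -> [String]
resolve installed = concatMap eval
  where
    eval (Always p)   = [p]
    eval (IfExists p) = [p | p `Set.member` installed]

main :: IO ()
main = do
  let here = Set.fromList ["base", "containers"]  -- syb not installed
      deps = resolve here
               [Always "base", IfExists "syb", IfExists "containers"]
  print deps  -- ["base","containers"]
```

The same two IfExists rules cover all four presence combinations, which
is the sub-linear growth argued for above.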

> If you get to being able to compile even a moderate fraction of Hackage,
> then you have a lot of versions and APIs you can play with to see how
> your approach might numerically compare to Cabal's :-) (e.g. success
> rate and asymptotic speed)

Yeah, this gets into writing a whole new packaging system, though, which
is something I think Haskell needs, but I am not sure I have time for. I
am hoping 'franchise' evolves into something; perhaps it can be used as
a base for cabal2. Cabal is somewhat hampered by its roots as a library
instead of a program: the desire to keep API compatibility means they
can't really refactor or clean up the code, even though very few
programs use most of the API in more than trivial ways. A full reboot
without the API baggage, and with a reasonable dependency model, is
needed.

Perhaps jhc-pkg will inspire something new, perhaps not. I do believe
the current Cabal will have to evolve into something like what I
describe; I just don't see how it can scale otherwise. Not to mention
that I went through this whole process before, when 'xmkmf' was finally
dropped in favor of 'autoconf' for X11 development; it had a lot of
interesting parallels.

But if such a system merely evolves, we will end up with legacy systems
built on top of each other, an accidentally cobbled-together package
description language (accidentally Turing-complete languages end up with
some odd properties), and an overly complicated codebase. Whereas if we
design it the right way from the start, we will have something much
cleaner and more carefully thought out.


John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/
