[jhc] what to work on next?

Thu Jun 25 23:50:47 EDT 2009

On Thu, Jun 25, 2009 at 01:00:51AM -0400, Isaac Dupree wrote:
>>>  In your  release announcement I would like to hear
>>> - that there are finally enough bugs fixed that JHC is usable.
>>> - that it's as easy to use as GHC for normal coding [can you make a   
>>> Haskell-Platform-with-JHC to install?]
>> getting a lot closer, the fact that containers,applicative,filepath and
>> others compile without a hitch implies to me that it is getting quite
>> usable as far as outstanding bugs go.
>
> and it will naturally be incremental improvement like that.  I mean,  
> this is objective progress after all, it's worth listing in release  
> notes (and online docs or wherever)

Yeah, I usually post a bit more. This most recent release was motivated
by the recent iPhone interest, so that is what I concentrated on in the
release note.

Honestly, I think I am paranoid that someone will try jhc, find it
not to their liking and be disappointed, so I tend to undersell it.
Which is pretty counter-productive, since said person is likely to file a
bug report and thus spur development to fix jhc's flaws (or may even
decide to become a developer themselves). I know it's silly, but it's a
hard habit to break.

>>> - Which extensions are implemented.
>>
>> It supports quite a few extensions, but they don't always exactly line
>> up with GHC.
>
> not responding to this in particular, but, in general;
> I think a lot of the Haskell community (including me I guess) will be  
> coming from somewhat of a GHC perspective.  And most of the available  
> libraries will too (e.g. extensions required by Hackage packages are  
> sometimes excessive).  Do treasure your differences, don't just throw  
> them away wantonly, but don't underestimate the value of working  
> together!  You've made a wonderful tool that is a compiler, but just  
> think all the horrors of Unicode and threading and Windows and who knows  
> what bugs GHC and its base libraries have been dealing with in the past  
> years!  (I mean I've tried things like maintaining unicode code, but I'm  
> only one person and I have a lot of interests! It's usually just better  
> in the end to increase code sharing, therefore bug-reports and patches  
> and all those good things.).  so... I think where possible it'd be good  
> to try and share a code-base for libraries, and put effort into  
> improving that. (okay I admit it's frustrating too, since GHC momentum  
> will make it harder to start getting involved hacking.  But things are  
> IMHO less bad than the ghc-6.4 days when I entered haskell)  Anyway also  
> for the extensions that do line up JHC-GHC, at least you can try and  
> give them the same names/understanding, which it looks like you're doing.

Yeah, I try to be GHC compatible when it makes sense, but I also don't
have problems with breaking compatibility when there is a good reason.
Sometimes I am able to improve on what ghc does, other times things just
work out differently due to the natural evolution of code. But in
general, if I can be compatible without compromising design, I will be. 

Though, you bring up another interesting issue when it comes to
developing the libraries. JHC and GHC actually have a very different
philosophy here, in that in jhc I rely on the underlying operating
system's services whenever possible. For instance, I don't try to roll my
own unicode or buffered file support. I rely on 'iconv' and 'FILE' and
friends when they are available. In general, the thinking is that a whole
lot of people have already worked very hard at optimizing FILE, and the OS
may even do things like use zero-copy mmapped buffers to back it. So, I
use existing resources whenever possible.
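
As a minimal sketch of what leaning on the C library looks like from
Haskell, the following binds C's puts() through the standard FFI and lets
the C runtime's buffered stdout do the work. The wrapper name putLineViaC
is made up for illustration and is not taken from jhc's libraries; the
code is written in GHC-compatible syntax.

```haskell
{-# LANGUAGE ForeignFunctionInterface #-}
module Main where

import Foreign.C.String (CString, withCString)
import Foreign.C.Types (CInt (..))

-- C's puts(): writes the string plus a newline to the C library's
-- buffered stdout stream. No Haskell-side buffering is involved.
foreign import ccall unsafe "stdio.h puts"
  c_puts :: CString -> IO CInt

-- Hypothetical wrapper (name is an assumption, not jhc's API).
-- puts() returns a non-negative value on success and EOF on failure.
putLineViaC :: String -> IO Bool
putLineViaC s = do
  rc <- withCString s c_puts
  return (rc >= 0)

main :: IO ()
main = do
  ok <- putLineViaC "hello from C stdio"
  if ok then return () else ioError (userError "puts failed")
```

Porting such a layer means swapping the foreign import for whatever the
target platform provides, rather than reimplementing buffering.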

Incidentally, this makes porting to new platforms quite easy. I have a
very shallow and high-level library layer to switch out, rather than
having to find a way to implement a whole buffered IO system. For instance,
a hand-written buffered IO system relies on pointers to allocated memory
and being able to read and write into it, something that clearly won't
work with a VM-style back end that doesn't allow raw memory access. In
jhc, I need only replace putChar with the VM-provided equivalent.

It also creates small and fast executables, which is always good :)

>> some are
>> CPP, UnboxedTuples, EmptyDataDecls, TypeSynonymInstances, Arbitrary
>> Rank N Types, RULES pragmas
>>
>> some more interesting ones in various stages of completion are:
>>
>> fully impredicative type system, [forall a . a] is a valid type for
>> instance.
>>
>> first class existentials (exists a . a) is a valid type as well.
>
> interesting.  GHC has had such a hard time finding a *good* scheme for  
> impredicative type inference, and there have been lots of papers written  
> (so I hear) on the subject.  Is JHC's current scheme described  
> somewhere? (including limitations. Which mostly means "which type  
> signatures you need to specify", IIRC).

It is pretty much exactly the algorithm described in this paper:
http://research.microsoft.com/en-us/um/people/simonpj/papers/boxy/
I should note that sometimes the front end type system bites off more
than it can chew, in that it will correctly type something that
eventually the back end realizes it can't handle. I need to formalize
exactly what is allowed. In any case, pretty much all code in the wild
seems to use a common subset of what GHC and JHC provide when it comes
to advanced type system stuff. 
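
To make the "which type signatures you need to specify" point concrete,
here is a small higher-rank example in GHC syntax (RankNTypes). It is an
illustration only and has not been checked against jhc; the function name
applyBoth is made up. The signature is required because the argument must
itself stay polymorphic, which is the kind of annotation the inference
algorithm pushes inward.

```haskell
{-# LANGUAGE RankNTypes #-}
module Main where

-- The first argument is used at two different element types, so it must
-- be polymorphic itself. This rank-2 type cannot be inferred and has to
-- be written down.
applyBoth :: (forall a. [a] -> Int) -> ([Bool], String) -> (Int, Int)
applyBoth f (xs, ys) = (f xs, f ys)

main :: IO ()
main = print (applyBoth length ([True, False], "jhc"))  -- prints (2,3)
```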

I should note that this problem bites both ways. For instance, the core
has full support for GADTs; in fact, they are vital for my type class
implementation. However, the front end has no way to express them, since I
have no parse rules for them. That could be a good project for someone,
actually. But MPTCs are clearly the most pressing issue when it comes to
the type system.
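
For anyone considering that project, this is the sort of surface syntax a
GADT-aware front end would have to parse. The example uses GHC's syntax
(the GADTs extension); jhc's eventual syntax is an open question, and the
Expr type here is purely illustrative.

```haskell
{-# LANGUAGE GADTs #-}
module Main where

-- Each constructor fixes the type index t, so ill-typed expressions
-- such as Add (BoolE True) (IntE 1) are rejected at compile time.
data Expr t where
  IntE  :: Int  -> Expr Int
  BoolE :: Bool -> Expr Bool
  Add   :: Expr Int -> Expr Int -> Expr Int
  If    :: Expr Bool -> Expr t -> Expr t -> Expr t

-- Matching on a constructor refines t, so no runtime tags or casts
-- are needed in the evaluator.
eval :: Expr t -> t
eval (IntE n)   = n
eval (BoolE b)  = b
eval (Add a b)  = eval a + eval b
eval (If c t e) = if eval c then eval t else eval e

main :: IO ()
main = print (eval (If (BoolE True) (Add (IntE 1) (IntE 2)) (IntE 0)))  -- prints 3
```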

Incidentally, the 'boxy types' inference algorithm is also what allows
unboxed polymorphism. Unboxed types can be 'pushed down' in the same way
higher rank types can. In the end, unknown types are defaulted to
Bits32_ (think Int# from ghc) if they weren't otherwise constrained.

>> unboxed values. working with unboxed values is almost as
>> straightforward and elegant as working with boxed values. 
>
> a possible pitfall of making them more usable, is it may become less  
> obvious where they're evaluated (since the point of unboxed values, was  
> to make sure not to rely on the optimizer, I think?)

My main motivation here was things like writing operating systems in
haskell, where you need to mix strict and lazy code in interesting ways.
Implementing wait-free algorithms can depend critically on the exact
order of memory writes, for instance. Also, it helps the "haskell is a
better C than C" idea if writing unboxed code isn't awkward. I don't
intend for working with unboxed values to become a mainstream haskell
thing; the aim is mainly to make the things you would normally have to
drop into C for less painful.

There is also another motivation: it tests my core's ability to handle
mixed-mode code well, which opens up interesting possibilities like an ML
front end for jhc that lets you seamlessly mix strict and lazy code. If
unboxed strict haskell meshes seamlessly with lazy haskell, then an ML
front end is just a parsing issue rather than a major redesign.

>> "strict" boxed values. values of kind !. they are boxed, but guaranteed
>> evaluated. They are fully polymorphic unlike unboxed values.
>
> yay! I want this in ghc :-P

Jhc has a very rich internal type system. It is described in this
section of the manual:

http://repetae.net/computer/jhc/manual.html#jhc-core-type-system

Unlike some other attempts at mixing strict and lazy evaluation, it
doesn't tread on new type theoretic principles. It is just a clever
instantiation of a PTS (pure type system), so existing known proven
properties of PTSs hold. It is a nice boon to know I am on steady ground
theoretically. I think this and the way I transform monadic code into
loops without giving up its functional character at any point are
probably the main academic achievements of jhc. There is a lot of other
neat stuff, but a lot is just cherry-picked from the current state of the
art and extended in obvious ways.

>> Now, an issue is that even when jhc implements the same extension as
>> ghc, it may not be exactly the same. as in, jhc's type system is
>> different than ghc's. they both have rank-n types, but since there is no
>> formal description of the ghc type system as it isn't in a language
>> standard, I am sure there are programs ghc accepts and jhc doesn't and
>> vice versa. In practice, these cases are probably rare but until
>> haskell' is formalized they will be an issue. 
>
> yeah, that'll happen slowly now and then.  Documenting individual  
> extensions is a good thing, even when not part of finishing haskell'.  I  
> suspect haskell' won't exist until a non-GHC compiler looks likely to  
> ever support it though!
>
>> We can have every package on cabal add explicit support for jhc, but
>> those dependency lists are already getting complicated just to support a
>> couple different versions of ghc. imagine 15 years down the road, with a
>> half dozen new ghc versions, 3 uhc versions, 5 lhc ones and a dozen jhc
>> releases that all need to be taken into account in every cabal file. Not
>> scalable at all.  Also, I couldn't bring myself to ask everyone to add
>> explicit jhc support, it is just putting a band-aid on the problem and I
>> don't like to do band-aids.
>
> GHC/cabal folks seem to be moving towards a more portable model.  e.g.,  
> separate "ghc-base" from "base" (which for GHC would depend on  
> "ghc-base") so that Jhc could export an equivalent base (and maybe there  
> the code divergence would be small enough that "base" itself could just  
> use ifdefs, since most code would be in "jhc-base" or portable packages).

A problem is that my 'base' still wouldn't be the same. For instance,
Data.Typeable is part of base, but jhc's Data.Typeable will never look
like GHC's Data.Typeable due to some major design changes. Types can be
examined by case statements in jhc core, meaning typerep manipulation is
subject to the same case-optimizations as normal data constructors. This
has nice properties, but it also means that handwritten Typeable
instances aren't possible; deriving Typeable is the only way to get them.
Now, this is just one example, but it means that no matter what, even if
jhc merged bases with ghc's, 'base-4' would still mean different things
to the two of them, so jhc will have to lie.
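
The portable subset looks roughly like the sketch below: the instance is
only ever derived, and client code sticks to typeOf and comparison of the
resulting TypeReps. It is written against GHC's Data.Typeable and assumes,
without having been tested on jhc, that jhc accepts the same deriving
clause. The Colour type and the sameType helper are made up for
illustration.

```haskell
{-# LANGUAGE DeriveDataTypeable #-}
module Main where

import Data.Typeable (Typeable, typeOf)

-- The Typeable instance is derived, never written by hand.
data Colour = Red | Green
  deriving (Show, Typeable)

-- Compare the runtime type representations of two values.
sameType :: (Typeable a, Typeable b) => a -> b -> Bool
sameType x y = typeOf x == typeOf y

main :: IO ()
main = do
  print (typeOf Red)                            -- prints Colour
  print (sameType Red Green, sameType Red 'c')  -- prints (True,False)
```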

>
> More importantly IMHO, they *want* to be interoperable and if you try to  
> cooperate with them, they'll be more than happy to have a non-GHC  
> compiler with which to fix their bad assumptions!

I am not sure this is universally true. Oh, it is generally true, but
private correspondence has taught me there are some.. controlling..
personalities out there that want everything haskell to be subject to
a certain world-view.

But yes, you are right, in that a non-ghc compiler will hopefully shake
things up some. And perhaps my impressions above were just based on
having a bad day on a few people's part (mine included). Certainly,
cabal's problems aren't as obvious when you only have one main compiler,
but it gets harder and harder to get in the door, as it were.

Perhaps jhc-pkg will at least illustrate a different way to do things,
even if it doesn't directly compete with cabal.

>> All in all, cabal has a closed-world assumption which isn't true. It
>> assumes all haskell code is on hackage
>
> to some extent yes, the "hackage" centralisation seems like a bad thing,  
> but it's also a great boon!  (Any opinions on perl's CPAN and the like?)

Yeah, hackage has been useful. But I think that has to do with the
haskell community rather than design decisions made by hackage.

CPAN can be nice (and also frustrating at times) but there is a _major_
difference between CPAN and haskell. Perl is an implementation-defined,
single-sourced language. Interoperability isn't something the CPAN
design had to deal with. I think a lot of hackage's issues arise from
attempting to take the CPAN model and apply it to haskell without
thinking about what makes haskell different. Things that work for
single-source languages don't work for standards-defined ones, and
it is not surprising that attempting to squeeze haskell into that mold
has made alternate implementations of haskell more difficult. I think
that, following this path, either haskell will cease to exist and we will
just have 'glasgow haskell', defined as whatever the newest ghc
implements, or the system will break down and stagnate.

>> and we can enumerate all
>> compilers and language extensions in cabal itself. This makes haskell
>> development more insular, rather than standardizing new versions of the
>> language, people are dissuaded from using alternate versions by the
>> design of the tools.
>
> The overhead isn't that great, and it makes you document each language  
> extension at least a bit, so that other compilers might at least have a  
> *chance* of implementing the same thing...

I was referring more to the fact that cabal hardcodes compilers and
extensions into its source, rather than having some extensible format.
It would be super nice if all I needed to do to get jhc to work with
cabal was drop a file somewhere describing how to call jhc, or have jhc
conform to some standard command line convention, rather than become a
cabal developer and modify its source...

>
>> So, the short answer is, cabal is tricky. But jhc can do something like
>> 'cabal install' but much better and that is mainly what people want out
>> of it.
>
> that's true (but also think how many convenient features cabal is  
> getting).  I suspect the path through cabal to be the reasonable long  
> term answer, as you'll then get smart people (e.g. libraries at haskell.org  
> !) asking tough questions about what their tools need to do!

I am not so sure. There seems to be a fundamental difference in what
they think a build system should do. 

One recent thing that illustrates it very well is the push to put
'maximum version numbers' on dependencies. As in, cabal is not just
prone to bitrot; you are actively encouraged to make sure your packages
become obsolete.

Now, what I want out of a build system is for it to compile my code the
best way it can. If I get called away for two years on a contract to do
C# work (it's happened before) and want to start using haskell again
because I hear of some new compiler that sounds interesting, what is
vitally important is that I can just go into my old directories and
tell it to build and have it just work if at all possible. I can take my
sunsite CDs printed 12 years ago and ./configure those tarballs and they
will in all likelihood just work, on an OS that didn't even exist when
they were made, with a compiler and a new C standard they didn't even know
about. I consider this the _fundamental_ thing a build system should do:
try to build my code, for all time, if at all possible.

Now, cabal's dependency model is not only awkward (having to specify
dependencies without knowing what you are compiling on), but it actively
inhibits the one thing build systems should do: adapt to unknown
environments. It is so concerned about not attempting to compile on
something for which there is a slight chance the compile will fail that
it pretty much blocks out any ability for it to adapt to unknown
systems. I mean, if a compile was going to fail, then it is going to
fail anyway, so really the only thing the strictness does is cut out
platforms where it would have worked.
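
A toy model of that argument, under loud assumptions: this is not cabal's
actual solver or data model, and the Range type and inRange function are
invented for illustration. A package that caps a dependency below version
5 is never even attempted against 5.0, while the same package without the
cap is attempted and may well build.

```haskell
module Main where

-- Hypothetical, simplified version model: [5,0] stands for "5.0".
type Version = [Int]

-- A lower bound plus an optional exclusive upper bound.
data Range = Range { lower :: Version, upper :: Maybe Version }

-- A version satisfies the range if it is at least the lower bound and,
-- when an upper bound exists, strictly below it.
inRange :: Range -> Version -> Bool
inRange (Range lo hi) v = v >= lo && maybe True (v <) hi

main :: IO ()
main = do
  let installed = [5, 0]                -- the only version on this machine
      capped    = Range [4, 0] (Just [5])
      open      = Range [4, 0] Nothing
  -- capped refuses before compiling anything; open gets a chance to work
  print (inRange capped installed, inRange open installed)  -- prints (False,True)
```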

        John

-- 
John Meacham - ⑆repetae.net⑆john⑈ - http://notanumber.net/