[nhc-bugs] nhc98 1.14 FFI issues
Manuel M T Chakravarty
chak@cse.unsw.edu.au
Tue, 10 Sep 2002 16:29:02 +1000 (EST)
----Next_Part(Tue_Sep_10_16:29:02_2002_861)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Malcolm Wallace <Malcolm.Wallace@cs.york.ac.uk> wrote,
> Manuel M T Chakravarty <chak@cse.unsw.edu.au> writes:
>
> > * nhc98 exports its FFI libraries from a module called `FFI',
> > whereas all other systems (and the standard) use `Foreign'
> > for the language-independent and `CForeign' for the
> > C-dependent libraries.
>
> I have recently started to update nhc98's implementation of the FFI
> to match the current standard, as a precursor to adopting the new
> hierarchical libraries packages. I was planning to rename nhc98's
> internal FFI library modules to the new hierarchical variants, rather
> than the flat namespace. Maybe we can still do that, but access the
> flat namespace through -package haskell98.
Good. That's basically the same as GHC does.
> > * The FFI includes a `Bits' modules that differs from
> > nhc98's `Bit' module (different function names).
>
> Again, module Bit belongs to an older standard - Haskell 1.3.
> It should be easy enough to add Data.Bits along with the other
> hierarchical libs.
Great.
> > * I get a set of strange error messages:
> >
> > ====== Errors after type inference/checking:
> > No default for Parsers.Token at 282:3.(1378,[(173,1432)])
> > No default for Parsers.Token at 254:32.(1267,[(173,1433)])
> > No default for Parsers.Token at 223:1.(1061,[(173,1188)])
> > No default for Parsers.Token at 204:25.(1180,[(173,1187)])
>
> nhc98 has a, how shall we put it, "curious" interpretation of Haskell's
> class defaulting rules. Whilst several people believe it to be a more
> sensible design than the current defaulting rules (which include the
> DMR), it does throw up occasional surprises for the uninitiated.
>
> Essentially, nhc98 thinks there is an ambiguous type variable somewhere
> in the usage of the functions it reports. In other compilers, the
> monomorphic restriction probably cuts in to solve the ambiguity,
> but nhc98 does not implement the DMR. However, nhc98 does accept an
> explicit default for any class, not just the numeric classes. Hence,
> you can resolve the ambiguity with something like
>
> default (Integer,Double,MyType)
>
> where MyType is an instance of the class Parsers.Token.
I don't see how this applies to my code (ie, I don't think
the DMR comes in anywhere). The function at line 282 (first
of the above listed error locations) is
(*>) :: Token t => Parser a t s -> Parser a t r -> Parser a t (s, r)
p *> q = (,) $> p *$> q
It is neither a simple pattern binding nor lacking a type
signature. (In fact, I always put signatures at all
toplevel bindings, which means that the DMR doesn't affect
me.)
Besides, even if the "curious" interpretation of the
defaulting rules would apply to me, I don't think this is a
nice setup. It's fine for nhc98 to implement extensions to
H98, but they should only be activated when a special
command line option is given. Everything else just makes it
a pain to write portable code, which means that people just
will stick with a single compiler.
I'll attach my Parsers module to make it easier to see what
happens.
Cheers,
Manuel
----Next_Part(Tue_Sep_10_16:29:02_2002_861)--
Content-Type: Text/Plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Content-Disposition: inline; filename="Parsers.hs"
-- Compiler Toolkit: Self-optimizing LL(1) parser combinators
--
-- Author : Manuel M. T. Chakravarty
-- Created: 27 February 99
--
-- Version $Revision: 1.22 $ from $Date: 2001/02/09 02:36:10 $
--
-- Copyright (c) [1999..2000] Manuel M. T. Chakravarty
--
-- This library is free software; you can redistribute it and/or
-- modify it under the terms of the GNU Library General Public
-- License as published by the Free Software Foundation; either
-- version 2 of the License, or (at your option) any later version.
--
-- This library is distributed in the hope that it will be useful,
-- but WITHOUT ANY WARRANTY; without even the implied warranty of
-- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
-- Library General Public License for more details.
--
--- DESCRIPTION ---------------------------------------------------------------
--
-- This module implements fully deterministic, self-optimizing LL(1) parser
-- combinators, which generate parse tables on-the-fly and are based on a
-- technique introduced by Swierstra & Duponcheel. The applied technique for
-- efficiently computing the parse tables makes essential use of the
-- memorization build into lazy evaluation.
--
-- The present implementation is rather different from S. D. Swierstra and
-- L. Duponcheel, ``Deterministic, Error-Correcting Combinator Parsers'', in
-- John Launchbury, Erik Meijer, and Tim Sheard (Eds.) "Advanced Functional
-- Programming", Springer-Verlag, Lecture Notes in Computer Science 1129,
-- 184-207, 1996. It is much closer to a a revised version published by
-- S. D. Swierstra, but handles actions completely different. In particular,
-- Swierstra's version does not have a threaded state and meta actions. The
-- present module also defines a number of additional combinators and uses
-- finite maps to optimise the construction of the transition relation stored
-- in the node of the transition graph (this also saves substantial memory).
--
--- DOCU ----------------------------------------------------------------------
--
-- language: Haskell 98 & rank-2 polymorphism (existentially quantified type
-- variables)
--
-- Unlike conventional parser combinators, the combinators do not produce
-- parsers, but only specifications of parsers that can then be executed
-- using the function `parse'.
--
-- It is basically impossible to get this efficient without universally-
-- quantified data type fields (or existentially quantified type variables)
-- as soon as we encode the parsers in a data structure. The reason is that
-- we cannot store the action functions in the structure without that
-- feature.
--
-- A user-defined state can be passed down in the parser and be threaded
-- through the individual actions.
--
-- Tokens:
--
-- * Tokens must contain a position and equality as well as an ordering
-- relation must be defined for them. The equality determines whether
-- tokens "match" during parsing, ie, whether they are equal modulo their
-- attributes (the position is, of course, an attribute). The ordering
-- supports an optimised representation of the transition graph. Tokens
-- are, furthermore, printable (instance of `Show'); the resulting string
-- should correspond to the lexeme of the token and not the data
-- constructor used to represent it internally.
--
-- * I tried using arrays to represent the transition relation in the nodes
-- of the graph, but this leads to an enormous memory consumption (at least
-- with ghc 4.05). One reason for this is certainly that these arrays are
-- relatively sparsely populated.
--
--- TODO ----------------------------------------------------------------------
--
-- * Error correction is still missing.
--
module Parsers (Token, Parser, empty, token, skip, (<|>), (*$>), (*>), ($>),
action, meta, opt, (-*>), (*->), many, list, many1, list1,
sep, seplist, sep1, seplist1, execParser)
where
import List (sort)
import Common (Position, Pos (posOf), nopos)
import FiniteMaps (FiniteMap, unitFM, joinCombFM, mapFM, lookupFM, toListFM)
import Errors (interr, ErrorLvl(..), Error, makeError)
infix 5 `opt`
infixl 4 *>, -*>, *->, *$>, $>
infixl 3 `action`
infixl 2 <|>
-- data structures
-- ---------------
-- token class (EXPORTED)
--
class (Pos t, Show t, Eq t, Ord t) => Token t
-- tree structure used to represent parsers specifications (EXPORTED
-- ABSTRACTLY)
--
-- * each node corresponds to a state of the represented automaton and is
-- composed out of an action and a parsing continuation, which encodes the
-- state transition function in the current state
--
data Token t =>
Parser a t r = forall q. Parser (Action a t q r) -- action functions
(Cont a t q) -- parsing continuation
-- parsing continuation
--
data Token t =>
Cont a t r = -- maybe end of input
--
Empty r -- return if no input
(Parser a t r) -- used if there is input
--
-- selection of acceptable tokens paired with following
-- parser state
--
| Alts (FiniteMap t (Parser a t r))
--
-- represents an automaton without any transitions
-- (semantically equivalent to `Alts zeroFM', but easier to
-- match)
--
| Done
-- actions
--
-- * Note that the rank-2 polymorphism (existentially quantified type
-- variable) is essential here to seperate the action function from the
-- parser (if we don't do that, the actions are pushed down in the parser
-- structure until they reach the `Empty' variant matching the end-of-file
-- in the actual parse - this makes the parser structure as deep as the
-- input has tokens!)
--
-- * the meta action is transforming a threaded state top-down; the result of
-- the state transformer (type `q'') is passed to the tree constructor
-- action, after the following parser has been applied; the meta action has
-- to be executed before the parser is applied, as the parser get's the
-- internal state *after* transformed by the meta action; overall we have,
-- (1) meta action, (2) recursive application of the parser, (3) tree
-- constructing action
--
data Token t =>
Action a t q r = forall q'. Action (a -> (a, q')) -- meta action
(q' -> t -> q -> r) -- tree constr
-- internal tree constructors
--
nometa :: Token t => (t -> q -> r) -> Action a t q r
nometa simpleAction = Action (\s -> (s, ())) (\_ -> simpleAction)
singleton :: Token t => t -> Parser a t r -> Cont a t r
singleton t p = Alts $ unitFM t p
noaction :: Token t => Cont a t q -> Parser a t q
noaction = Parser $ nometa (flip const)
tokaction :: Token t => Cont a t q -> Parser a t t
tokaction = Parser $ nometa const
noparser :: Token t => Parser a t r
noparser = noaction Done
-- basic combinators
-- -----------------
-- Without consuming any input, yield the given result value (EXPORTED)
--
empty :: Token t => r -> Parser a t r
empty x = noaction $ Empty x noparser
-- Consume a token that is equal to the given one; the consumed token is
-- returned as the result (EXPORTED)
--
token :: Token t => t -> Parser a t t
token t = tokaction $ singleton t (empty ())
-- Consume a token that is equal to the given one; the consumed token is
-- thrown away (EXPORTED)
--
skip :: Token t => t -> Parser a t ()
skip t = noaction $ singleton t (empty ())
-- Alternative parsers (EXPORTED)
--
(<|>) :: Token t => Parser a t r -> Parser a t r -> Parser a t r
--
-- * Alternatives require to merge the alternative sets of the two parsers.
-- The most interesting case is where both sets contain cases for the same
-- token. In this case, we left factor over this token. This requires some
-- care with the actions, because we have to be able to decide which of
-- the two actions to apply. To do so, the two parsers prefix their results
-- with a `Left' or `Right' tag, which makes it easy to decided in the new
-- combined action, which of the two subparsers did match.
--
(Parser _ Done) <|> q = q
p <|> (Parser _ Done) = p
--(Parser a (Empty _ p)) <|> (Parser a' (Empty _ q)) = grammarErr p q
(Parser a (Empty x p)) <|> q = mergeEpsilon a x p q
p <|> (Parser a' (Empty x q)) = mergeEpsilon a' x q p
(Parser a (Alts alts1)) <|> (Parser a' (Alts alts2)) =
Parser (a `joinActions` a') $ Alts (joinCombFM (<|>) alts1' alts2')
where
alts1' = mapFM (\_ p -> Left $> p) alts1
alts2' = mapFM (\_ p -> Right $> p) alts2
grammarErr :: Token t => Parser a t r -> Parser a t r -> b
grammarErr p q = interr $ "Parsers.<|>: Ambiguous grammar!\n\
\ first (left parser): " ++ first p ++ "\n\
\ first (right parser): " ++ first q ++ "\n"
mergeEpsilon :: Token t
=> Action a t q r -> q -> Parser a t q -> Parser a t r
-> Parser a t r
mergeEpsilon a x p q =
let anew = a `joinActions` nometa (flip const) -- mustn't touch token!
newcon = Empty (Left x) (Left $> p <|> Right $> q)
in
Parser anew newcon
joinActions :: Token t
=> Action a t q r -> Action a t q' r
-> Action a t (Either q q') r
(Action m con) `joinActions` (Action m' con') =
Action (joinMeta m m')
(\(q'1, q'2) t qalt -> case qalt of
Left q -> con q'1 t q
Right q -> con' q'2 t q)
-- combine two meta action into one, which yields a pair of the individual
-- results (the state is threaded through one after another - no assumption
-- may be made about the order)
--
joinMeta :: (a -> (a, r1)) -> (a -> (a, r2)) -> a -> (a, (r1, r2))
joinMeta meta meta' = \s -> let
(s' , q'1) = meta s
(s'', q'2) = meta' s'
in
(s'', (q'1, q'2))
-- Sequential parsers, where the result of the first is applied to the result
-- of the second (EXPORTED)
--
(*$>) :: Token t => Parser a t (s -> r) -> Parser a t s -> Parser a t r
-- !!!
(Parser a@(Action m con) Done) *$> q =
let con' = interr "Parsers.(*$>): Touched action after an error!"
in
Parser (Action m con') Done
(Parser a@(Action m con) (Empty f p)) *$> q =
-- _scc_ "*$>:Empty"
let a' = Action m (\q' t q -> con q' t f q)
in
contract a p *$> q <|> contract a' q
(Parser (Action m con) (Alts alts)) *$> q =
-- _scc_ "*$>:Alt"
let con' x' t (xp, xq) = con x' t xp xq
in
Parser (Action m con') (Alts $ mapFM (\_ p -> p *> q) alts)
contract :: Token t => Action a t q r -> Parser a t q -> Parser a t r
contract (Action m con) (Parser (Action m' con') c) =
let a' = Action (joinMeta m m')
(\(x'1, x'2) t x -> con x'1 notok (con' x'2 t x))
in
Parser a' c
where
notok = interr $ "Parsers.(*$>): Touched forbidden token!"
-- Sequential parsers, where the overall result is the pair of the component
-- results (EXPORTED)
--
(*>) :: Token t => Parser a t s -> Parser a t r -> Parser a t (s, r)
p *> q = (,) $> p *$> q
-- apply a function to the result yielded by a parser (EXPORTED)
--
($>) :: Token t => (s -> r) -> Parser a t s -> Parser a t r
f $> Parser (Action m con) c = let con' q' t q = f $ con q' t q
in
Parser (Action m con') c
-- produces a parser that encapsulates a meta action manipulating the
-- threaded state (EXPORTED)
--
meta :: Token t => (a -> (a, r)) -> Parser a t r
meta g = Parser (Action g (\q' _ _ -> q')) (Empty () noparser)
-- non-basic combinators
-- ---------------------
-- postfix action (EXPORTED)
--
action :: Token t => Parser a t s -> (s -> r) -> Parser a t r
action = flip ($>)
-- optional parse (EXPORTED)
--
opt :: Token t => Parser a t r -> r -> Parser a t r
p `opt` r = p <|> empty r
-- sequential composition, where the result of the rhs is discarded (EXPORTED)
--
(*->) :: Token t => Parser a t r -> Parser a t s -> Parser a t r
p *-> q = const $> p *$> q
-- sequential composition, where the result of the lhs is discarded (EXPORTED)
--
(-*>) :: Token t => Parser a t s -> Parser a t r -> Parser a t r
p -*> q = flip const $> p *$> q
-- accept a sequence of productions from a nonterminal (EXPORTED)
--
-- * Uses a graphical structure to require only constant space, but this
-- behaviour is destroyed if the replicated parser is a `skip c'.
--
many :: Token t => (r -> s -> s) -> s -> Parser a t r -> Parser a t s
--
-- * we need to build a cycle, to avoid building the parser structure over and
-- over again
--
many f e p = let me = (f $> p *$> me) `opt` e
in me
-- return the results of a sequence of productions from a nonterminal in a
-- list (EXPORTED)
--
list :: Token t => Parser a t r -> Parser a t [r]
list = many (:) []
-- accept a sequence consisting of at least one production from a nonterminal
-- (EXPORTED)
--
many1 :: Token t => (r -> r -> r) -> Parser a t r -> Parser a t r
--many1 f p = p <|> (f <$> p <*> many1 f p)
many1 f p = let me = p <|> (f $> p *$> me)
in me
-- accept a sequence consisting of at least one production from a nonterminal
-- and return a list of results (EXPORTED)
--
list1 :: Token t => Parser a t r -> Parser a t [r]
list1 p = let me = (\x -> [x]) $> p
<|> ((:) $> p *$> me)
in me
-- accept a sequence of productions from a nonterminal, which are seperated by
-- productions of another nonterminal (EXPORTED)
--
sep :: Token t
=> (r -> u -> s -> s)
-> (r -> s)
-> s
-> Parser a t u
-> Parser a t r
-> Parser a t s
sep f g e sepp p = let me = g $> p <|> (f $> p *$> sepp *$> me)
in me `opt` e
-- return the results of a sequence of productions from a nonterminal, which
-- are seperated by productions of another nonterminal, in a list (EXPORTED)
--
seplist :: Token t => Parser a t s -> Parser a t r -> Parser a t [r]
seplist = sep (\h _ l -> h:l) (\x -> [x]) []
-- accept a sequence of productions from a nonterminal, which are seperated by
-- productions of another nonterminal (EXPORTED)
--
sep1 :: Token t
=> (r -> s -> r -> r) -> Parser a t s -> Parser a t r -> Parser a t r
sep1 f sepp p = let me = p <|> (f $> p *$> sepp *$> me)
in me
-- accept a sequence consisting of at least one production from a nonterminal,
-- which are separated by the productions of another nonterminal; the list of
-- results is returned (EXPORTED)
--
seplist1 :: Token t => Parser a t s -> Parser a t r -> Parser a t [r]
seplist1 sepp p = p *> list (sepp -*> p) `action` uncurry (:)
{- Is the above also space save? Should be. Contributed by Roman.
seplist1 sepp p = let me = (\x -> [x]) $> p
<|> ((:) $> p *-> sepp *$> me)
in me
-}
-- execution of a parser
-- ---------------------
-- apply a parser to a token sequence (EXPORTED)
--
-- * Trailing tokens are returned in the third component of the result (the
-- longest match is found).
--
-- * Currently, all errors are fatal; thus, the result (first component of the
-- returned pair) is undefined in case of an error (this changes when error
-- correction is added).
--
execParser :: Token t => Parser a t r -> a -> [t] -> (r, [Error], [t])
--
-- * Regarding the case cascade in the second equation, note that laziness is
-- not our friend here. The root of the parse tree will be constructed at
-- the very end of parsing; so, there is no way, we can have any pipelining
-- with following stages here (and then there are the error messages, which
-- also spoil pipelining).
--
execParser (Parser (Action m con) c) a [] = -- eof
case c of
Empty x _ -> (con (snd . m $ a) errtoken x, [], [])
_ -> (errresult, [makeError FatalErr nopos eofErr], [])
execParser (Parser (Action m con) c) a ts = -- eat one token
case m a of -- execute meta action
(a', x') -> case cont c a' ts of -- process next input token
-- !!! (t, (x, errs, ts')) -> ((((con $! x') $ t) $!x), errs, ts')
(t, (x, errs, ts')) -> ((((con $ x') $ t) $ x), errs, ts')
where
cont :: Token t => Cont a t r -> a -> [t] -> (t, (r, [Error], [t]))
cont Done _ (t:_) = makeErr (posOf t) trailErr
cont (Alts alts) a (t:ts) = case lookupFM alts t of
Nothing -> makeErr (posOf t) (illErr t)
Just p -> (t, execParser p a ts)
cont (Empty x p) a ts =
case p of
Parser _ Done -> (errtoken, (x, [], ts))
_ -> (errtoken, (execParser p a ts))
makeErr pos err = (errtoken, (errresult, [makeError FatalErr pos err], []))
eofErr = ["Unexpected end of input!",
"The code at the end of the file seems truncated."]
trailErr = ["Trailing garbage!",
"There seem to be characters behind the valid end of input."]
illErr t = ["Syntax error!",
"The symbol `" ++ show t ++ "' does not fit here."]
errresult = interr "Parsers.errresult: Touched undefined result!"
errtoken = interr "Parsers.errtoken: Touched undefined token!"
-- for debugging
-- -------------
-- first set of the given parser (prefixed by a `*' if this is an epsilon
-- parser)
--
first :: Token t => Parser a t r -> String
first (Parser _ (Empty _ p)) = "*" ++ first p
first (Parser _ (Alts alts)) = show
. sort
. map show
. map fst
. toListFM
$ alts
instance Token t => Show (Parser a t r) where
showsPrec _ (Parser a c) = shows c
instance Token t => Show (Cont a t r) where
showsPrec _ (Empty r p ) = showString "*" . shows p
showsPrec _ (Alts alts) = shows alts
----Next_Part(Tue_Sep_10_16:29:02_2002_861)----