[Haskell-i18n] Some starters for the new list

Sven Moritz Hallberg pesco@gmx.de
15 Aug 2002 20:21:24 +0200

Aren't you mixing two different problems? I see these:

  1) Choose a string in a language of the user's preference.
  2) (De)serialize characters according to some codec.

I still feel that, "inside the Haskell universe", a Char should be just
that: a character. This, to me, implies that it should be able to hold
every possible character "value" - i.e. Char should represent a Unicode
character (code point - is that the correct term?).
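
A small sketch of what "Char as a code point" means in practice,
assuming an implementation where Char covers the full Unicode range, as
the Haskell 98 report specifies. The names yinYang, codePoint and
fitsLatin1 are made up for illustration.

```haskell
import Data.Char (ord)

-- The yin-yang sign, U+262F, held in an ordinary Char.
yinYang :: Char
yinYang = '\x262f'

-- A Char's numeric value is its Unicode code point.
codePoint :: Char -> Int
codePoint = ord

-- Latin-1 covers only the code points 0..255, so a check like this is
-- what a Latin-1 codec (problem 2) would need before serializing.
fitsLatin1 :: Char -> Bool
fitsLatin1 c = ord c <= 255
```

The point is only that the codec question stays outside Char itself:
the value is the same whatever encoding is used on the way in or out.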

Now we must tackle the two problems above. In this mail I'll concentrate
on no. 1:

What's wrong with this:

        data Lang = En | Fr | De | ...
        data Msg = Hello | NoSuchFile | ...

        trans En Hello = "Hello, World!"
        trans Fr Hello = "Bonjour le monde!"
        trans De Hello = "Hallo Welt!"

        trans En _ = error "Gah, provide at least english!"
        trans _ m = trans En m

        main = do l <- systemLang   -- returns a Lang
                  putStr (trans l Hello)

Or, if you want string lookup to avoid the extra data type(s):

        trans "en" msg = msg
        trans "fr" "Hello, World!"
                 = "Bonjour, \231a va?"
        trans "de" "Hello, World!" = "Moin moin!"

        trans _ msg = msg

        main = do l <- systemLang  -- would return String in this case
                  putStr (trans l "Hello, World!")

Of course you have to pass the language parameter to all non-IO
functions, but would you want it otherwise?
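
To make the cost of that concrete, here is a minimal sketch of pure
code that takes the language as an ordinary argument. Lang, Msg and
trans follow the first snippet; the helpers banner and report and the
English text for NoSuchFile are invented for the example.

```haskell
-- Sketch only: Lang, Msg and trans mirror the first snippet;
-- banner, report and the NoSuchFile wording are hypothetical.
data Lang = En | Fr | De deriving (Eq, Show)
data Msg = Hello | NoSuchFile deriving (Eq, Show)

trans :: Lang -> Msg -> String
trans En Hello      = "Hello, World!"
trans Fr Hello      = "Bonjour le monde!"
trans De Hello      = "Hallo Welt!"
trans En NoSuchFile = "No such file."
trans _  m          = trans En m   -- fall back to English

-- A pure function that needs translated text receives the language
-- and hands it on to whatever it calls.
banner :: Lang -> String
banner l = "*** " ++ trans l Hello ++ " ***"

-- Partial application keeps the plumbing short.
report :: Lang -> [Msg] -> [String]
report l = map (trans l)
```

Only the IO layer has to find out the language; everything below it
stays pure and can be tested with any Lang value.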

After some pondering, I think we should base i18n on the first snippet's
approach:

  - There will be a data type representing a "message" which will be
    displayed to the user. The message is then translated to a string
    in a given language.

  - The languages are, this appears natural to me, values of a data type
    again. We can have ctors for everything there are ISO codes for or
    so. In order to be extensible, a ctor taking a string argument
    appears suitable.
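
One possible shape for such a language type, as a rough sketch. The
constructor names, the Other fallback and the two conversion functions
are assumptions for illustration, not an agreed design.

```haskell
import Data.Char (toLower)

-- Sketch only: a few fixed constructors plus one that carries a
-- string, so languages nobody anticipated can still be expressed.
data Lang = En | Fr | De | Other String
  deriving (Eq, Show)

-- Map an ISO 639-1 style code to a Lang. Unknown codes are kept
-- (lower-cased) in Other rather than rejected.
langFromCode :: String -> Lang
langFromCode code = case map toLower code of
  "en"  -> En
  "fr"  -> Fr
  "de"  -> De
  other -> Other other

-- The reverse direction, e.g. for storing a user's preference.
langCode :: Lang -> String
langCode En        = "en"
langCode Fr        = "fr"
langCode De        = "de"
langCode (Other c) = c
```

One wrinkle with this design is that Other "en" and En are different
values, so it helps to construct languages only through a function like
langFromCode.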

I suspect we can build all the convenience we need on top of this easy
and clear to understand basis. For instance, if we want the string
lookup way of things, one can just use String as the message type, like

    type Msg = String

The choice is up to the application developer. I'd personally tend to
use a real data type, because that lets me do this:

        data Msg = ... | MessagesWaiting Int | ...

        trans En (MessagesWaiting n)
                 = "You have "++(show n)++m++" waiting."
                 where m
                        | n==1 = " message"
                        | otherwise = " messages"
        trans De (MessagesWaiting n)
                 | n==1 = "Sie haben eine neue Nachricht."
                 | otherwise = "Es warten "++(show n)++m++" auf Sie."
                 where m
                        | n==1 = " N

I can't think of a more contrived example right now, but I'm sure many
exist. Also note that (once there is some sort of locale-aware show) one
will get the benefits of that coherently across all messages.
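
For reference, a cut-down version of the MessagesWaiting idea that
loads as it stands. It is a sketch under assumptions: only two
languages are covered, and the German plural wording is my own guess,
since the example above breaks off before it.

```haskell
-- Sketch only: a message constructor that carries data, so each
-- language can build its own sentence around the number.
data Lang = En | De deriving (Eq, Show)

data Msg = Hello | MessagesWaiting Int
  deriving (Eq, Show)

trans :: Lang -> Msg -> String
trans En Hello = "Hello, World!"
trans De Hello = "Hallo Welt!"
trans En (MessagesWaiting n) = "You have " ++ show n ++ m ++ " waiting."
  where
    m | n == 1    = " message"
      | otherwise = " messages"
trans De (MessagesWaiting n)
  | n == 1    = "Sie haben eine neue Nachricht."
  -- Plural wording assumed; the source example is truncated here.
  | otherwise = "Es warten " ++ show n ++ " Nachrichten auf Sie."
```

Each language decides for itself how singular and plural differ, which
a plain string-to-string lookup cannot express.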

On Thu, 2002-08-15 at 14:11, Alastair Reid wrote:
> > I just want to repeat something somebody suggested, and which I
> > thought was a really neat idea: Have string constants in programs be
> > replaced by (Prelude.fromString "..") or similar, like numerical
> > constants are handled already.
> >
> > This was suggested in order to simplify the use of PackedString, but
> > I think it might come in handy for translation issues, too.
>
> I find it a little hard to picture this so let's fill in some details
> so that we can agree that we're talking about the same thing and also
> to make the idea more concrete.
>
> Using typeclasses in this way would require us to make the encoding
> explicit in the typesystem.  So we'd define a bunch of types
> corresponding to characters and to strings:
>
>   data Char   = .. -- unicode
>   data Latin1 = ... -- Latin1
>   ...
>
> and we'd define two classes and the basic operations on them.
>
>   class Enum a => Charset a where fromChar   :: Char   -> a
>   class Ord a  => String  a where fromString :: String -> a
>
> Why did I define two classes instead of just one?  The more obvious
> design was to have
>
>   class Enum a => Charset a where
>     fromChar   :: Char   -> a
>     fromString :: String -> [a]
>
> but this wouldn't let us make PackedString an instance of it.  This
> could be fixed using multiparameter type classes but splitting the
> class is easier.  (We might revisit this decision if we want
> operations to convert Charsets to Strings and the like.)
>
> Details:
>
> - We might want to add operations to convert back to Unicode - though
>   that might require additional parameters to fill in details not
>   encoded in the type?
> - What should we do if the conversion fails?  For example, if I try to
>   convert the unicode yin-yang character (\u262f) to Latin1?
> - We probably want additional operations for strings like map, append,
>   etc.
> - fromString should be applied to strings used in patterns.
> - This requires a minor change in the report which states that a string
>   literal is just an abbreviation for a list of characters.
>
> Overall, this looks like it might be a viable approach.  The only
> potential showstoppers seem to be what to do when conversion fails.
>
> > (Naturally, the idea is that Prelude.fromString can be replaced by a
> > function that looks the string up in a translation table, instead of
> > using the default value.  Any reason this won't work?)
>
> This goes quite a bit further than what I suggest above but let's try
> to sketch it out.
>
> 1) You have to define a new string type:
>
>    newtype FrenchString = FS String
>
> 2) You have to define an instance:
>
>    instance String FrenchString where
>      fromString (FS "General Protection Fault") = "..."
>      fromString (FS "File not found") = "..."
>      ...
>      fromString (FS _)                = ????
>
> Well, it seems simple enough.  Once again though, we have the problem
> of what to do when the conversion fails.  What happens in the real
> world?  Do they print the string in English and hope for the best?
>
> I don't feel entirely comfortable with doing things this way.  I think
> I'd prefer to see an explicit call to a translation function like
> 'toFrench'.  I presume that the advantage of this approach would be
> that you could use existing libraries without change?  Unfortunately,
> the way I've sketched it out, the code has to be modified to use the
> type 'FrenchString' instead of 'String' so we don't achieve this goal.
>
> Overall, this doesn't look like it will work.
> --
> Alastair Reid                 alastair@reid-consulting-uk.ltd.uk
> Reid Consulting (UK) Limited  http://www.reid-consulting-uk.ltd.uk/alasta=
