[Haskell-cafe] Standard package file format

Mon Sep 19 00:12:04 UTC 2016

On 16/09/16 6:37 PM, Tobias Dammers wrote:
> Another factor in favor of YAML is that it is a superset of JSON,

Here is a simple string in JSON:

"Where's the Golden Fleece?"

Here is the same string in YAML:

--- Where's the Golden Fleece?
...

Superset?  I understand "language X is a superset of language Y"
to mean that if I have a document in language Y it can be correctly
processed by a language X processor.

If you mean that any data value that can be represented in JSON
can be represented (differently!) in YAML, fine, but that's not
the same thing.  There are many textual formats that generalise
JSON.  Heck, even GNUstep Property List format does *that*.
(And no, I do not recommend adopting that for anything.)

For that matter, any JSON document can be transcoded with no
loss of structural information into XML and vice versa.  That
doesn't mean that JSON is a superset of XML!
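
To illustrate the transcoding point, here is a minimal sketch in Python
of one way to carry a JSON value into XML and back.  The element names
(object, member, array, string, number) are made up for this example and
are not any standard mapping.  It only shows the JSON-to-XML direction
and its inverse, and it assumes strings contain no characters that XML
1.0 cannot carry (such as most control characters).

```python
import json
import xml.etree.ElementTree as ET


def to_xml(value):
    """Encode one parsed JSON value as an XML element (hypothetical names)."""
    if isinstance(value, dict):
        el = ET.Element("object")
        for key, member in value.items():
            wrapper = ET.SubElement(el, "member", name=key)
            wrapper.append(to_xml(member))
        return el
    if isinstance(value, list):
        el = ET.Element("array")
        for item in value:
            el.append(to_xml(item))
        return el
    if isinstance(value, str):
        el = ET.Element("string")
        el.text = value
        return el
    if value is None or isinstance(value, bool):
        # true, false and null become empty elements named after the literal.
        return ET.Element(json.dumps(value))
    el = ET.Element("number")
    el.text = json.dumps(value)
    return el


def from_xml(el):
    """Invert to_xml, recovering the original JSON value."""
    if el.tag == "object":
        return {m.get("name"): from_xml(m[0]) for m in el}
    if el.tag == "array":
        return [from_xml(child) for child in el]
    if el.tag == "string":
        return el.text or ""
    if el.tag == "number":
        return json.loads(el.text)
    return json.loads(el.tag)  # "true", "false" or "null"


def round_trip(json_text):
    """JSON text -> XML text -> JSON value."""
    xml_text = ET.tostring(to_xml(json.loads(json_text)), encoding="unicode")
    return from_xml(ET.fromstring(xml_text))
```

The structure survives the trip, yet no JSON parser will accept the XML
text and no XML parser will accept the JSON text, which is the sense of
"superset" at issue here.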

Familiarity with JSON semantics and syntax did not help me AT ALL
when faced with YAML.

Here's another meta-format worthy of consideration.
A *package* is a collection of resources with relationships
between them and relationships linking them to other things
like authors (think Dublin Core).
Is there a standard (genuinely standard) notation specifically
for describing resources and their relationships, with quite a
few tools for not just reading it and writing it but actually
reasoning with it?

Why yes.  It's called RDF.
http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/
   The design of RDF is intended to meet the following goals:

   * having a simple data model
   * having formal semantics and provable inference
   * using an extensible URI-based vocabulary
   * using an XML-based syntax
   * supporting use of XML schema datatypes
   * allowing anyone to make statements about any resource

There is also a human-friendly syntax, Turtle, which is
interconvertible with the XML one.
http://www.w3.org/TR/turtle/

Now RDF (whether XML or Turtle) is *not* designed for presenting
single data values.  But that's not really what a package format
wants to do anyway.
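
As a rough illustration of that data model, here is a small Python
sketch that treats a description as a set of (subject, predicate,
object) statements.  This is plain Python, not an RDF library.  The
package names, the person, and the "pkg:", "person:" and "ex:" prefixes
are all invented for the example; "dc:creator" and "dc:title" merely
echo Dublin Core.

```python
# Each statement is a (subject, predicate, object) triple;
# a graph is simply a set of such statements.
graph = {
    ("pkg:argo", "dc:title", "Argo"),
    ("pkg:argo", "dc:creator", "person:jason"),
    ("pkg:argo", "ex:dependsOn", "pkg:fleece"),
    ("pkg:fleece", "dc:creator", "person:jason"),
}


def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None is a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# The same statements can be queried from either end of a relationship.
authored_by_jason = [s for s, _, _ in match(graph, p="dc:creator", o="person:jason")]
argo_needs = [o for _, _, o in match(graph, s="pkg:argo", p="ex:dependsOn")]
```

There is no single root value here: the description is a flat set of
statements about resources, and nothing privileges the package over the
author as the "top" of the document.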

Am I seriously recommending RDF (or possibly OWL-DL) as a good
way to describe packages?  I am certainly serious that it should
be CONSIDERED.  And I'm particularly serious about that for two
reasons.

(1) JSON, XML, TOML, and YAML are all about serialising *data values*.
     That's all they do.  Anything beyond that is up to you.
     RDF and OWL are all about describing *relationships* between
     *resources*.  It's worth considering carefully what you want to
     say in a package file format.  If you want to describe
     *relationships*, then something that deals with data values may
     not be the right *kind* of "language".

     Simply jarring people loose from the idea that a "single possibly
     structured data value" language is the ONLY kind of language is
     of value in itself.

(2) JSON, XML, TOML, and YAML are all about serialising *data values*.
     *Single* possibly structured data values.
     That's all they do.  There is no sense in which there is any
     standard way to *combine* data in these forms.
     In contrast, RDF was *invented* to have a way of patching together
     multiple sets of facts from multiple sources.  Given a collection
     of package descriptions in YAML, all you have is a bunch of text
     files; what you do with them is *entirely* up to you.  Given a
     bunch of RDF/XML or RDF/Turtle files, there is a *standard* way
     to write a query (SPARQL) which integrates them.  It becomes
     possible to write consistency-checking queries that can be processed
     by multiple tools.  It becomes possible to ask "if I need these,
     what else do I need?" in a standard way.

     Again, the idea here is to get people thinking that having a
     documented semantics that can be processed by existing description
     logic tools has value, so that something at a higher semantic level
     than YAML or XML might be worth thinking about.
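
The combining and querying described in (2) can be sketched in a few
lines of Python.  This is not SPARQL and not an RDF toolkit, only a
stand-in for what a query over merged graphs would compute.  The package
names and the "ex:dependsOn" property are hypothetical, and the sketch
assumes graphs without blank nodes, where merging is just set union of
the statements.

```python
DEPENDS = "ex:dependsOn"  # hypothetical property name

# Two independently written descriptions, e.g. from two different files.
source_a = {
    ("pkg:argo", DEPENDS, "pkg:fleece"),
    ("pkg:argo", DEPENDS, "pkg:oars"),
}
source_b = {
    ("pkg:fleece", DEPENDS, "pkg:ram"),
    ("pkg:oars", DEPENDS, "pkg:timber"),
}

# Without blank nodes, combining graphs is set union of their statements.
merged = source_a | source_b


def needed(graph, wanted):
    """'If I need these, what else do I need?': the transitive closure
    of DEPENDS starting from `wanted`, excluding `wanted` itself."""
    seen = set(wanted)
    todo = list(wanted)
    while todo:
        current = todo.pop()
        for s, p, o in graph:
            if s == current and p == DEPENDS and o not in seen:
                seen.add(o)
                todo.append(o)
    return seen - set(wanted)


def undescribed(graph):
    """Consistency check: dependencies that never appear as a subject."""
    subjects = {s for s, _, _ in graph}
    return {o for _, p, o in graph if p == DEPENDS} - subjects
```

For instance, needed(merged, ["pkg:argo"]) finds all four other
packages, while the same question asked of source_a alone finds only
the two direct dependencies.  With ad hoc YAML files, both the merge
step and the meaning of "depends on" would have to be invented by each
tool.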