
Spec-YAML sucks – let's do JNGL.

My secret is that I like YAML – but only a small part of it. The rest is unnecessary, completely bonkers, or dangerous. Yes, I mean that self-proclaimed "human-friendly" data format that is used in so many CI systems, automation tools, container management systems, and CMSs (as front matter).

OK, but YAML is fine, so what's all the fuss about? Let me explain – or rather summarize: Ruud van Asseldonk explains a lot of the pitfalls in his "YAML document from hell". Like probably many others, I was bitten by these pitfalls a few times but thought they were bugs in the YAML engine.

Chris Coyier recently summarized the complexity as well, along with why JSON or TOML is better, even though neither is that great to write by hand. That I don't like TOML that much boils down to personal preference and laziness on my part (and also that weird dot notation for objects).

The simple parts of YAML are fine – I guess.

From my point of view, the unnecessary complexities and the breaking changes between versions are not OK. The simple parts of YAML are great, though, because they are straightforward to write. For me, they cover my basic data format needs:

  • Basic data types like numbers, booleans, strings, lists and objects.
  • Objects and lists are nestable.
  • You have indentation for implicit, lazy formatting but can fall back to more explicit formats with {} and [].
  • There are comments.
  • There is multiline text without a delimiter on every single line.

Of course, other people have other needs. That's probably why YAML has a ten-chapter spec with multiple versions and breaking changes in between. Instead of "cover all bases", a simpler and easier-to-implement "cover most bases" would be far better and more useful, in my humble opinion.

Since this is my blog, let's look at what I need…

My simplified version of YAML

Everything I need is covered in the list of essential features I already described. A more precise definition of the data types would be:

  • There are six data types (object, list, string, number, boolean, null).

  • A string value is marked by single or double quotes like "value" or 'value'. Inside the string value a quote character can be escaped by a backslash. All JSON string rules apply.

  • The only boolean values are true or false.

  • The empty value is null.

  • For numbers, all JSON number rules apply.

  • A list has its items either one per line, each prefixed by a minus character, or inside square brackets [], separated by commas. A trailing comma at the end of a bracketed list is ignored. A list can hold all of the six data types as values.

  • An object is a set of key/value pairs. The key is always a string. Quotes around the key are only needed if the key is not a single word or contains special characters. Key and value are separated by a colon. The value can be any of the six types and may start on the same line. If the value starts on the next line, it must be indented by two space characters. When the value is a multiline string, it starts with a pipe character | on the line of the key and can span multiple lines afterward, each indented by two spaces. An object can also be written in curly brackets {}. In this notation, the key/value pairs are separated by commas and multiline strings are not possible. Indentation and trailing commas are ignored in curly bracket notation.

  • All text outside of a string starting with # is a comment.
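
The scalar and comment rules above can be sketched in a few lines of Python. This is only an illustration under my own assumptions, not a reference implementation: the names strip_comment and parse_scalar are made up for this sketch, numbers and double-quoted strings are handed to Python's json module, and a single-quoted string is rewritten into a double-quoted one first. The sketch is lenient about unescaped quotes inside single-quoted strings, and it does not know about multiline text blocks, which would have to be handled before comment stripping.

```python
import json
import re

# JSON number grammar: optional minus, no leading zeros, optional fraction/exponent.
NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")


def strip_comment(line: str) -> str:
    """Cut off a # comment, ignoring # characters inside quoted strings."""
    quote = None     # quote character of the string we are inside, if any
    escaped = False  # True right after a backslash inside a string
    for i, ch in enumerate(line):
        if quote:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in "\"'":
            quote = ch
        elif ch == "#":
            return line[:i].rstrip()
    return line.rstrip()


def _requote(body: str) -> str:
    """Rewrite the inside of a single-quoted string for a double-quoted one."""
    out, i = [], 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            nxt = body[i + 1]
            out.append("'" if nxt == "'" else ch + nxt)  # \' becomes a plain '
            i += 2
            continue
        out.append('\\"' if ch == '"' else ch)  # a bare " must be escaped now
        i += 1
    return "".join(out)


def parse_scalar(text: str):
    """Convert one scalar token into None, bool, int, float, or str."""
    text = text.strip()
    if text == "null":
        return None
    if text in ("true", "false"):
        return text == "true"
    if NUMBER.fullmatch(text):
        return json.loads(text)
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        if text[0] == "'":
            text = '"' + _requote(text[1:-1]) + '"'
        return json.loads(text)  # raises ValueError on invalid escapes
    raise ValueError(f"not a valid scalar: {text!r}")


print(parse_scalar("5.00003"))                      # 5.00003
print(parse_scalar("true"))                         # True
print(strip_comment('src: "/a#b.css"  # a note'))   # src: "/a#b.css"
```

Anything that is not quoted and is not null, true, false, or a JSON number is rejected, so a bare yes or no never silently turns into a boolean.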

Ok, let's look at an example:

page:
  title: 'Jon said: "That\'s my cool blog post!"'
  updated: "2023-06-12"
  description: |
    This is a long form text that may span multiple lines and <b>can</b> contain
    markup and stuff. It could **as well** be interpreted as markdown.
  options: ["one", "two", 6, true]
  "personal Rank": 5.00003

# For when you need a different color...
styles:
  - name: "Base theme"
    src: "/assets/site.css"
    empty: null

  - name: "My theme"
    src: "/assets/my-theme.css"
    isAlternate: true

  - {
      name: "My 2nd theme", 
      src: "/assets/my-2nd-theme.css", 
      isAlternate: true,
    }

That would evaluate to the following JSON object (or a list, if the document starts with a list):

{
  "page": {
    "title": "Jon said: \"That's my cool blog post!\"",
    "updated": "2023-06-12",
    "description": "This is a long form text that may span multiple lines and <b>can</b> contain markup and stuff. It could **as well** be interpreted as markdown.",
    "options": [ "one", "two", 6, true ],
    "personal Rank": 5.00003
  },
  "styles": [
    {"name": "Base theme", "src": "/assets/site.css", "empty": null},
    {"name": "My theme", "src": "/assets/my-theme.css", "isAlternate": true},
    {"name": "My 2nd theme", "src": "/assets/my-2nd-theme.css", "isAlternate": true}
  ]
}
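
One detail worth spelling out is how the description text block turns into a single JSON string. Judging from the output above, the lines of the block are stripped of their indentation and joined with a single space. The post does not define this explicitly, so the following Python sketch is just one reading of the example; read_text_block is a hypothetical helper name, and skipping blank lines inside the block is my own assumption.

```python
def read_text_block(lines: list[str], key_index: int) -> tuple[str, int]:
    """Collect the text block that follows a `key: |` line.

    Returns the joined text and the index of the first line after the block.
    """
    key_line = lines[key_index]
    key_indent = len(key_line) - len(key_line.lstrip(" "))
    prefix = " " * (key_indent + 2)  # block lines sit two spaces deeper than the key
    parts = []
    i = key_index + 1
    while i < len(lines):
        line = lines[i]
        if line.strip() and not line.startswith(prefix):
            break  # a non-blank line at the key's level (or above) ends the block
        if line.strip():
            parts.append(line[len(prefix):].rstrip())
        i += 1
    # Assumption taken from the JSON output: lines are joined by one space.
    return " ".join(parts), i


example = [
    "  description: |",
    "    This is a long form text that may span multiple lines and <b>can</b> contain",
    "    markup and stuff. It could **as well** be interpreted as markdown.",
    '  options: ["one", "two", 6, true]',
]
text, next_index = read_text_block(example, 0)
print(text)
print(next_index)  # 3, the index of the "options" line
```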

That shouldn't be hard to implement. It's essentially a stricter YAML with all the junk removed. Because it is stricter, it shouldn't run into all the booby traps of normal YAML but should still be parsable by a YAML engine.
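
To get a feeling for the effort, here is a minimal Python sketch of a parser for the bracket notation only ([] and {} with unquoted keys and trailing commas). It is a rough illustration, not a finished implementation: all names (tokenize, parse_value, parse_flow) are invented for this sketch, and it assumes that comments are already stripped and that strings use double quotes. Indentation-based objects and lists are left out.

```python
import json
import re

# A token is a double-quoted string, a punctuation mark, or a bare word.
TOKEN = re.compile(r'\s*("(?:[^"\\]|\\.)*"|[{}\[\],:]|[^\s{}\[\],:"]+)')
NUMBER = re.compile(r"-?(0|[1-9]\d*)(\.\d+)?([eE][+-]?\d+)?")
LITERALS = {"true": True, "false": False, "null": None}
PUNCT = set("{}[],:")


def tokenize(text: str) -> list[str]:
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if not match:
            raise ValueError(f"unexpected input at offset {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def peek(tokens: list[str], i: int) -> str:
    if i >= len(tokens):
        raise ValueError("unexpected end of input")
    return tokens[i]


def parse_value(tokens: list[str], i: int):
    """Parse one value starting at tokens[i]; return (value, next index)."""
    tok = peek(tokens, i)
    if tok == "[":
        items, i = [], i + 1
        while peek(tokens, i) != "]":
            value, i = parse_value(tokens, i)
            items.append(value)
            sep = peek(tokens, i)
            if sep == ",":
                i += 1  # this also swallows a trailing comma
            elif sep != "]":
                raise ValueError("expected ',' or ']'")
        return items, i + 1
    if tok == "{":
        obj, i = {}, i + 1
        while peek(tokens, i) != "}":
            key_tok = tokens[i]
            if key_tok in PUNCT:
                raise ValueError(f"expected a key, got {key_tok!r}")
            # Keys are always strings; bare words are taken as they are.
            key = json.loads(key_tok) if key_tok.startswith('"') else key_tok
            if peek(tokens, i + 1) != ":":
                raise ValueError("expected ':' after key")
            value, i = parse_value(tokens, i + 2)
            obj[key] = value
            sep = peek(tokens, i)
            if sep == ",":
                i += 1
            elif sep != "}":
                raise ValueError("expected ',' or '}'")
        return obj, i + 1
    if tok in PUNCT:
        raise ValueError(f"unexpected {tok!r}")
    if tok.startswith('"'):
        return json.loads(tok), i + 1
    if tok in LITERALS:
        return LITERALS[tok], i + 1
    if NUMBER.fullmatch(tok):
        return json.loads(tok), i + 1
    raise ValueError(f"string values must be quoted: {tok!r}")


def parse_flow(text: str):
    tokens = tokenize(text)
    value, i = parse_value(tokens, 0)
    if i != len(tokens):
        raise ValueError("unexpected input after the value")
    return value


print(parse_flow('["one", "two", 6, true]'))  # ['one', 'two', 6, True]
print(parse_flow('{ name: "My 2nd theme", isAlternate: true, }'))
```

The whole bracket notation fits in well under a hundred lines, mostly because there is exactly one way to write each scalar.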

Let's name this spec.

I call this format jngl, and you may pronounce it like the English word "jungle". Why? I find it funny, and it's somewhere in between JSON and YAML.