
"I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability." -- Crockford

This is horrific design reasoning. It's an authoritarian, presumptuous, "punish everyone in the classroom because one child misbehaves" mentality.

Comments would be useful in JSON because comments are useful in code, and JSON is code. For example, I might have a config file that I'm typing in that I want to leave a documentation trail for.

Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things. And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.




> It's an authoritarian ...

Which is pretty much what a specification is.

It's one or more people saying "This is how things are if you call them X".

> presumptuous

Presumptuous? It was in response to the feature being abused!

> "punish everyone in the classroom because one child misbehaves" mentality

No more than creating laws is. A significant subset of the population was misusing it in a way that could cause widespread damage. It is a minor inconvenience to the 'law-abiding people' (particularly given that any comments would be removed if the file were read in and spat out by any program). There are workarounds ("field_comment":"some comment"), or if that's not enough, use another format. Use one that allows comments; there are many.

> Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things

It's also completely unreliable, it's a terrible solution and nobody should use it. I think we're fully in agreement here.

> And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.

No you can't. The point was to stop people adding pre-processing commands or other such things to json, which would be in random formats and invisible to some parsers (as comments should be), visible and important to others. You don't want to pass a valid piece of JSON through a parser and end up with two different outcomes dependent on something in a comment, do you? Or have to use parser X or Z because Y doesn't understand directive A, but it does understand directive B and C, and while Z understands C, and X knows B, Z doesn't, so I have to use the version from a pull request from DrPotato which I think supports...

What I'm saying is that there is a benefit in simple standards.


I'm curious how the notion of XML processing instructions informs your opinion. In general I think having a standard is somewhat more important than the precise details in the standard, but XML PIs enable precisely the kind of thing Crockford feared, yet it doesn't seem to have materialized. Is this because processing instructions are not inherently harmful or because segregating them from comments disarms them?


XML PIs have a spec, don't they? (actual question) From some googling, the W3C site has this:

> PIs are not part of the document's character data, but must be passed through to the application

If they're being passed through and not being used by the parser, it's really no different from a

    "directive" : "blah"

in JSON, which is fine. The application at the end needs to deal with it, but the parser doesn't, and that's really important. If it's just a comment, passing the file into and out of a program could remove the comment.

    cat something.json | python -m json.tool | myjsonprocessingapp

Should be the same as

    cat something.json | myjsonprocessingapp

If the parser does need to understand the directive, at least there's a difference between an error of "I don't understand directive X" and no error at all because your parser ignored the comments.
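A minimal Python sketch of that difference, assuming the standard-library `json` module as the parser and a hypothetical set of directives the application knows about: a `//` comment is rejected outright rather than silently dropped, while a `"directive"` key survives a parse/serialize round trip and is left for the application to check.

```python
import json

# A "//" comment is not valid JSON: a strict parser refuses it instead of
# silently dropping it.
with_comment = '{"timeout": 30 // seconds\n}'
try:
    json.loads(with_comment)
    comment_rejected = False
except json.JSONDecodeError:
    comment_rejected = True

# A directive stored as an ordinary key is just data, so it survives a
# parse/serialize round trip (roughly what piping through json.tool does,
# minus the pretty-printing).
original = '{"directive": "blah", "timeout": 30}'
round_tripped = json.dumps(json.loads(original))

KNOWN_DIRECTIVES = {"blah"}  # hypothetical: directives this app understands


def apply_directive(doc):
    """Application-level check: an unknown directive is a loud error."""
    directive = doc.get("directive")
    if directive is not None and directive not in KNOWN_DIRECTIVES:
        raise ValueError(f"I don't understand directive {directive!r}")
    return doc


print(comment_rejected)                     # True
print(round_tripped == original)            # True
apply_directive(json.loads(round_tripped))  # passes: "blah" is known
```

The parser never has to know what "blah" means; only the application at the end does, and it can fail loudly.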


> and JSON is code

JSON is data. It appears to be JS code, but JSON is data. Data is not code ( http://www.c2.com/cgi-bin/wiki?DataAndCodeAreNotTheSameThing ). That's why the idea of data holding parsing directives is silly. If you want to do that, then embed that in the data (hold a MsgType key in the data records). There's no need for comments unless you are trying to use it for something other than raw data.
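One way to read the MsgType suggestion, sketched in Python: the record type lives in the data itself and the application dispatches on it. The handler names and record fields here are hypothetical.

```python
import json


# Hypothetical handlers, one per message type.
def handle_order(record):
    return f"order for {record['qty']} x {record['item']}"


def handle_cancel(record):
    return f"cancel order {record['order_id']}"


HANDLERS = {"order": handle_order, "cancel": handle_cancel}


def process(raw):
    """Dispatch each record on its MsgType key instead of on a comment."""
    results = []
    for record in json.loads(raw):
        handler = HANDLERS.get(record.get("MsgType"))
        if handler is None:
            raise ValueError(f"unknown MsgType: {record.get('MsgType')!r}")
        results.append(handler(record))
    return results


batch = ('[{"MsgType": "order", "item": "widget", "qty": 2},'
         ' {"MsgType": "cancel", "order_id": 17}]')
print(process(batch))  # ['order for 2 x widget', 'cancel order 17']
```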


> There's no need for comments unless you are trying to use it for something other than raw data.

Is this a true statement? Even books have margins, and Word documents have comments. I think it's not infrequent that pure data calls for metadata to put it into context for future users of that data.

And in computing, most "pure data" formats have had either comments, or schemas and specifications which outline the contents. The latter sure look like comments stored externally to the documents, from my perspective.

In general I do not think data is self describing, and thus must be commented on in some form to describe it.


You can represent annotations (which describe most of your examples) by adding keys:

    {
        "data": "some data",
        "data_comments": "here are my comments"
    }


Not transparent to actual clients of the data.

edit for clarity: You're assuming that the application code isn't doing something with each key that it reflectively sees in the object, e.g. creating database fields to match them, or launching missiles towards those destinations, etc. If you wouldn't automatically add dummy elements to a hashmap or dictionary in Java or Python, then you shouldn't add keys to a JavaScript object unless you control the source of the program that will process the data. Even then you shouldn't, because adding comments this way will become a habit, and that will bite you when an extra key does matter.
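The objection can be made concrete with a small Python sketch (both consumer functions are hypothetical): a consumer that reflects over every key treats `data_comments` as a real field, and only a consumer with a fixed list of expected keys ignores it, which presumes exactly the control over the consumer that the comment says you may not have.

```python
import json

doc = json.loads('{"data": "some data", "data_comments": "here are my comments"}')


def columns_reflective(record):
    """Hypothetical consumer that turns every key it sees into a column."""
    return sorted(record.keys())


EXPECTED_KEYS = {"data"}  # hypothetical schema the consumer actually understands


def columns_whitelisted(record):
    """Hypothetical defensive consumer that only looks at known keys."""
    return sorted(k for k in record if k in EXPECTED_KEYS)


print(columns_reflective(doc))   # ['data', 'data_comments'] (the comment leaks in)
print(columns_whitelisted(doc))  # ['data']
```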


or just use the key "comment" more than once, which is sort of a hybrid of the ideas.


Parsers might throw an error on duplicate keys, or launch Emacs solving the Towers of Hanoi.
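RFC 8259 only says that object names SHOULD be unique and that behaviour with duplicates is unpredictable, so the outcome depends on the parser. A short sketch with Python's `json` module, which keeps the last value by default and can be made strict through `object_pairs_hook` (the `reject_duplicates` helper is illustrative, not part of the library):

```python
import json

doc = '{"comment": "first note", "comment": "second note", "port": 8080}'

# By default the last duplicate wins, so the first "comment" is silently lost.
print(json.loads(doc))  # {'comment': 'second note', 'port': 8080}


def reject_duplicates(pairs):
    """object_pairs_hook that refuses objects with repeated keys."""
    seen = {}
    for key, value in pairs:
        if key in seen:
            raise ValueError(f"duplicate key: {key!r}")
        seen[key] = value
    return seen


try:
    json.loads(doc, object_pairs_hook=reject_duplicates)
except ValueError as err:
    print(err)  # duplicate key: 'comment'
```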


"Data is not code"

Lisp programmers disagree.


Lisp programmers think "Code is data", not "Data is code"


Lisp programmers write code so that data is code.


Lisp: "All code is data"

That's not the question. "All data is code" is not the same statement.

In a different context: "All apples are fruit" may be true but that doesn't imply "all fruit are apples"


Lisp programmers think "data is code and code is data, both are the same thing and they're interchangeable". As it happens to actually be. I point to GEB [0] for more detailed discussion of this, but let me give you a few examples which point out that the distinction between code and data is mostly meaningless.

- Ant build descriptions that look surprisingly like executable Lisp code if you replace "<tag> ... </tag>" with "(tag ...)".

- Musical notation which is obviously code for humans playing instruments (it even has loops, I think, AFAIR from my music lessons; don't know about conditionals; if it has them, maybe it's Turing-complete? (ETA it would seem it is[3])).

- Windows Metafile format for bitmap and vector graphics which is basically a serialized list of WinAPI calls [1].

- "fa;sldjfsaldf" - the "not code, just data" example from [2] that happens to be "a Teco program that creates a new buffer, copies the old buffer and the time of day into it, searches and then selectively deletes". Oh, and it's also "a brainfuck program that does nothing, and a vi program to jump to the second "a" forwards and replace it with the string "ldjfsaldf"".

[0] - https://en.wikipedia.org/wiki/G%C3%B6del,_Escher,_Bach

[1] - http://en.wikipedia.org/wiki/Windows_Metafile

[2] - http://www.c2.com/cgi-bin/wiki?DataAndCodeAreNotTheSameThing

[3] - http://programmers.stackexchange.com/questions/136085/is-mus...


> Musical notation which is obviously code for humans playing instruments (it even has loops, I think, AFAIR from my music lessons; don't know about conditionals; if it has them, maybe it's Turing-complete?).

There are conditionals in standard music notation, at least ones that involve "executing different code" based on the value of a loop counter.


Don't Lisp much do you?

Code is data and vice versa. Look up what the acronym JSON means sometime.


Code is data but data isn't necessarily code. Even in Lisp.


The difference is one of interpretation, not of representation; i.e. it's determined by an application, above parser level. When looking just at the written down form, data and code are the same thing.

Code more Lisp and read more Hofstadter ;).


> Data is not code

All code is data, but not all data is code.


Nonsense. This is just more arrogance.

JSON is code because I use it as code. It's not your business to tell me it's not code -- you haven't seen how I'm using it. And don't go chirping that I should only do things your way, it's none of your god damned business what I'm using it for.

Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient. To the extent that JSON is a good format, it's code. To the extent that it's data, it's not a good format.


A fork can be a spoon for you, if you choose to use it that way. Nobody is telling you what you are supposed to use it for, but still JSON was designed as data format.

If you don't like the format or feel that JSON is too restrictive/bad feel free to extend it or create your own format from scratch.


> still JSON was designed as data format

While I don't think that comments belong in JSON, I don't agree that JSON is designed as a "data and not code" format. Trees of tokens are actually the natural format for writing code (also known as Abstract Syntax Trees, ASTs), and the data/code distinction is really, really blurry when those two meet, so it's only to be expected that people will end up coding in JSON (what are the 'build definition' files for various build tools / package managers, if not very simple programs?).


You can use a screwdriver as a hammer all you want, it's not going to make it a good idea. This isn't a free speech issue.

> Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient.

So use something else. Also, a computer can only read any file after it's been parsed in some way. I'm not really sure what you're suggesting as an alternative.

> To the extent that JSON is a good format, it's code

Is it executable? Is it turing complete?


> Is it executable? Is it turing complete?

It represents groups of more or less arbitrary tokens as trees, therefore it's a natural format for code representation, as it's equivalent to an AST; therefore it's trivial to attach a basic execution context with if and lambda defined, and now it's executable and Turing-complete.
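To make that argument concrete, here is a rough Python sketch, not a definitive design: JSON arrays are treated as s-expressions, strings as variable names, and only `if`, `lambda`, `quote` and a handful of arithmetic primitives are defined (all of these names are choices made for this illustration). It is enough to run a recursive factorial written entirely in JSON.

```python
import json
import operator

# Primitives exposed to JSON "programs" (a choice made for this sketch).
PRIMITIVES = {"+": operator.add, "-": operator.sub, "*": operator.mul,
              "<": operator.lt, "=": operator.eq}


def evaluate(expr, env):
    """Evaluate a parsed JSON value as an s-expression-style AST."""
    if isinstance(expr, str):                    # strings name variables
        return env[expr]
    if not isinstance(expr, list) or not expr:   # numbers, bools, null, []
        return expr
    head, *args = expr
    if head == "quote":                          # ["quote", x] -> x unevaluated
        return args[0]
    if head == "if":                             # ["if", test, then, else]
        test, then, alt = args
        return evaluate(then if evaluate(test, env) else alt, env)
    if head == "lambda":                         # ["lambda", [params], body]
        params, body = args
        return lambda *vals: evaluate(body, {**env, **dict(zip(params, vals))})
    fn = evaluate(head, env)                     # function application
    return fn(*(evaluate(arg, env) for arg in args))


env = dict(PRIMITIVES)
# A recursive factorial written entirely as JSON; the lambda copies env at
# call time, so binding "fact" afterwards makes recursion work.
env["fact"] = evaluate(json.loads(
    '["lambda", ["n"], ["if", ["<", "n", 2], 1,'
    ' ["*", "n", ["fact", ["-", "n", 1]]]]]'), env)

print(evaluate(json.loads('["fact", 5]'), env))  # 120
```

Whether that makes JSON itself "code" is exactly what the replies dispute; the sketch only shows that an evaluator can sit on top of an ordinary JSON parser.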


So any indented text file would be considered code?


> JSON is code because I use it as code

You could use JSON as code, but that's somewhat silly, because there's already a superset of JSON designed for that use.


Technically not true: http://timelessrepo.com/json-isnt-a-javascript-subset

    {"JSON":"ro<U+2028>cks!"}

(where <U+2028> stands for a literal, unescaped Unicode line separator character, U+2028)
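The linked article's point can be checked from Python, as a sketch: the `json` module escapes U+2028 by default, but with `ensure_ascii=False` it emits the raw character, which is valid JSON yet was a syntax error inside a JavaScript string literal before ES2019 made JSON a subset of JavaScript. The `js_safe` helper is hypothetical.

```python
import json

value = {"JSON": "ro\u2028cks!"}

# ensure_ascii=False leaves U+2028 unescaped: fine for JSON parsers,
# not fine for pre-ES2019 JavaScript string literals.
raw = json.dumps(value, ensure_ascii=False)

# The default (ensure_ascii=True) escapes all non-ASCII, including U+2028.
escaped = json.dumps(value)


def js_safe(text):
    """Hypothetical helper: escape the two line terminators old JS rejected."""
    return text.replace("\u2028", "\\u2028").replace("\u2029", "\\u2029")


print("\u2028" in raw)                    # True
print(escaped)                            # {"JSON": "ro\u2028cks!"}
print(json.loads(js_safe(raw)) == value)  # True
```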


> JSON is code because I use it as code

You can't use JSON to compute things, therefore it is not code (unless you are willing to concede that any document format is code).


Maybe a more useful resolution to this would be to state that while all code is data, no data should be code?

You could, if you were crazy enough, write perfectly valid JSON whose values you then pass to eval() or a parser or what have you. And while there are encodings in JSON that don't work in JavaScript (I've broken JS innumerable times trying to get that to work), JS does of course allow you to add closures as an object, or an array, whatever you like, and some forms of valid JSON (if not all) are also valid JavaScript. So you could indeed use JSON to compute things if you wanted to.


> unless you are willing to concede that any document format is code

Because it is. Data vs. code distinction is arbitrary. The following sequence of characters:

"echo 'foobar';"

can be interpreted as describing a string, a series of tokens, a piece of code, a piece of music or a small icon, whatever interpretation you choose.
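As a small illustration of that claim (Python, standard library only), the same characters read as a plain string, as shell-style tokens via `shlex`, and as raw byte values; which reading is "right" depends entirely on the interpreter you hand them to.

```python
import shlex

text = "echo 'foobar';"

as_string = text                        # a plain string of 14 characters
as_tokens = shlex.split(text)           # shell-style tokenization
as_bytes = list(text.encode("ascii"))   # a sequence of byte values

print(len(as_string))   # 14
print(as_tokens)        # ['echo', 'foobar;']
print(as_bytes[:4])     # [101, 99, 104, 111]
```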


Yes, I understand that "code is data". This does not mean that data, in general, is code; unless you are willing to make the words completely meaningless. "Code" requires some notion of an execution platform/environment, which does not exist for arbitrary data. Here is a string: "the quick brown fox jumps over the lazy dog". Or how about "\u0000\u0000". That is not code, as generally understood.


> "Code" requires some notion of an execution platform/environment, which does not exist for arbitrary data.

Arbitrary data don't exist without some notion of an execution (or interpretation) platform.

We tend to use "code" as a word for "commands telling some execution process what to do" and "data" as a word for "information that is meant to be transformed" but in reality this distinction is meaningless; both are fundamentally the same thing, and even our "code" vs. "data" words have blurry borders. It's very apparent when you start reading configuration files. For example, aren't Ant "configuration files" essentially programs[0]?

We all know what we usually mean in context by saying what is "code" vs. what is "data", but one has to remember that in fact they are the same; minding it leads to insights like metaprogramming. Forgetting about it leads to dumb languages and nasty problems, and is generally not wise.

[0] - the answer is: yes, they are, see http://www.defmacro.org/ramblings/lisp.html for more.

ETA:

Questions to ponder:

- are regular expressions code, or data?

- is source written in Prolog code, or data?

Also I recommend watching http://www.youtube.com/watch?v=3kEfedtQVOY to learn how what would be data, as defined by formal grammars of some real-world protocols, can - by means of sloppy grammars and bad parser implementation - cross the threshold of Turing-completeness and become code.


I understand all this. Like many people, I've written programs in C++ templates. But I think we're talking past each other because you want to make a pedantic point. I'm using the words as they are generally understood, not in a technical computer science way. I'm talking about first-level stuff, not metaprogramming. Let me give you some questions to ponder:

- Is the text of Hamlet code?

- Was it code as soon as Shakespeare wrote it?

- If not, did it become code once the electronic computer was invented? Or did that happen once a version was stored in a way accessible to an electronic computer?

- Did all the existing paper copies immediately become code at that point as well?


> But I think we're talking past each other because you want to make a pedantic point.

I guess that's true.

The flavour of "code vs. data" discussion in this thread was one of representation formats. You could argue that when looking at works of art from past centuries one should immediately say "data!" [0]. But in case of JSON, a format suspiciously almost identical to Lisp in structure, one needs to be careful in saying "it's for data, not for code".

Actually, I'm not sure what kind of point I'm trying to make, as the more I think of it, the more examples of things that are borderline code/data come to my mind. Cooking recipes are the obvious candidate, but think about e.g. music notation - it clearly feels more like "code" than "data".

I feel that you could define a kind of difference between "code" and "data" other than in intent, something that could put bitmaps into the "data" category, and a typical function into "code" category, but I can't really articulate it. Maybe there's some mathematical way to describe it, but it's definitely a blurry criterion. But when we're discussing technology, I think it's harmful to pretend that there's a real difference. Between configuration files looking like half-baked Lisp listings and "declarative style" C++ that looks like datasets with superfluous curly braces, I think it's wrong to even try to draw a line.

[0] - there's a caveat though. "How to Read a Book" by Mortimer Adler[1] discusses briefly how the task of a poet is to carefully choose words that evoke particular emotional reactions in readers. It very much sounds like scripting the emotional side of the human brain.

[1] - http://en.wikipedia.org/wiki/How_to_Read_a_Book


> Is the text of Hamlet code?

> Was it code as soon as Shakespeare wrote it?

Yes, the text of a play is code meant to be executed by humans.


That is pretty funny, but not what I was going for :)


So is all opinionated design "stupid"?

I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that qualify to be able to call Douglas Crockford "stupid".


>So is all opinionated design "stupid"?

He never said that.

>I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that qualify to be able to call Douglas Crockford "stupid".

Why, who do you think Douglas Crockford is, and what is his "academic background"? He doesn't even have a related degree. He owes most of his JS fame to his book.


Since I too lack the lofty requisite background for it, I'll just let Mr. Crockford do the job for me:

> The reason to use semicolons is because coding rigor tends to produce significantly better software.


Don't put words in my mouth, I didn't claim he was "stupid." To say that one thing he said somewhere is stupidity is a far cry from claiming he is stupid.

I also never said that "opinionated design is stupid".

Perhaps you could rephrase your question in such a way that you aren't presuming to speak for me.


JSON isn't a configuration language, it's just another data encoding format with the added benefit of being readable by humans. That and its ubiquity make it an appealing choice for stuff like ad-hoc configuration at first glance, but it's not the best choice. If you want a config language for shared human and machine consumption, use one designed for that purpose. JSON is pretty much just an encoding that is easy for humans to inspect and debug.


This. I've worked with a number of systems that "use json as the configuration language"; and in every case it's led to issues.

Given a choice, it's better to have a .ini-style format like the one that Python's ConfigParser will digest. That way you can have sections and comments, and you won't be tempted to have the application write things into the configuration on its own...
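For comparison, a minimal sketch using Python's standard `configparser` module (the section and option names are made up): `#` and `;` start full-line comments by default, while inline comments are only recognized if `inline_comment_prefixes` is set.

```python
import configparser

INI_TEXT = """
# Comments are part of the format, so no workaround is needed.
[server]
host = example.org
; either '#' or ';' starts a full-line comment
port = 8080

[logging]
level = info
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)

print(config["server"]["host"])         # example.org
print(config.getint("server", "port"))  # 8080
print(config.sections())                # ['server', 'logging']
```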



