There is an interview with the inventor of JSON somewhere. In that interview he explained why he did not allow comments in JSON like in XML. He said - if I remember correctly - that leaving comments out of JSON was intentional. The reason was that comments could be misused to add additional information for a parser. For example, in XML you could use comments, and a special parser could use these comments to generate code while parsing. He did not want that. He wanted every JSON parser to be a JSON parser and nothing more. If you wanted comments in JSON, he said, you could simply make them inline and adopt a convention for which keys are comments: for example, every key ending with _comment could have a value that is treated as a comment by the application but not by the parser.
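For instance, a minimal sketch of that convention (the key names here are hypothetical):

    {
        "timeout_comment": "timeout is measured in seconds",
        "timeout": 30
    }

The parser sees two ordinary key/value pairs; only the application decides that timeout_comment is documentation.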
You are correct - confirmed in this video: Lessons of JSON
'A recent (and short) IEEE Computing Conversations interview with Douglas Crockford about the development of JavaScript Object Notation (JSON) offers some profound, and sometimes counter-intuitive, insights into standards development on the Web.'
He both invented and discovered it. Yes, the object literal syntax existed, but he also carefully (and IMHO correctly) specified a strict subset as well, for these interoperability reasons. For instance, Javascript is happy with {a: 1}, but that is not legal JSON. It's a very well done standard.
Indeed, and I apologize for my ambiguity, as you are correct. By "strict subset" what I meant was a subset that attempts to reduce options, so that legality and illegality are easier to discern. That is, where Javascript accepts both apostrophes and double quotes to delimit strings, JSON only accepts double quotes, and is thus "stricter" than real Javascript.
You are of course correct that JSON turns out not to quite be a strict subset in the set theory sense of "strict subset", though obviously that's a bug in the spec rather than a deliberate design decision.
"I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability." -- Crockford
This is horrific design reasoning. It's an authoritarian, presumptuous, "punish everyone in the classroom because one child misbehaves" mentality.
Comments would be useful in JSON because comments are useful in code, and JSON is code. For example, I might have a config file that I'm typing in that I want to leave a documentation trail for.
Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things. And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.
It's one or more people saying "This is how things are if you call them X".
> presumptuous
Presumptuous? It was in response to the feature being abused!
> "punish everyone in the classroom because one child misbehaves" mentality
No more than creating laws is. A significant subset of the population was misusing it in a way that could cause widespread damage. It is a minor inconvenience to the 'law abiding people' (particularly given that any comments would be removed if read in and spat out by any program). There are workarounds ("field_comment":"some comment") or, if that's not enough, use another format. Use one that allows comments; there are many.
> Don't tell me I can do a silly thing like redefine a field, as if it's "neat". It's an abomination that I have to resort to such things
It's also completely unreliable, it's a terrible solution and nobody should use it. I think we're fully in agreement here.
> And guess what: by resorting to such things I can still do precisely what Crockford claims he was trying to prevent. So his rationale is not only insulting to one's intelligence, it's sheer stupidity.
No you can't. The point was to stop people adding pre-processing commands or other such things to json, which would be in random formats and invisible to some parsers (as comments should be), visible and important to others. You don't want to pass a valid piece of JSON through a parser and end up with two different outcomes dependent on something in a comment, do you? Or have to use parser X or Z because Y doesn't understand directive A, but it does understand directive B and C, and while Z understands C, and X knows B, Z doesn't, so I have to use the version from a pull request from DrPotato which I think supports...
What I'm saying is that there is a benefit in simple standards.
I'm curious how the notion of XML processing instructions informs your opinion. In general I think having a standard is somewhat more important than the precise details in the standard, but XML PIs enable precisely the kind of thing Crockford feared, yet it doesn't seem to have materialized. Is this because processing instructions are not inherently harmful or because segregating them from comments disarms them?
XML PIs have a spec, don't they? (actual question) From some googling, the W3C site has this:
> PIs are not part of the document's character data, but must be passed through to the application
If they're being passed through and not being used by the parser, it's no different really than a
"directive" : "blah"
in JSON, which is fine. The application at the end needs to deal with it, but the parser doesn't, and that's really important. If it's just a comment, passing the file into and out of a program could remove the comment.
If the parser does need to understand the directive, at least there's a difference between an error of "I don't understand directive X" and no error at all because your parser ignored the comments.
JSON is data. It appears to be JS code, but JSON is data. Data is not code ( http://www.c2.com/cgi-bin/wiki?DataAndCodeAreNotTheSameThing ). That's why the idea of data holding parsing directives is silly. If you want to do that, then embed that in the data (hold a MsgType key in the data records). There's no need for comments unless you are trying to use it for something other than raw data.
> There's no need for comments unless you are trying to use it for something other than raw data.
Is this a true statement? Even books have margins, and Word docs have comments. I think it's not infrequent that pure data calls for metadata to put it into context for future users of that data.
And in computing, most "pure data" formats have had either comments, or schemas and specifications which outline the contents. The latter sure look like comments stored externally to the documents, from my perspective.
In general I do not think data is self describing, and thus must be commented on in some form to describe it.
edit for clarity: You're assuming that the application code isn't doing something with each key that it reflectively sees in the object, e.g. creating database fields to match them, or launching missiles towards those destinations, etc. If you wouldn't automatically add dummy elements to a hashmap or dictionary in Java or Python, then you shouldn't add keys in a Javascript object, unless you control the source to the program that will process the data. Even then you shouldn't, because it will become a habit to add comments this way, and that will bite you when an extra key does matter.
Lisp programmers think "data is code and code is data; both are the same thing and they're interchangeable" - which happens to actually be the case. I point to GEB [0] for a more detailed discussion, but let me give you a few examples showing that the distinction between code and data is mostly meaningless.
- Ant build descriptions that look surprisingly like executable Lisp code if you replace "<tag> ... </tag>" with "(tag ...)".
- Musical notation which is obviously code for humans playing instruments (it even has loops, I think, AFAIR from my music lessons; don't know about conditionals; if it has them, maybe it's Turing-complete? (ETA it would seem it is[3])).
- Windows Metafile format for bitmap and vector graphics which is basically a serialized list of WinAPI calls [1].
- "fa;sldjfsaldf" - the "not code, just data" example from [2] that happens to be "a Teco program that creates a new buffer, copies the old buffer and the time of day into it, searches and then selectively deletes". Oh, and it's also "a brainfuck program that does nothing, and a vi program to jump to the second "a" forwards and replace it with the string "ldjfsaldf"".
> Musical notation which is obviously code for humans playing instruments (it even has loops, I think, AFAIR from my music lessons; don't know about conditionals; if it has them, maybe it's Turing-complete?).
There are conditionals in standard music notation, at least ones that involve "executing different code" based on the value of a loop counter.
The difference is one of interpretation, not of representation; i.e. it's determined by an application, above parser level. When looking just at the written down form, data and code are the same thing.
JSON is code because I use it as code. It's not your business to tell me it's not code -- you haven't seen how I'm using it. And don't go chirping that I should only do things your way, it's none of your god damned business what I'm using it for.
Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient. To the extent that JSON is a good format, it's code. To the extent that it's data, it's not a good format.
A fork can be a spoon for you, if you choose to use it that way. Nobody is telling you what you are supposed to use it for, but JSON was still designed as a data format.
If you don't like the format or feel that JSON is too restrictive/bad feel free to extend it or create your own format from scratch.
While I don't think that comments belong in JSON, I don't agree that JSON is designed as a "data and not code" format. Trees of tokens are actually the natural format for writing code (also known as Abstract Syntax Trees, ASTs), and the data/code distinction gets really, really blurry when those two meet, so it's only to be expected that people will end up coding in JSON (what are the 'build definition' files for various build tools and package managers, if not very simple programs?).
You can use a screwdriver as a hammer all you want, it's not going to make it a good idea. This isn't a free speech issue.
> Further, if JSON was really only data, then it's an incredibly stupid way to store data, given that it has a human-readable syntax that the computer can only deal with after it's been parsed. As data, it's bloated and inefficient.
So use something else. Also, a computer can only read any file after it's been parsed in some way. I'm not really sure what you're suggesting as an alternative.
> To the extent that JSON is a good format, it's code
It represents groups of more or less arbitrary tokens as trees, therefore it's a natural format for code representation as it's equivalent to an AST; it's trivial to attach a basic execution context with if and lambda defined, and now it's executable and Turing-complete.
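A toy sketch of what I mean (an entirely hypothetical mini-language, nothing standard):

    // Treat JSON arrays as applications: ["op", arg1, arg2, ...]
    function evaluate(node) {
      if (!Array.isArray(node)) return node;  // literals evaluate to themselves
      const [op, ...args] = node;
      if (op === 'if') return evaluate(args[0]) ? evaluate(args[1]) : evaluate(args[2]);
      if (op === '<')  return evaluate(args[0]) < evaluate(args[1]);
      if (op === '+')  return evaluate(args[0]) + evaluate(args[1]);
      throw new Error('unknown op: ' + op);
    }

    evaluate(JSON.parse('["if", ["<", 1, 2], "yes", "no"]'));  // -> "yes"

Add a lambda form and an environment and you're most of the way to a Lisp.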
Maybe a more useful resolution to this would be to state that while all code is data, no data should be code?
You could, if you were crazy enough, write perfectly valid JSON that passes its values to eval() or a parser or what have you. And while there are encodings in JSON that don't work in Javascript (I've broken JS innumerable times trying to get that to work), JS does of course allow you to add closures to an object, or an array, whatever you like, and some forms of valid JSON (if not all) are also valid Javascript. So you could indeed use JSON to compute things if you wanted to.
Yes, I understand that "code is data". This does not mean that data, in general, is code; unless you are willing to make the words completely meaningless. "Code" requires some notion of an execution platform/environment, which does not exist for arbitrary data. Here is a string: "the quick brown fox jumps over the lazy dog". Or how about "\u0000\u0000". That is not code, as generally understood.
> "Code" requires some notion of an execution platform/environment, which does not exist for arbitrary data.
Arbitrary data don't exist without some notion of an execution (or interpretation) platform.
We tend to use "code" as a word for "commands telling some execution process what to do" and "data" as a word for "information that is meant to be transformed" but in reality this distinction is meaningless; both are fundamentally the same thing, and even our "code" vs. "data" words have blurry borders. It's very apparent when you start reading configuration files. For example, aren't Ant "configuration files" essentially programs[0]?
We all know what we usually mean in context by saying what is "code" vs. what is "data", but one has to remember that, in fact, they are the same. Minding it leads to insights like metaprogramming; forgetting it leads to dumb languages and nasty problems, and is generally not wise.
Also I recommend watching http://www.youtube.com/watch?v=3kEfedtQVOY to learn how what would be data, as defined by formal grammars of some real-world protocols, can - by means of sloppy grammars and bad parser implementation - cross the threshold of Turing-completeness and become code.
I understand all this. Like many people, I've written programs in C++ templates. But I think we're talking past each other because you want to make a pedantic point. I'm using the words as they are generally understood, not in a technical computer science way. I'm talking about first-level stuff, not metaprogramming. Let me give you some questions to ponder:
- Is the text of Hamlet code?
- Was it code as soon as Shakespeare wrote it?
- If not, did it become code once the electronic computer was invented? Or did that happen once a version was stored in a way accessible to an electronic computer?
- Did all the existing paper copies immediately become code at that point as well?
> But I think we're talking past each other because you want to make a pedantic point.
I guess that's true.
The flavour of the "code vs. data" discussion in this thread was one of representation formats. You could argue that when looking at works of art from past centuries one should immediately say "data!" [0]. But in the case of JSON, a format suspiciously close to Lisp in structure, one needs to be careful in saying "it's for data, not for code".
Actually, I'm not sure what kind of point I'm trying to make, as the more I think of it, the more examples of borderline code/data things come to my mind. Cooking recipes are the obvious candidate, but think about e.g. music notation - it clearly feels more like "code" than "data".
I feel that you could define a kind of difference between "code" and "data" other than intent, something that could put bitmaps into the "data" category and a typical function into the "code" category, but I can't really articulate it. Maybe there's some mathematical way to describe it, but it's definitely a blurry criterion. And when we're discussing technology, I think it's harmful to pretend that there's a real difference. Between configuration files looking like half-baked Lisp listings and "declarative style" C++ that looks like datasets with superfluous curly braces, I think it's wrong to even try to draw a line.
[0] - there's a caveat though. "How to Read a Book" by Mortimer Adler[1] discusses briefly how the task of a poet is to carefully choose words that evoke particular emotional reactions in readers. It very much sounds like scripting the emotional side of the human brain.
I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that would qualify them to call Douglas Crockford "stupid".
>I do not presume to know who you are, or what you have accomplished, but there are few people with the professional and academic background that would qualify them to call Douglas Crockford "stupid".
Why, who do you think Douglas Crockford is, and what is his "academic background"? He doesn't even have a related degree. Most of his JS fame he owes to his book.
Don't put words in my mouth, I didn't claim he was "stupid." To say that one thing he said somewhere is stupidity is a far cry from claiming he is stupid.
I also never said that "opinionated design is stupid".
Perhaps you could rephrase your question in such a way that you aren't presuming to speak for me.
JSON isn't a configuration language, it's just another data encoding format with the added benefit of being readable by humans. That and its ubiquity make it an appealing choice for stuff like ad-hoc configuration at first glance, but it's not the best choice. If you want a config language for shared human and machine consumption, use one designed for that purpose. JSON is pretty much just an encoding that is easy for humans to inspect and debug.
This. I've worked with a number of systems that "use json as the configuration language"; and in every case it's led to issues.
Given a choice, it's better to have a .ini-style format like the one that Python's ConfigParser will digest.
That way you can have sections and comments, and you won't be tempted to have the application write things into the configuration on its own...
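For instance, a hypothetical config in that style (comment syntax per Python's ConfigParser, which accepts # and ; prefixes; the section and key names are made up):

    # Optional proxy settings - comment the section out to disable it.
    [proxy]
    host = proxy.example.com
    port = 8080

    [database]
    host = localhost
    port = 5432

Sections and comments come for free, and there's no temptation to round-trip the file through a serializer.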
I'm sure there are counter points to what I'm about to bring up, but three observations:
1. In my experience JSON is frequently output programmatically, and taken in programmatically. Comments are not useful in these cases.
2. The only time comments could be perceived as useful then would be when parsing JSON by eye or hand. However, it is not difficult to parse JSON and understand it unless the keys have used obfuscated names. If key naming is obfuscated, comments aren't really the correct solution.
3. "An object is an unordered set of name/value pairs", as mentioned by jasonlotito and others earlier. There is no guarantee that a JSON parser will give you the right value if there are two of the same keys in the same scope.
> 3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
The consequences are undefined, I feel, for a reason. You can't put them all down on paper, it depends on what all the parsers do. The parsers can accept or reject things with duplicate keys, or they can play a nice little ditty through the speakers.
All it means is a parser isn't required to reject JSON with multiple keys. It can, however, do whatever the fuck it wants with them.
If the wording was precise, then it should be a MUST. SHOULD indicates a terrible world of unknown consequences.
I know there is a lot of JSON handling that happens behind-the-scenes, but there is also a non-trivial amount of JSON that I have manually created and/or altered, and have to share with a team.
It's a blessing and a curse, these modern NodeJS projects -- it's awesome that I can simply create/modify a .json file with a few properties, run a command, and magic happens. However, if I want to try and communicate out the intent of the values to my team of 20+, it becomes really convoluted. The projects all magically work by looking for foo.json, but if I comment that file then it breaks.
So I have to create another foo.comments.json, add another script that will remove the comments and then call the original instructions. Then I need to create additional documentation instructing the team to ignore the developer's docs regarding native use, and to run the application with our own homebrew setup.
It also can make testing a pain in the ass, because now I can no longer comment out values, I have to remove them completely. Not a huge deal, annoying nonetheless.
Right, we're suffering from people using JSON for config files just like a few years back a lot of projects suffered from using YAML for config files (though YAML was at least designed to be human editable ... ingy and I regularly disagree over whether he succeeded :).
For the past few years, I've generally been using either apache-style via http://p3rl.org/Config::General or some sort of INI derivative (git is proof that ini is good enough for a lot more things than you might expect).
For the future, ingy and I have been working on http://p3rl.org/JSONY which is basically "JSON, but with almost all of the punctuation optional where that doesn't introduce ambiguity" - currently there are perl and ruby parsers for it, javascript will hopefully be next.
Admittedly, we -haven't- got round to defining a format for comments yet, but my point is more "JSON wasn't really designed for that, let's think about something better".
Why not add an object field with the identifier a_comment:"blabla..."?
The advantage I see in this way of commenting is that the comment becomes accessible inside the program instead of being stripped off by the parser. For the human reader it's also more obvious.
Unfortunately, it's not possible to attach a comment to anything other than objects. But the same limitation applies to the OP's proposal.
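To illustrate the accessibility point, a minimal sketch (the key names are hypothetical):

    const cfg = JSON.parse('{"a_comment": "timeout is in seconds", "timeout": 30}');
    console.log(cfg.a_comment);  // "timeout is in seconds" - survives parsing

A comment stripped by the parser could never be inspected like this.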
Why have comments in code at all, then? You could always just make a variable/constant, with the added benefit that the comment becomes accessible inside the program...
But that makes no sense at all to me. I agree that using comments as metadata/directives is typically an antipattern hack, but what about for non-metadata comments? Embedding comments into code is just as ass-backwards as embedding code into comments. Neither is right.
> For the human reader it's also more obvious.
Strongly disagree here -- if I open a file that I've never worked in before, I have faith that the comments were meant specifically for me. Likewise, I assume all code in the file is not for me (on account that I'm not a compiler/interpreter/etc.).
Given that the RFC says "The names within an object SHOULD be unique", there's nothing stopping me from writing a parser that takes the first name/value pair and throws all the others on the floor. Or, even better, picks a random name/value pair when the same name appears. Both of these behaviours are allowed by the RFC, and would break this hack.
Putting comments into JSON in this way is a hack and shouldn't be used by anybody who has any interest in writing maintainable software. Relying on ambiguities in an RFC and someone saying "JSON parsers work the same way" is a good way to end up with a really obscure bug in the future.
At least in ECMA-262 5, Ch. 15.12.2, there is a NOTE: "In the case where there are duplicate name Strings within an object, lexically preceding values for the same key shall be overwritten."
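Concretely, in an engine that follows that note, the last duplicate wins:

    JSON.parse('{"value": "this string describes value", "value": 42}');
    // -> { value: 42 }

But that only pins down JSON.parse in ECMAScript implementations; JSON parsers in other languages are free to do something else.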
Assuming you mean RFC 4627, you're quoting the restrictions on what character streams can be called "JSON". The "should" means that if your names are not unique you can still call it "JSON", but you should think twice about it.
The parsing behavior for JSON is not defined at all in RFC 4627, actually. Browsers (and Node, since it's using a browser js engine) use the parsing specification in ECMA-262 edition 5 section 15.12.2.
Note that ES5 section 15.12 in general is much stricter than RFC 4627, as it explicitly points out if you read it.
This is misguided. You don't need comments in a JSON config file. Why? Because you don't use JSON for config files that need comments.
JSON is like duc(k|t) tape. It's really easy to stick two things together with it. That doesn't mean you always should. It's the simple thing that gets the job done so you can focus on what matters.
You shouldn't pick JSON for your config files and then hold it up as good design. "Look at me, I'm daring and _not using XML_!" Using JSON is crap design, but good engineering means sometimes picking something crappy and not wasting effort on things that don't matter in the end.
If your configuration files become both complicated and important enough that you need comments, then you should stop using JSON. If your duck tape job starts needing additional reinforcement, then you should probably just get rid of the duct tape and do it right.
If one of your requirements is a sufficiently trendy yet commentable config language, look into YAML. Also, gaffer tape. The white kind is easier to write on.
If crap design like JSON is the right engineering choice sometimes (and I agree that it is), that seems like an argument that adding comments in this crappy way may sometimes be the right engineering choice.
Yeah maybe you don't use JSON for config files that need comments, but that's because there's no documented way of how to put comments in JSON. The article solved the problem.
Actually, I'm 100% playing the devil's advocate here. I'll even flip-flop to prove it. Regarding the article, I doubt that every JSON parser will let this slide. To me that's an even better reason to avoid this practice.
> Regarding the article, I doubt that every JSON parser will let this slide. To me that's an even better reason to avoid this practice.
If someone uses undefined behaviour in config files for the sake of storing a comment, I reserve the right to hunt them down if I have to maintain their code.
> 3. SHOULD This word, or the adjective "RECOMMENDED", mean that there may exist valid reasons in particular circumstances to ignore a particular item, but the full implications must be understood and carefully weighed before choosing a different course.
Salient point is that you would need to ensure that you are only using JSON parsers that tolerate duplicate names (and use the last value)
> Salient point is that you would need to ensure that you are only using JSON parsers that tolerate duplicate names (and use the last value)
To drive this home a bit more forcefully, it requires knowing the behaviour of your parser where it is marked as "undefined" in the spec.
If that isn't enough to stop you, DON'T USE JSON. A patch level change in a library could break your code in a non-obvious way and it would be your fault. If you want comments, DON'T USE JSON, JSON DOESN'T HAVE THEM.
And the big point here is that the members of the RFC group were considering breaking from the ECMAScript standard and changing the SHOULD to a MUST, which would break existing programs and the "workaround" in the article.
This hack, while nice, is still just a workaround. I highly recommend using YAML instead of JSON in as many places as you can.
JSON works great for on-the-fly communication with frontends running Javascript, or for communication between Javascript processes like Node.js servers. But for configuration files and other things that need comments, YAML is many times better, both for its clean, Markdown-reminiscent structure and its native comment support.
Node.js has a great module called js-yaml (https://github.com/nodeca/js-yaml) which automatically registers handlers for .yml and .yaml files, allowing you to require them in your Node.js code just like you can with JSON files.
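A minimal sketch of the explicit API (yaml.load is from js-yaml's documented interface; config.yml is a hypothetical path):

    const fs = require('fs');
    const yaml = require('js-yaml');

    // Parse a YAML config file into a plain Javascript object.
    const config = yaml.load(fs.readFileSync('config.yml', 'utf8'));
    console.log(config);

The require-handler registration mentioned above makes even this unnecessary: once js-yaml is loaded, require('./config.yml') just works.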
It also comes with a YAML parser for the browser side of things, so if you want you could even communicate YAML directly from the server to the client side, although frankly I don't see much advantage to sending YAML over the wire instead of JSON. (And as others have mentioned below untrusted YAML sources could insert malicious objects in YAML, so I wouldn't recommend this technique.)
That's not really a problem with JSON though is it? Anything you run through eval() is a disaster in the making. Maybe the problem is that people are trying to make data formats too powerful, and too many things seem to be creeping towards Turing completeness that don't need to be.
I think parsers for JSON, YAML, INI, etc. should be designed in such a way as to make it impossible to assign anything like an object, class, or function. Numbers, strings, and collections of numbers and strings... that's all you should get (though obviously "string" is fraught with peril). Anything more is unnecessarily complex.
It is a problem with JSON in the sense that it's a JavaScript subset 'in practice' - modulo the Unicode support that goes beyond JavaScript. So it's to be expected that eval() will be used as a convenience by developers, ignoring the security implications that come with eval() hoisting full JavaScript.
The way to have avoided the issue would have been for JSON to have a grammar that broke eval(). But one could argue the ability to pass JSON into eval() to get JavaScript is one of the reasons JSON became popular to begin with.
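For reference, the old pre-JSON.parse idiom looked like this (a sketch; jsonText is a hypothetical variable, and this is exactly what you should not do with untrusted input):

    // The outer parentheses force the braces to parse as an object
    // literal rather than a block statement.
    var data = eval('(' + jsonText + ')');

Any executable payload in jsonText runs with full page privileges, which is the security implication being described.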
YAML is easy to type, even with the whitespace. So is INI. And as verbose as XML is, it's easier, in my experience, to type than JSON. Of those four, JSON is the hardest to write by hand; it's certainly the one I make the most mistakes with, to the extent that I have a particular technique for writing it out (prefixing the commas). As a result, JSON as a config file format is tedious, verbose, and error prone; its sweet spot is as a machine interchange format that a human can debug/read if needed.
I've actually never developed anything serious in Rails. I just don't like the framework, and the performance of Rails leaves a lot to be desired in my opinion. I'm a 100% Node.js convert these days.
But I do like the Rails convention of using YAML format and have adopted that in my own code as much as possible.
Yeah, I had read about that. One more reason not to send YAML over the wire. YAML makes great sense for your internal configuration files and internal data structures where you need comments and readability. YAML is perfectly safe here because chances are you aren't going to be exploiting yourself by putting malicious objects in your YAML.
But for over the wire communication, JSON makes more sense than YAML, not only because parsing unsafe YAML from an untrusted client could cause exploits like you mentioned, but also because YAML is dependent on indentation and line breaks, and therefore makes communication with the client side much more awkward than just sending JSON to the client or receiving JSON from it.
I believe the parent was referring the many recent YAML based vulnerabilities found in Rails (and elsewhere). He is basically saying, "You can use YAML -- if you don't care about injection vulnerabilities."
In my experience, YAML is better for configuration files and human edited files. JSON is better for data and communication between computers. The features that make YAML easier to write (comments, more flexible format, less quoting) make it more complex and slower to parse.
Also, many of the security holes in YAML come from its use as a serialization format which can represent native classes. I wish the YAML parsers had more explicit support for simple data schemas which would reduce the security risk and be sufficient for most configuration files.
Ironically, YAML has object serialization features out the wazoo, while JSON is relatively spartan for that purpose. I will never understand why it happened the way it did. YAML should have been left human readable, with none of the object serialization stuff thrown in.
While on the topic of encodings (I'm a huge encodings geek), let me plug a new one we recently discovered called Space (https://github.com/nudgepad/space). It is dead simple and has the nice feature that it is extraordinarily easy for both humans and machines to read and write.
It is definitely very minimalist. Personally I have issues parsing it visually though, because the indentation of only one space makes it hard to differentiate inner data structures particularly on a large screen with small fonts. Additionally the lack of a division character other than space between the key and the value makes reading each key value pair much harder because the key and value tend to run together visually.
YAML is excellent for resource files, i.e. human editing complex data.
For -configuration- you want a simpler format; INI is worth considering, as is http://p3rl.org/JSONY which is ingy's implementation of a vision we thrashed out for a more sysadmin-friendly config format.
I agree it is a cute hack, but it is also kind of horrifying. You are depending on an undocumented behavior that happens to be shared across the ecosystem. Now what happens if that file hits a parser which takes the first instance, or a functional one that errors out when it sees multiple assignments?
It is dependent on a specific indentation format which is one thing I dislike about it. But if you configure your vim or whatever editor you use to properly indent YAML files you should have few issues with fragility.
Even with indentation problems, the time saved in not typing curly brackets, extra quotation marks, and commas, and the time saved in not having to visually parse these when reading YAML more than makes up for the occasional data structure bug caused by bad indentation.
My favourite example of dealing with undefined behaviour is this:
In practice, many C implementations recognize, for example, #pragma once as a rough equivalent of #include guards — but GCC 1.17, upon finding a #pragma directive, would instead attempt to launch commonly distributed Unix games such as NetHack and Rogue, or start Emacs running a simulation of the Towers of Hanoi.[7]
It's overwriting existing keys, which is fine IMO. When I use a map in any language and put a new value under an existing key, the expected behavior is that the previous value is overwritten.
My first thought in seeing this was that objects aren't guaranteed to maintain order: "An object is an unordered set of name/value pairs" - http://www.json.org
> it's up to the parser to keep clobbering a value every time a new value comes in for a given k
Nope, parsers are perfectly in their rights to do whatever they want with multiple keys. They could read them backwards, sort them, whatever. The behaviour in the instance of multiple keys is undefined.
> This seems like a bad idea.
It is an astonishingly bad idea. I'm concerned by it being so high on the page.
> But hey, might work well for the original author.
Depends on their parser. It's undefined behaviour according to the spec. It might work now, but I'd argue it doesn't work well, as a patch level change could bork this.
I'm not so sure. I think JSON falls back to the ECMAScript standard for specific details. The object initializer semantics seem to force a left-to-right evaluation order, in the ECMAScript spec around page 65. I'll admit my claim was unfounded when I made it, and I only went to the spec to avoid being wrong :) If I were to implement a JSON parser, I would now feel obligated to evaluate in order, due to my reading of the spec.
However, I think we wholeheartedly agree, don't rely on this behavior. It is an outright strict mode error.
Streaming parsers can't follow this assumption without becoming useless. They're either going to emit only the first instance or emit two separate events.
This sounds great until some parser uses the comment definition instead of the value. Is it defined in the spec that parsers need to use the last defined value for a key?
Since the order of an object's keys is not guaranteed, it seems like even if a parser respected the last-defined rule, you could still potentially end up with the wrong field last.
Not really defined, but since an object is defined as an unordered collection of key/value pairs, a conforming parser could probably shuffle the pairs before parsing them.
I suppose it could, but the point of the object being defined as an unordered collection is because the most straight-forward way of implementing this is through a hash table, where the order of the keys cannot be guaranteed without additional work. I'm sure they didn't consider a parser randomly permuting the lexical order of the pairs as something a sane person would do.
Can we all just agree, as a community, to add comment support to our JSON parsers? Hell, I'd do a PR on V8 if I knew C++.
It's ridiculous that I can't document notes on dependencies in my NPM package.json, or add a little reminder to my Sublime Text configuration as to why I set some value, because we're using JSON parsers that can't handle the concept of ignoring a line with a couple slashes prefixing it.
IMO - either we add comments to JSON, or we stop using it for hand-edited configuration.
> It's ridiculous that I can't document notes on dependencies in my NPM package.json, or add a little reminder to my Sublime Text configuration as to why I set some value
Crockford's rationale for not supporting comments is that people used them to add metadata to the object (e.g. type annotations), which makes it hard to consume with different parsers.
An annotation like "// @type int" above a key is the kind of thing this forbids. Most people's parsers would ignore it as a comment (if we had comments), but maybe some would treat it specially. Either everyone ignores comments in the parser (which is unlikely to last; someone will want to extend them) or nobody is allowed comments. That way everyone parses the same text.
"Trusting the community to do the right thing is better than handicapping your users."
And the "community" in question had repeatedly and grossly demonstrated itself to be unworthy of such trust.
Crockford was not hypothesizing that this might happen, he'd seen it. Repeatedly. If you want to argue against it even so, fine, but bear in mind that is what you are arguing against, real pain that real people experienced, not mere possibilities.
JSON did support comments for a time and people were starting to use it for meta data.
The problem is that JSON is not meant to be used as a configuration file format, and just because it's really good for information exchange doesn't mean it's good for configuration (and vice versa). Configuration really requires comment support, and information exchange is better off avoiding it. Two standards are needed.
Right - but that is valid syntax! Any JSON parser can understand that, and that's what he recommends doing instead. But if you do this in comments, you end up writing your own mini language to describe your annotations, and nothing else knows how to parse it. That should clearly be avoided.
If JSON had comments, then of course any JSON parser could understand those comments just as well as they can currently understand "_type": "int". What am I missing?
That because they're comments, specific JSON parsers could (and likely would) interpret processing instructions embedded in those comments to toggle behaviors on the fly. Crockford's fear (well founded, I think) was that comments would be used to "extend" JSON.
A JSON parser will always know how to represent that object; how you process it is up to you.
However, when you write:
{
// @type int this is a comment
"foo": "123"
}
and you call JSON.parse(), what would you expect to get back? You can no longer represent it as a simple object; you need some way to access the comment. How do you do that? Moreover, whose responsibility is it to process the annotation in that comment? The parser's? Should you get back an integer rather than a string for obj.foo? How would you support different types of annotation? What happens if you're using parser A and your client uses parser B? Does parser B support all the annotations that parser A supports? If you need to modify a JSON structure, e.g. decoding it, adding a property, and re-encoding, should the comments be preserved? ...
You can see that having comments introduces a whole host of other questions, ambiguity and would only make it harder for different platforms to share data. Avoiding this kind of cruft is why JSON is winning vs XML for most things these days.
In the eyes of a compliant parser (assuming JSON supported comments), it is just a comment, like "_type": "int" is just a key/value pair.
However, when using an ad hoc parser, all bets are off as to the result in both cases, not just the comment case. So regardless of comment support in JSON, the same problem appears to exist.
Sublime Text actually already supports '//' comments in its "JSON" configuration files, though it's non-standard. The comments are properly ignored, and syntax-highlighted. However, the comments (along with all other manual formatting) are lost if the file is programmatically edited, for example by changing the font size using the keyboard shortcuts.
XML sucks in large part not because of XML itself, but because people used it for everything, everywhere, in places it was highly ill-suited. Don't fuck up JSON the same way.
Funny story. JSLint[1] does not approve of this technique.
I asked Crockford to implement the duplicate check in April 2009 via email. 20 minutes later, out of nowhere, he was done implementing that check and wrote back "Please try it now."
This guy is fast. Especially nice considering we do not know each other at all.
I sent him an email once asking for the same JSLint license that he gave to IBM (you know, the one without the "do not use this for evil" clause.)
He responded that he was getting annoyed by everybody asking for this, so it was going to cost me $100K to obtain such a license.
I responded that I only asked for that license in order to annoy him (and thanks for the confirmation that it worked), because his immature license clause is annoying everybody else.
"comment":"this is a comment";
"value": 45;
"comment":"this is also a comment";
"value2": 64;
"comment":"we like overloading the comment field";
"stringval":"but these stay the same";
}
I would agree it's very hackey and probably not a good idea since the spec is liable to change. But I wouldn't be sad if the spec were changed to allow for this, or to allow for comments.
I'm biased; back in the BeOS -> Haiku days, I wanted some sort of configuration textfile that could be parsed neatly into a BArchive object (and presumably transmitted in a BMessage). XML was all the rage at the time, so I wrote for myself a sort of XML-ish format, but I never contributed it to the tree. I learned the problems with XML (should it be an attribute or an innerText?). I wanted something with a bracket notation, but JSON had not been discovered by Crockford yet; if it had been, I would have gotten more involved and tried to have it adopted.
Terrible spec-violating hack aside, the idea of the author soliciting upvotes on StackOverflow doesn't sit well with me. I'd hate for SO solutions to become diluted by answers from users who are 'marketing' for upvotes.
It is a 'hack' as discussed in the article, and I will probably never use it. JSON should be either self-explanatory or documented; I don't see any reason why you would add this unnecessary clutter to these messages.
It is already hard to read as is, and this makes it worse and more confusing. If some big service started using this, you would have to know about the 'hack'; otherwise you would have to look up what the hell is going on.
Also, this is the same information for each call and thus redundant; it makes your messages larger, when an advantage of JSON is that its messages are generally small.
This, to me, looks like an example of relying on a nondeterministic implementation. To my knowledge, the standard doesn't prescribe that parsers take the second/last of a duplicate key. As a result, this is relying on implementation-specific choices which can lead to a terrible upgrade process.
Switch to a different JSON parser: does it still work? Probably, but I wouldn't bet much on it.
If I were implementing a JSON parser, might I throw an error on a duplicate key? Maybe. Or maybe I would just print a warning?
If I were ever going to give someone advice, it would be: never do this.
You should use standard JS comments and process them out. Douglas Crockford's official answer on comments: https://plus.google.com/118095276221607585885/posts/RK8qyGVa.... Essentially, just process them out beforehand with something like JSMin; it's pretty straightforward.
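A minimal sketch of that pipeline (the regexes here are naive and hypothetical - they will mangle comment-like text inside string values, which is why Crockford points at JSMin rather than a regex):

    // Strip /* ... */ and whole-line // comments, then parse as plain JSON.
    function parseCommentedJSON(text) {
      const stripped = text
        .replace(/\/\*[\s\S]*?\*\//g, '')   // block comments
        .replace(/^\s*\/\/.*$/gm, '');      // line comments
      return JSON.parse(stripped);
    }

    parseCommentedJSON('{\n  // port for the dev server\n  "port": 8080\n}');  // -> { port: 8080 }

The important property is that comments exist only in the file on disk; by the time anything parses the document, it is plain JSON again.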
This is a horrible hack. You should use JSON-LD [1] to describe the fields of your JSON. It's a W3C standard!
Also, it's not defined in the JSON standard in which order an implementation needs to parse the JSON fields/keys. So you could end up with potentially wrong results!
This is a nice trick, but it should probably only be used in systems where the people touching the code are a limited, rarely-changing set, and where anything consuming the JSON is strictly going to treat the last defined value as the value to use. Dragons lurk elsewhere!
If I ever saw this in a project, I would remove those comments in a heartbeat. The behavior here is specific to the JSON parser. Javascript is not the entirety of programming.
This is a celebration of programmers' ability to generate unmaintainable code by exploiting implementation dependencies. People get fired for pulling this horseshit every day!
A common practice in config files is to comment out whole sections, e.g. optional proxy server settings. That sort of multi-line comment is not addressed by this hack.