A topic I rarely see covered (at least in any depth) in stories/blogs about APIs is ID generation, i.e. best practices for creating access_tokens and resource IDs, and especially for handling collisions.
The Flickr blog post [1], which features a simple ticket server implementation, describes one of the easiest and most secure approaches I've come across so far.
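For reference, here's a minimal sketch of the ticket-server idea from [1] as I understand it: a single-row MySQL table whose auto_increment counter hands out 64-bit IDs. The table definition follows the post; the driver (pymysql) and connection handling are just illustrative assumptions, not a drop-in implementation.

    # Ticket table per [1] (Flickr used MyISAM for it):
    #   CREATE TABLE Tickets64 (
    #     id BIGINT UNSIGNED NOT NULL AUTO_INCREMENT,
    #     stub CHAR(1) NOT NULL DEFAULT '',
    #     PRIMARY KEY (id), UNIQUE KEY stub (stub)
    #   );
    import pymysql  # any MySQL driver works; pymysql is only an example

    def next_ticket(conn) -> int:
        # REPLACE INTO keeps the table at a single row but still advances
        # the auto_increment counter, so every call yields a fresh 64-bit ID.
        with conn.cursor() as cur:
            cur.execute("REPLACE INTO Tickets64 (stub) VALUES ('a')")
            cur.execute("SELECT LAST_INSERT_ID()")
            (ticket_id,) = cur.fetchone()
        conn.commit()
        return ticket_id

Running two of these servers with offset auto_increment_increment/auto_increment_offset settings gives you the redundancy the post describes.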
Personally I don't feel comfortable using "hash of i++", "hash of ...", or plain "i++", as they all fall apart when you need IDs of different specifications (e.g. an 8-char ID, a 24-char token ID).
Does anyone have two cents on this? I have a hard time imagining that larger APIs just "live with" the chance of collision (even though it can be very low) - they must mitigate it somehow, right? Perhaps by relying on combined probability (such as the nonce check plus tokens on OAuth 1 requests)?
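To be concrete about what I mean by mitigation: the pattern I'd expect (an assumption on my part, not something I can point to in any particular large API) is to draw a random ID of whatever length the spec calls for, enforce uniqueness at the storage layer, and retry on the rare conflict. A rough sketch, using sqlite3 and a hypothetical "resources" table purely for illustration, which also covers the different-length IDs mentioned above:

    import secrets
    import sqlite3

    def create_resource(db: sqlite3.Connection, id_chars: int = 8, max_tries: int = 5) -> str:
        # Draw a random hex ID of the requested length; the PRIMARY KEY constraint
        # is the actual collision check, the retry loop is the mitigation.
        for _ in range(max_tries):
            new_id = secrets.token_hex(id_chars // 2)  # id_chars hex characters
            try:
                db.execute("INSERT INTO resources (id) VALUES (?)", (new_id,))
                db.commit()
                return new_id
            except sqlite3.IntegrityError:
                continue  # collision: astronomically unlikely, just draw again
        raise RuntimeError("could not allocate a unique ID")

    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE resources (id TEXT PRIMARY KEY)")
    print(create_resource(db))                 # e.g. '1f8b2c3d' (8-char resource ID)
    print(create_resource(db, id_chars=24))    # 24-char token-style ID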
Actual ticket servers (e.g. [2], [3]) either introduce added complexity (and possibly latency) or need a separate codebase. Regardless, the issue of scaling and latency is rarely touched upon, so if anyone has some input here, I'd greatly appreciate it.
EDIT: I've found two interesting links on the problem, [4] and [5] (snippeted here [6]).
MongoDB's ObjectIDs start with a timestamp and are fully distributed. In the official Linux client library, the "3-byte machine id" is the first 3 bytes of the MD5 hash of the hostname. As long as you can guarantee uniqueness there, there are no collisions.
For a large API, you could assign your own machine IDs and get a strong guarantee of uniqueness.
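A rough sketch of that 12-byte layout (4-byte timestamp, 3-byte machine id from md5(hostname), 2-byte pid, 3-byte counter): the field sizes follow the ObjectID format described above, but the code itself is only illustrative, not the official client implementation.

    import hashlib
    import itertools
    import os
    import socket
    import time

    # First 3 bytes of md5(hostname), as described above; swap in your own
    # assigned machine ID if you want a hard uniqueness guarantee.
    _machine = hashlib.md5(socket.gethostname().encode()).digest()[:3]
    _counter = itertools.count(int.from_bytes(os.urandom(3), "big"))

    def object_id() -> str:
        ts = int(time.time()).to_bytes(4, "big")              # 4-byte timestamp
        pid = (os.getpid() & 0xFFFF).to_bytes(2, "big")       # 2-byte process id
        cnt = (next(_counter) & 0xFFFFFF).to_bytes(3, "big")  # 3-byte counter
        return (ts + _machine + pid + cnt).hex()              # 12 bytes -> 24 hex chars

Note the 3-byte counter only wraps after ~16.7M IDs per process within a single second, at which point the timestamp prefix has already moved on.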
The one criticism of MongoDB's approach is that the ID is 12 bytes, which doesn't map very well onto the column types offered by other databases.
[1] http://code.flickr.net/2010/02/08/ticket-servers-distributed...
[2] https://github.com/twitter/snowflake
[3] https://github.com/boundary/flake
[4] http://boundary.com/blog/2012/01/12/flake-a-decentralized-k-...
[5] https://blog.twitter.com/2010/announcing-snowflake
[6] https://github.com/antirez/redis/pull/295#issuecomment-46734...