
A topic I rarely see covered in stories and blog posts about APIs (at least not in depth, or in the discussion) is ID generation: best practices for creating access tokens and resource IDs, and especially for handling collisions.

The Flickr blog post [1], which describes a simple implementation of a ticket server, is so far one of the easiest and most secure approaches I've come across.
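As I understand the Flickr post, each ticket server is a MySQL instance whose auto-increment is configured so that two servers hand out interleaved (odd/even) IDs. The sketch below is only an in-memory simulation of that idea, not Flickr's code; the `TicketServer` class and its parameters are hypothetical, and the lock stands in for the database's atomic auto-increment.

```python
import itertools
import threading


class TicketServer:
    """Hypothetical in-memory stand-in for one Flickr-style ticket server.

    Each server issues offset, offset + step, offset + 2*step, ...
    so N servers configured with step=N and distinct offsets never overlap,
    mimicking MySQL's auto_increment_increment / auto_increment_offset.
    """

    def __init__(self, offset: int, step: int) -> None:
        if not 1 <= offset <= step:
            raise ValueError("offset must be in [1, step]")
        self._counter = itertools.count(offset, step)
        self._lock = threading.Lock()

    def next_ticket(self) -> int:
        # The lock plays the role of the database's atomic auto-increment.
        with self._lock:
            return next(self._counter)


# Two servers: one hands out odd IDs, the other even IDs.
odd = TicketServer(offset=1, step=2)
even = TicketServer(offset=2, step=2)
print([odd.next_ticket() for _ in range(3)])   # [1, 3, 5]
print([even.next_ticket() for _ in range(3)])  # [2, 4, 6]
```

The appeal is that uniqueness comes from arithmetic rather than probability; the cost is a network round trip per ID and servers that must stay available.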

Personally, I don't feel comfortable using "hash of i++", "hash of ..." or plain "i++", as they all "fall apart" once you need IDs of different specifications (e.g. an 8-char ID and a 24-char token ID).
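One common alternative to hashing a counter is to draw each character independently from a cryptographically secure random source, so the length is just a parameter. This is a minimal sketch assuming a base62 alphabet; `random_id` and `BASE62` are hypothetical names. Random IDs can still collide, so they need a uniqueness check on insert.

```python
import secrets
import string

# Base62 alphabet: URL-safe without escaping, case-sensitive.
BASE62 = string.ascii_letters + string.digits


def random_id(length: int, alphabet: str = BASE62) -> str:
    """Return a random ID of exactly `length` characters.

    Uses the `secrets` CSPRNG, so the same function can produce both short
    resource IDs and long, unguessable access tokens.
    """
    if length <= 0:
        raise ValueError("length must be positive")
    return "".join(secrets.choice(alphabet) for _ in range(length))


resource_id = random_id(8)    # e.g. an 8-char resource ID
access_token = random_id(24)  # e.g. a 24-char token ID
```

An 8-character base62 ID has 62**8 (about 2.2e14) possible values; a 24-character one has vastly more, which is what makes it suitable as a bearer token.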

Does anyone have two cents on this? I have a hard time imagining that larger APIs just "live with" the chance of collision (even if it is very low); they must mitigate it somehow, right? Perhaps with combined probabilities (such as the nonce check and tokens on OAuth 1 requests)?
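Two pieces usually come up here: estimating the collision risk with the birthday bound, p ≈ 1 - exp(-n(n-1) / 2N) for n random IDs drawn from N possible values, and then simply retrying when a collision does happen. The sketch below is hypothetical (`collision_probability`, `allocate_id`); the in-memory set stands in for a UNIQUE index, and in a real database you would attempt the INSERT and retry on a duplicate-key error so the check and write are atomic.

```python
import math
import secrets
import string

BASE62 = string.ascii_letters + string.digits


def collision_probability(n: int, space: int) -> float:
    """Birthday-bound estimate of P(at least one collision) among n random IDs."""
    return -math.expm1(-n * (n - 1) / (2 * space))


def allocate_id(existing: set[str], length: int = 8, max_attempts: int = 5) -> str:
    """Generate a random ID and retry if it is already taken.

    `existing` stands in for a UNIQUE index in the database.
    """
    for _ in range(max_attempts):
        candidate = "".join(secrets.choice(BASE62) for _ in range(length))
        if candidate not in existing:
            existing.add(candidate)
            return candidate
    raise RuntimeError("too many collisions; consider a longer ID")


# One million 8-char base62 IDs out of 62**8 possible values.
print(f"{collision_probability(1_000_000, 62**8):.4f}")  # 0.0023
```

So a small but non-zero risk is tolerated, and the retry loop turns "very unlikely" into "handled" rather than "ignored".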

Dedicated ID services (e.g. [2], [3]) either add complexity (and possibly latency) or require a separate code base. Either way, scaling and latency are rarely discussed, so if anyone has input here, I'd greatly appreciate it.
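For comparison with a central ticket server, here is a minimal sketch of a Snowflake-style generator: a 64-bit integer packing a millisecond timestamp, a worker ID and a per-millisecond sequence (roughly the 41/10/12-bit split Twitter described for Snowflake [2]). The class name, the custom epoch and the bit widths are assumptions for illustration, not Twitter's code. Generating an ID is a local call, so there is no extra network latency; the price is that every worker needs a unique ID and a clock that does not go backwards.

```python
import threading
import time

# Hypothetical custom epoch: 2020-01-01T00:00:00Z in milliseconds.
EPOCH_MS = 1_577_836_800_000
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_WORKER = (1 << WORKER_BITS) - 1
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class SnowflakeLike:
    """64-bit IDs: 41-bit ms timestamp | 10-bit worker ID | 12-bit sequence."""

    def __init__(self, worker_id: int) -> None:
        if not 0 <= worker_id <= MAX_WORKER:
            raise ValueError(f"worker_id must be in [0, {MAX_WORKER}]")
        self.worker_id = worker_id
        self._last_ms = -1
        self._sequence = 0
        self._lock = threading.Lock()

    @staticmethod
    def _now_ms() -> int:
        return time.time_ns() // 1_000_000

    def next_id(self) -> int:
        with self._lock:
            now = self._now_ms()
            if now < self._last_ms:
                # Clock went backwards: refuse rather than risk duplicates.
                raise RuntimeError("clock moved backwards")
            if now == self._last_ms:
                self._sequence = (self._sequence + 1) & MAX_SEQUENCE
                if self._sequence == 0:
                    # 4096 IDs already issued this millisecond: spin until the next one.
                    while now <= self._last_ms:
                        now = self._now_ms()
            else:
                self._sequence = 0
            self._last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self._sequence
            )
```

Because the timestamp occupies the high bits, IDs from one worker are strictly increasing, and IDs from different workers are roughly time-ordered.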

EDIT: I've found two interesting links on the problem, [4] and [5] (excerpted here [6]).

[1] http://code.flickr.net/2010/02/08/ticket-servers-distributed...

[2] https://github.com/twitter/snowflake

[3] https://github.com/boundary/flake

[4] http://boundary.com/blog/2012/01/12/flake-a-decentralized-k-...

[5] https://blog.twitter.com/2010/announcing-snowflake

[6] https://github.com/antirez/redis/pull/295#issuecomment-46734...




I like how MongoDB does this:

http://docs.mongodb.org/manual/reference/object-id/

They start with a timestamp and are fully distributed. In the official Linux client library, the "3-byte machine ID" is the first 3 bytes of the MD5 hash of the hostname. As long as you can guarantee that the machine ID is unique, there are no collisions.

For a large API, one could generate their own machine IDs and provide a strong guarantee of uniqueness.
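Hashing hostnames into 3 bytes only makes machine IDs unlikely to collide (the birthday bound applies to the 2**24 space); handing them out from a single authority makes them unique by construction. A minimal sketch, assuming a hypothetical `MachineIdRegistry` kept in memory; in practice it would be backed by something strongly consistent, e.g. a database table with a UNIQUE constraint.

```python
import threading


class MachineIdRegistry:
    """Hypothetical central allocator for unique 24-bit machine IDs."""

    MAX_ID = (1 << 24) - 1

    def __init__(self) -> None:
        self._next = 0
        self._by_host: dict[str, int] = {}
        self._lock = threading.Lock()

    def register(self, hostname: str) -> bytes:
        """Return this host's 3-byte machine ID, allocating one on first call."""
        with self._lock:
            if hostname not in self._by_host:
                if self._next > self.MAX_ID:
                    raise RuntimeError("machine ID space exhausted")
                self._by_host[hostname] = self._next
                self._next += 1
            return self._by_host[hostname].to_bytes(3, "big")


registry = MachineIdRegistry()
web1 = registry.register("web-1")
web2 = registry.register("web-2")
```

The registry is only consulted when a machine starts up, so it stays off the hot path of ID generation.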

The one criticism of MongoDB's approach is that the ID is 12 bytes, which doesn't fit well into the column types offered by other databases.
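One workaround, where a fixed 12-byte binary column (such as MySQL's BINARY(12)) isn't available or convenient, is to split the ID into a signed 64-bit and a signed 32-bit integer and store them in two integer columns with a composite unique index. A minimal sketch with hypothetical helper names:

```python
import struct


def split_object_id(oid_hex: str) -> tuple[int, int]:
    """Split a 12-byte ID into (signed 64-bit, signed 32-bit) integers."""
    raw = bytes.fromhex(oid_hex)
    if len(raw) != 12:
        raise ValueError("expected 12 bytes (24 hex characters)")
    # ">qi": big-endian, no padding, 8 + 4 bytes.
    return struct.unpack(">qi", raw)


def join_object_id(hi: int, lo: int) -> str:
    """Inverse of split_object_id."""
    return struct.pack(">qi", hi, lo).hex()


hi, lo = split_object_id("507f1f77bcf86cd799439011")
```

Storing signed integers keeps the values within BIGINT/INT range; sort order matches byte order only while the high bit of the first byte is 0.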



