
So to be clear:

    - You have 500M rows x 50 columns of stats for each row.
    - For any stat, you want the ability to display the top 10M.
    - In under one second.
Is that correct?

If so, I think the problem is not clearly described. What does it mean to display a result row? Are you displaying the stat value and a primary key? The entire row? What exactly does it mean to display 10M rows? Are they all presented on one web page? Or is it N at a time with prev/next links? Or is there random access within those 10M (e.g. find me in the ranking for stat #27)? Do you overlap retrieval with display (showing the first rows as soon as they are available)? When someone clicks prev or next, does it recompute the entire result?




Pretty close. 10M rows for each stat -- this number grows linearly with the number of users. There are two cases for showing stats. The first is a leaderboard: random access (because of how we allow users to scroll through the leaderboards), where you'd fetch 100 consecutive records starting at any rank between 0 and N, where N is the total number of rows for that stat. The second case is where you want to get the rankings for all stats for a particular user.
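To make the two access patterns concrete, here is a minimal in-memory sketch. It assumes each stat fits in a sorted Python list and that ties are broken by user id; the names `build_index`, `leaderboard_page`, `rank_of`, and `all_ranks` are hypothetical, and this is an illustration of the requirements rather than a proposed design for 10M rows.

```python
import bisect

def build_index(values_by_user):
    """Sort (negated value, user_id) pairs so the highest value gets rank 0."""
    return sorted((-value, user_id) for user_id, value in values_by_user.items())

def leaderboard_page(index, start_rank, page_size=100):
    """Case 1: fetch page_size consecutive rows starting at a 0-based rank."""
    page = index[start_rank:start_rank + page_size]
    return [(start_rank + i, user_id, -neg) for i, (neg, user_id) in enumerate(page)]

def rank_of(index, user_id, value):
    """Binary-search the 0-based rank of one (user, value) pair, or None."""
    key = (-value, user_id)
    pos = bisect.bisect_left(index, key)
    if pos < len(index) and index[pos] == key:
        return pos
    return None

def all_ranks(stats, indexes, user_id):
    """Case 2: the user's rank in every stat that has a value for them."""
    ranks = {}
    for stat, values in stats.items():
        if user_id in values:
            ranks[stat] = rank_of(indexes[stat], user_id, values[user_id])
    return ranks

# Made-up sample data, for illustration only.
stats = {
    "kills": {"a": 10, "b": 30, "c": 20},
    "wins": {"a": 5, "b": 1},
}
indexes = {stat: build_index(values) for stat, values in stats.items()}
```

With this layout a page is a list slice and a single rank lookup is a binary search, but every update to a stat means re-sorting or shifting the list, which is the part that would need a different structure at scale.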

As for how the data is returned: whatever is fastest, but ideally the response is sent back as JSON, so the whole result should be available by the time it is sent.

As for caching, I think it would be OK to cache the data for a short period of time, though I don't know how efficient it would be to cache 10 million rows at once (per stat).
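A rough back-of-the-envelope estimate of that cache size can be sketched as follows. The 10M rows per stat and 50 stats come from this thread; the 16 bytes per cached row (an 8-byte user id plus an 8-byte value, with no container overhead) is purely an assumption, so the real footprint could be quite different.

```python
ROWS_PER_STAT = 10_000_000   # from the thread
STATS = 50                   # from the thread
BYTES_PER_ROW = 16           # assumption: 8-byte user id + 8-byte value

def cache_bytes(rows, bytes_per_row=BYTES_PER_ROW):
    """Raw payload size of a cached ranking, ignoring any overhead."""
    return rows * bytes_per_row

per_stat = cache_bytes(ROWS_PER_STAT)
total = per_stat * STATS
print(f"{per_stat / 1e6:.0f} MB per stat, {total / 1e9:.0f} GB for all stats")
# prints "160 MB per stat, 8 GB for all stats"
```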


Scrolling is not random access. Scrolling means starting at the beginning and continuing in order until you eventually stop. Rows are provided in a fixed order. Random access means jumping immediately to any row at any time.
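The difference shows up directly in how the two would be queried. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `scores` table for a single stat; the table, column, and function names are assumptions made for illustration, not anything described in this thread.

```python
import sqlite3

def make_db(rows):
    """Build an in-memory table of (user_id, value) rows for one stat."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE scores (user_id INTEGER PRIMARY KEY, value INTEGER NOT NULL)"
    )
    conn.execute("CREATE INDEX scores_by_value ON scores (value DESC, user_id ASC)")
    conn.executemany("INSERT INTO scores (user_id, value) VALUES (?, ?)", rows)
    return conn

def next_page(conn, last=None, page_size=100):
    """Scrolling: continue from the last (user_id, value) row already shown."""
    if last is None:
        sql = ("SELECT user_id, value FROM scores "
               "ORDER BY value DESC, user_id ASC LIMIT ?")
        params = (page_size,)
    else:
        last_user, last_value = last
        sql = ("SELECT user_id, value FROM scores "
               "WHERE value < ? OR (value = ? AND user_id > ?) "
               "ORDER BY value DESC, user_id ASC LIMIT ?")
        params = (last_value, last_value, last_user, page_size)
    return conn.execute(sql, params).fetchall()

def page_at_rank(conn, start_rank, page_size=100):
    """Random access: jump straight to a 0-based rank using OFFSET."""
    sql = ("SELECT user_id, value FROM scores "
           "ORDER BY value DESC, user_id ASC LIMIT ? OFFSET ?")
    return conn.execute(sql, (page_size, start_rank)).fetchall()

# Made-up sample rows, for illustration only.
conn = make_db([(1, 50), (2, 70), (3, 70), (4, 10), (5, 30)])
first = next_page(conn, None, 2)
second = next_page(conn, first[-1], 2)
jump = page_at_rank(conn, 3, 2)
```

Scrolling only ever needs the position of the last row shown, so each page can be an index seek. A jump to an arbitrary rank with `OFFSET` typically makes the engine step past all the skipped rows, which is why the two requirements can lead to different designs.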

All rankings of one user is an additional requirement, not mentioned in your first post.

I wouldn't necessarily rule out caching, but it isn't yet obvious that it is needed.

Going back and forth on HN is probably not the best way to discuss your requirements and possible solutions. I do consulting, if you'd like to talk further.


Yeah I suppose I haven't done a great job outlining all of the requirements. Is there a way we can get in touch? Not sure we're looking to hire a consultant yet but it would be good to have as an option if we decide to go that route.


Sure, I'm jao at geophile dot com.




