Cool idea. I created a Top Stories Directory two days ago too. It records every number-one story from that point on. It lost track of some stories because of a few bugs, but I have already fixed them all.
I scrape only the front page, once every ~4 hrs (minimum).
Nearlyfreespeech.net doesn't have cron yet, so I'm using onlinecronjobs.com. If you want to scrape it on demand, just go to http://hnweekly.watdahel.com/index/scrape
It's not 100% accurate, but it's close. I built the website so I can catch up on the good stories that were submitted earlier in the week (I usually read on weekends).
Yep, it's sorted by points. Reddit already has this: http://www.reddit.com/r/programming/top/?t=week
Not sure about the other sites; I'm usually just on HN, Reddit, and whatever is in my Google Reader.
ö doesn't need Unicode. It's in ISO Latin-1 (ISO-8859-1); a later revision is ISO-8859-15. Latin-1 is an 8-bit character set that includes ASCII and covers the Western European languages. I don't know PHP, but Latin-1 could be the default there, or a simple character set option.
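To make that concrete, here's a quick check in Python (my choice for illustration, not PHP): ö fits in one byte in both Latin-1 and ISO-8859-15, while UTF-8 needs two, and reading those two bytes as Latin-1 gives the usual garbage.

```python
# ö is U+00F6; in ISO-8859-1 and ISO-8859-15 it is the single byte 0xF6.
ch = "ö"

latin1 = ch.encode("iso-8859-1")
latin9 = ch.encode("iso-8859-15")
utf8 = ch.encode("utf-8")

print(latin1)  # b'\xf6'
print(latin9)  # b'\xf6'
print(utf8)    # b'\xc3\xb6'

# Decoding UTF-8 bytes as Latin-1 is the classic source of mojibake.
mojibake = utf8.decode("iso-8859-1")
print(mojibake)  # Ã¶
```

So if the page is served as Latin-1 but the bytes are UTF-8 (or the other way around), the character breaks even though both encodings can represent it.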
I have a screen-scraping thing in place that targets an ASCII-only environment (a LambdaMOO MUD). Manually replacing Unicode characters like this has been a pain in the butt. Luckily that was all just for fun and didn't need to be perfect.
Is there a good library out there (in any language) that does decent Unicode-to-ASCII substitutions for the major languages?
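Not a full answer to the library question, but here's a minimal sketch in Python using only the standard `unicodedata` module: decompose each character (NFKD), drop the combining marks, and discard whatever still isn't ASCII. The name `to_ascii` is made up for illustration. This only handles accented Latin letters; characters with no decomposition (ß, ø, and so on) are simply dropped, so a real solution would need per-language substitution tables on top.

```python
import unicodedata


def to_ascii(text: str) -> str:
    """Best-effort ASCII approximation (hypothetical helper, not a library API)."""
    # NFKD splits e.g. 'ö' into 'o' followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks, keeping the base letters.
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Anything still outside ASCII had no decomposition; discard it.
    return stripped.encode("ascii", "ignore").decode("ascii")


print(to_ascii("Köln café"))  # Koln cafe
print(to_ascii("straße"))     # strae  (ß is lost, not turned into "ss")
```

The second line shows the limitation: good enough for a just-for-fun scraper, not for anything that has to be right.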
It would be nice to be able to see the top stories over any arbitrary time period as well. An interface with a draggable timeline for specifying the period (something like what Google Finance uses, perhaps) could be even better.
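On the data side, the arbitrary-period view could boil down to a filter plus a sort. A rough sketch in Python, assuming each scraped story is stored with a submission time and a point count; the `Story` type, the `top_stories` function, and the sample data are all hypothetical, not how either site actually works.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Story:
    title: str
    points: int
    submitted: datetime


def top_stories(stories, start, end, limit=10):
    """Return the highest-scoring stories submitted in [start, end)."""
    in_range = [s for s in stories if start <= s.submitted < end]
    return sorted(in_range, key=lambda s: s.points, reverse=True)[:limit]


# Made-up sample data, just to exercise the function.
stories = [
    Story("A", 120, datetime(2009, 3, 2)),
    Story("B", 300, datetime(2009, 3, 12)),
    Story("C", 250, datetime(2009, 3, 5)),
]

week = top_stories(stories, datetime(2009, 3, 1), datetime(2009, 3, 8))
print([s.title for s in week])  # ['C', 'A']
```

A draggable timeline would then only need to supply `start` and `end`.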
http://hn.siong1987.com