
I've done similar.

Bayesian filtering seems to work great on HN headlines. I trained mine on about two years of data scraped from http://www.daemonology.net/hn-daily/ (apologies to whoever runs that), then wrapped it with some code that grabs Hacker News's main page and displays the filtered headlines to me. It nails politics and startup crap with stunning accuracy.

The only problem is that now I find myself using both that system and the website itself.
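
For anyone curious what that kind of filter looks like, here is a minimal sketch of a naive Bayes headline classifier. It is not my actual code: the names `tokenize` and `HeadlineFilter` and the "keep"/"skip" labels are made up for illustration, it uses plain add-one smoothing, and it leaves out the scraping and page-fetching parts entirely.

```python
import math
import re
from collections import Counter

LABELS = ("keep", "skip")  # hypothetical labels: headlines to show vs. hide


def tokenize(headline):
    """Lowercase a headline and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", headline.lower())


class HeadlineFilter:
    """Tiny naive Bayes classifier with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = {label: Counter() for label in LABELS}
        self.doc_counts = Counter()

    def train(self, headline, label):
        """Record one labeled headline."""
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(headline))

    def score(self, headline, label):
        """Return the unnormalized log-probability of label given headline."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Smoothed class prior, so an unseen label never yields log(0).
        logp = math.log((self.doc_counts[label] + 1) / (total_docs + len(LABELS)))
        for word in tokenize(headline):
            count = self.word_counts[label][word]
            logp += math.log((count + 1) / (total_words + len(vocab) + 1))
        return logp

    def keep(self, headline):
        """True if the headline looks more like 'keep' than 'skip'."""
        return self.score(headline, "keep") >= self.score(headline, "skip")
```

Usage is just calling `train(headline, label)` once per labeled headline, then `keep(headline)` on each headline from the front page. With only a handful of training examples the results are noisy; it takes a decent pile of labeled headlines before it gets good.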




Anywhere I can get that training set without DDoS'ing some poor guy's mirror?



