Thanks for checking it out!

All those broken links in your “django signals” query seem to have come from a page full of mangled URLs that got indexed; unfortunately, they’ve pushed the actual results all the way down to page 6! I definitely need to give official documentation a boost.

“golang cobra” returns what appears to be the official repo as the first result, but it’s clearly not getting at what you’re going for here. This is a good example of the kind of challenge a search engine faces: both “go” and “cobra” have multiple meanings, and it needs to understand the context to figure out whether a given link is relevant to this particular search. I think something like vector search would help here, but I haven’t looked into setting it up yet.
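
A minimal sketch of the vector-search idea, assuming queries and documents have already been turned into embedding vectors by some model (not specified here): rank candidates by cosine similarity to the query vector. The four-dimensional vectors and document names below are made-up toy values, not real embeddings, and this isn't how the engine currently works.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs, most similar to the query first."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy, hand-written "embeddings" (hypothetical). Dimensions roughly mean:
# [CLI/software tooling, Go programming language, board games, reptiles]
docs = {
    "cobra CLI library README": [0.9, 0.8, 0.0, 0.0],
    "king cobra wildlife article": [0.0, 0.0, 0.0, 1.0],
    "Go board game rules": [0.0, 0.0, 1.0, 0.0],
}
# "golang cobra": mostly about Go tooling, with a little of the snake sense of "cobra".
query = [0.7, 0.9, 0.0, 0.2]

ranked = rank_by_similarity(query, docs)
for doc_id, score in ranked:
    print(f"{score:.3f}  {doc_id}")
```

In a real system the vectors would come from an embedding model, and ranking would run over an approximate nearest-neighbour index rather than a linear scan. The point is only that “golang” pulls “cobra” toward its software sense, which matching the two keywords independently doesn’t capture.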

GitHub is on my list, but it’s very big and is going to require careful optimization. (Even if I only load top-level READMEs, it’s still a ton of data.)

ReadTheDocs would be great, but they don’t seem to offer any dump/download support, or even a list of all the documentation sites they host, so they’ll have to wait until I have a general web crawler.

I have some heuristics to collapse multiple versions of a page into a single result with a version picker, but they require some adjustments to the rest of my data processing pipeline that I haven’t gotten round to making yet.
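
One possible shape for such a heuristic, sketched under the assumption that the version appears as a path segment (as in `/en/4.2/...` on many docs sites): replace the version segment with a placeholder to get a grouping key, then use the newest numeric version as the default link and list every version for the picker. The regex, function names, and example.org URLs are all hypothetical; this is not the actual heuristic mentioned above.

```python
import re
from collections import defaultdict
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical heuristic: a path segment that looks like a version number
# ("4.2", "v1.10.3") or a common alias ("stable", "latest", "dev").
VERSION_SEGMENT = re.compile(r"^(v?\d+(\.\d+)*|stable|latest|dev)$")


def split_version(url: str) -> tuple[str, Optional[str]]:
    """Return (grouping key with the version segment replaced, version or None)."""
    parts = urlsplit(url)
    segments = parts.path.split("/")
    for i, seg in enumerate(segments):
        if VERSION_SEGMENT.match(seg):
            key_path = "/".join(segments[:i] + ["{version}"] + segments[i + 1:])
            return parts.netloc + key_path, seg
    return parts.netloc + parts.path, None


def version_sort_key(version: Optional[str]) -> tuple[int, ...]:
    """Numeric versions compare numerically; aliases and None get the empty tuple."""
    if version is None:
        return ()
    return tuple(int(n) for n in re.findall(r"\d+", version))


def collapse_versions(urls: list[str]) -> list[dict]:
    """Merge versioned copies of the same page into one result with a version list."""
    groups = defaultdict(list)
    for url in urls:
        key, version = split_version(url)
        groups[key].append((version, url))
    results = []
    for entries in groups.values():
        # Highest numeric version first; aliases/unversioned sort last (empty tuple).
        entries.sort(key=lambda e: version_sort_key(e[0]), reverse=True)
        results.append({
            "url": entries[0][1],  # default link for the collapsed result
            "versions": [v for v, _ in entries if v is not None],  # picker options
        })
    return results


urls = [
    "https://docs.example.org/en/4.2/topics/signals/",
    "https://docs.example.org/en/5.0/topics/signals/",
    "https://docs.example.org/en/stable/topics/signals/",
    "https://docs.example.org/en/5.0/topics/forms/",
    "https://docs.example.org/about/",
]
for result in collapse_versions(urls):
    print(result)
```

Sites that put the version in a subdomain or query string would need a different key function, which is part of why this kind of grouping tends to touch the rest of the pipeline.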
