All those broken links in your “django signals” results seem to have come from a page full of mangled URLs that got indexed; unfortunately they’ve pushed the actual results all the way down to page 6! I definitely need to give official documentation a ranking boost.
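Roughly what I have in mind for that boost is a per-domain multiplier applied to the relevance score. Here’s a minimal sketch; the domain list, field names, and multiplier values are all invented for illustration, not the actual pipeline:

```python
# Hypothetical per-domain boost table; the multipliers are made up.
OFFICIAL_DOCS_DOMAINS = {
    "docs.djangoproject.com": 2.0,
    "docs.python.org": 2.0,
    "pkg.go.dev": 1.5,
}

def boosted_score(result: dict) -> float:
    """Multiply the base relevance score for results from official docs."""
    multiplier = OFFICIAL_DOCS_DOMAINS.get(result["domain"], 1.0)
    return result["score"] * multiplier

results = [
    {"url": "https://docs.djangoproject.com/en/stable/topics/signals/",
     "domain": "docs.djangoproject.com", "score": 0.6},
    {"url": "https://example.com/page-of-mangled-links",
     "domain": "example.com", "score": 0.7},
]
# After boosting, the official docs page outranks the junk page.
results.sort(key=boosted_score, reverse=True)
```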
“golang cobra” gets what appears to be the official repo as the first result, but the rest of the page clearly isn’t what you’re going for here. This is a good example of the sort of challenge a search engine faces: both “go” and “cobra” have multiple meanings, and it needs the surrounding context to figure out whether a given link is relevant to this particular search. I think something like a vector search would be useful here, but I haven’t looked into setting that up yet.
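To make that concrete, here’s a sketch of embedding-based reranking with the sentence-transformers library. The model name is just a common default and the candidate snippets are contrived, but it shows why vectors help with the “go”/“cobra” ambiguity:

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "golang cobra"
candidates = [
    "Cobra is a library for creating powerful modern CLI applications in Go.",
    "Care sheet for keeping king cobras in captivity.",
    "Go (board game) opening strategies for beginners.",
]

query_vec = model.encode(query, convert_to_tensor=True)
doc_vecs = model.encode(candidates, convert_to_tensor=True)
# Cosine similarity between the query and each candidate snippet.
scores = util.cos_sim(query_vec, doc_vecs)[0].tolist()

# The CLI-library snippet should score highest: the embedding puts
# "golang" near the programming senses of "go" and "cobra".
for score, text in sorted(zip(scores, candidates), reverse=True):
    print(f"{score:.3f}  {text}")
```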
GitHub is on my list, but it’s very big and is going to require careful optimization. (Even if I only load top-level READMEs it’s still a ton of data.)
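For a sense of scale, even the cheapest approach means one API call per repository. A sketch of pulling just the top-level README via the GitHub REST API (the endpoint is real; the error handling is simplified for illustration):

```python
import base64
import requests

def fetch_readme(owner: str, repo: str) -> str | None:
    """Fetch a repo's top-level README via the GitHub REST API."""
    url = f"https://api.github.com/repos/{owner}/{repo}/readme"
    # Unauthenticated requests are heavily rate-limited; a real loader
    # would need a token and a lot of patience.
    resp = requests.get(url, headers={"Accept": "application/vnd.github+json"})
    if resp.status_code != 200:
        return None
    # The API returns the file content base64-encoded.
    return base64.b64decode(resp.json()["content"]).decode("utf-8")

readme = fetch_readme("spf13", "cobra")
if readme:
    print(readme[:200])
```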
ReadTheDocs would be great, but they don’t seem to offer any dump/download support, or even a list of all the documentation sites they host, so they’re going to have to wait until I have a general web crawler.
I have some heuristics to collapse multiple versions of a page into a single result with a version picker (roughly along the lines of the sketch below), but they require some adjustments to the rest of my data-processing pipeline which I haven’t gotten round to yet.
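The core of the heuristic is normalizing away the version segment of the URL path and grouping hits by the normalized key. The regex and data shapes here are invented for illustration:

```python
import re
from collections import defaultdict

# Matches a path segment like /4.2/, /v2/, /stable/, or /latest/.
VERSION_SEGMENT = re.compile(r"/(?:v?\d+(?:\.\d+)*|stable|latest)/")

def version_key(url: str) -> str:
    """Collapse e.g. .../en/4.2/topics/signals/ and .../en/5.0/topics/signals/."""
    return VERSION_SEGMENT.sub("/{version}/", url)

hits = [
    "https://docs.djangoproject.com/en/4.2/topics/signals/",
    "https://docs.djangoproject.com/en/5.0/topics/signals/",
    "https://docs.djangoproject.com/en/5.1/topics/signals/",
]

grouped: dict[str, list[str]] = defaultdict(list)
for url in hits:
    grouped[version_key(url)].append(url)

# One result per group; the grouped URLs feed the version picker.
for key, versions in grouped.items():
    print(key, "->", len(versions), "versions")
```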