Hacker News

That's our secret sauce :)

We've built out a decently complex pipeline for this, but a lot of the magic has to do with the specific embedding model we've trained to know what text is relevant to feed in and what text isn't.
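Phind doesn't describe its pipeline beyond the trained embedding model, so the following is only a generic sketch of embedding-based context selection, not their method. Each candidate passage is scored by cosine similarity between its embedding and the query's embedding, and the top `k` above a threshold are kept as context. The vectors are made-up toy values (an assumption); a real system would get them from an embedding model, and `select_context`, `k`, and `min_score` are hypothetical names.

```python
import math
from typing import Sequence

def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_context(query_vec, passages, k=2, min_score=0.0):
    """Return up to k passage texts most similar to the query.

    passages: list of (text, embedding) pairs. The embeddings are assumed to
    come from some embedding model, which is not shown here.
    """
    scored = [(cosine(query_vec, vec), text) for text, vec in passages]
    scored = [(s, t) for s, t in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:k]]

# Toy 3-d "embeddings" standing in for real model output.
query = [1.0, 0.0, 0.0]
passages = [
    ("docs: list.append", [0.9, 0.1, 0.0]),
    ("ad banner", [0.0, 0.0, 1.0]),
    ("tutorial: lists", [0.7, 0.7, 0.0]),
]
print(select_context(query, passages, k=2))
```

The threshold matters as much as `k`: it lets the selector feed in fewer passages when nothing is relevant, rather than padding the prompt with noise.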




This is a really cool tool. Have you considered filtering known blog-spam/low-quality content-mill/SEO'd garbage sites (i.e., GeeksForGeeks, W3Schools, TutorialsPoint)? That would make me definitely jump on this, and even pay for a subscription. I spend way too much time scrolling down Google past all this junk before I hit the official documentation for the module I'm using.


We do some filtering ourselves, but you can specify your own custom filters at https://phind.com/filters
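How phind.com/filters works internally isn't described here; as a rough illustration of the kind of ___domain blocking discussed in this thread, here is a minimal client-side sketch. A result is dropped if its host is a blocked ___domain or a subdomain of one. The `BLOCKED` set and the function names are hypothetical, using the sites named in the question.

```python
from urllib.parse import urlparse

# Hypothetical blocklist built from the sites named in the question.
BLOCKED = {"geeksforgeeks.org", "w3schools.com", "tutorialspoint.com"}

def is_blocked(url: str, blocked: set[str] = BLOCKED) -> bool:
    """True if the URL's host is a blocked ___domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in blocked)

def filter_results(urls: list[str], blocked: set[str] = BLOCKED) -> list[str]:
    """Drop results whose host matches the blocklist, preserving order."""
    return [u for u in urls if not is_blocked(u, blocked)]

results = [
    "https://www.w3schools.com/python/python_lists.asp",
    "https://docs.python.org/3/tutorial/datastructures.html",
    "https://www.geeksforgeeks.org/python-lists/",
]
print(filter_results(results))
```

Matching on the parsed hostname with a leading-dot suffix check avoids false positives such as `notw3schools.com`, which a plain substring test would block.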


This is great, going to see how this fares tomorrow as a replacement for Google.


If you use DuckDuckGo, there's the ddg-filter Firefox plugin that lets you block domains. I use it to block exactly the low-quality domains you mention.

Maybe there are similar plugins for other search engines as well...



I don't think they really need to... maybe for citations, but for training, if the content is the same on site A and site B, it doesn't matter which one it was pulled from.

That said, if the content itself is bad, that'd be a problem. We'll probably start seeing that: sites designed to poison LLMs.



Is this website satire or an honest/evil attempt to poison the well?

Oh... I see, at the bottom it says satire specifically. Or rather "sAItire". Cute.

Didn't waste any time putting that up.


You can always remove your hated sites from Google Search results as well. For example:

Python list -w3schools

It will exclude results that contain the term "w3schools".
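Typing the exclusions each time gets old, so a small helper can append them automatically. This is a minimal sketch that only automates the manual "-term" approach above; the function names and the excluded terms are hypothetical, and it relies only on Google's `q` query parameter.

```python
from urllib.parse import urlencode

def build_query(terms: str, excluded: list[str]) -> str:
    """Append a '-word' exclusion for each unwanted term."""
    return " ".join([terms] + [f"-{word}" for word in excluded])

def search_url(query: str) -> str:
    """Percent-encode the query into a Google search URL."""
    return "https://www.google.com/search?" + urlencode({"q": query})

query = build_query("Python list", ["w3schools", "geeksforgeeks"])
print(query)
print(search_url(query))
```

`urlencode` turns spaces into `+` and leaves `-` untouched, so the exclusions survive encoding intact.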


I know, it's just irritating to have to do that, or have an extension do it. I would be happy to support a search engine that lets me filter out unwanted crud.


Any pointers on how to build a custom embedding? I'm working in a specialized ___domain where words may mean different things than in the rest of the world. I want to create my own embeddings, which I suspect would help.
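One simple, well-known starting point (not what Phind uses, which isn't described here) is count-based embeddings. You count word co-occurrences within a window in your own ___domain corpus, convert the counts to positive pointwise mutual information, PPMI(w, c) = max(0, log(P(w, c) / (P(w) P(c)))), and compress the matrix with SVD. Because every statistic comes from your corpus, words pick up your ___domain's meanings. The toy corpus and all function names below are hypothetical placeholders; with more data, training word2vec/fastText or fine-tuning a sentence-embedding model are the usual next steps.

```python
import numpy as np

def cooccurrence(sentences, window=2):
    """Symmetric word-word co-occurrence counts within +/- window tokens."""
    vocab = sorted({w for s in sentences for w in s})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[index[w], index[s[j]]] += 1
    return vocab, counts

def ppmi(counts):
    """Positive PMI: log(c_wc * N / (c_w * c_c)), negatives and undefined set to 0."""
    total = counts.sum()
    row = counts.sum(axis=1, keepdims=True)
    col = counts.sum(axis=0, keepdims=True)
    with np.errstate(divide="ignore", invalid="ignore"):
        pmi = np.log(counts * total / (row * col))
    pmi[~np.isfinite(pmi)] = 0.0
    return np.maximum(pmi, 0.0)

def embed(sentences, dim=2, window=2):
    """Map each word to a dim-sized vector from the truncated SVD of its PPMI row."""
    vocab, counts = cooccurrence(sentences, window)
    u, s, _ = np.linalg.svd(ppmi(counts))
    vectors = u[:, :dim] * s[:dim]
    return {w: vectors[i] for i, w in enumerate(vocab)}

# Hypothetical tokenized ___domain corpus; replace with your own text.
corpus = [
    "the cell drives the net".split(),
    "the buffer drives the net".split(),
    "the net has high fanout".split(),
    "the cell has high fanout".split(),
]
vecs = embed(corpus, dim=2)
print({w: v.round(3).tolist() for w, v in vecs.items()})
```

Compare words with cosine similarity on these vectors; with a realistic corpus you would use a much larger `dim` (often a few hundred).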


Doesn't ChatGPT offer that through plug-ins? Bing Chat does too.



