We've built out a decently complex pipeline for this, but a lot of the magic has to do with the specific embedding model we've trained to know what text is relevant to feed in and what text isn't.
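The comment above describes the general shape of such a pipeline: score candidate text against a query with an embedding and keep only what clears a relevance threshold. The actual model and pipeline aren't public, so here is a minimal sketch of the idea where a toy bag-of-words vector stands in for the trained embedding model (the `embed`, `cosine`, and `filter_relevant` names are all hypothetical):

```python
import math
from collections import Counter

def embed(text):
    """Toy 'embedding': a bag-of-words count vector.
    Stand-in for a trained embedding model."""
    return Counter(text.lower().split())

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    nu = math.sqrt(sum(c * c for c in u.values()))
    nv = math.sqrt(sum(c * c for c in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def filter_relevant(chunks, query, threshold=0.2):
    """Keep only chunks whose similarity to the query clears the threshold."""
    q = embed(query)
    return [c for c in chunks if cosine(embed(c), q) >= threshold]
```

In a real pipeline the bag-of-words stand-in would be replaced by the trained model, but the filtering step itself stays this simple: embed, score, threshold.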
This is a really cool tool. Have you considered filtering out known blog-spam / content-mill / SEO'd garbage sites (e.g. GeeksForGeeks, W3Schools, TutorialsPoint)? That would definitely make me jump on this, and even pay for a subscription. I spend way too much time scrolling down Google past all this junk before I hit the official documentation for the module I'm using.
If you use DuckDuckGo, there's the ddg-filter Firefox plugin that lets you block domains. I use it to block exactly the low-quality domains you mention.
Maybe there are similar plugins for other search engines as well...
I don't think they really need to. Maybe for citations, but for training, if the content is the same on site A and site B, it doesn't matter which one it pulled from.
That said, if the content itself is bad, that would be a problem. We'll probably start seeing that: sites designed to poison LLMs.
I know, it's just irritating to have to do that, or have an extension do it. I would be happy to support a search engine that lets me filter out unwanted crud.
Any pointers on how to build custom embeddings? I'm working on a specialized ___domain where words may mean different things than they do in the rest of the world, and I suspect that training my own embeddings would help.
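One starting point, before reaching for word2vec/fastText or fine-tuning a transformer on your ___domain corpus: classical count-based embeddings built from co-occurrence statistics of your own text. The sketch below builds PPMI (positive pointwise mutual information) word vectors from a toy corpus; all function names are illustrative, and a real ___domain corpus would be far larger:

```python
import math
from collections import Counter

def ppmi_vectors(sentences, window=2):
    """Build PPMI word vectors from raw sentences.
    Each word's vector has one dimension per vocabulary word (its contexts)."""
    word_counts = Counter()
    pair_counts = Counter()
    for sent in sentences:
        toks = sent.lower().split()
        word_counts.update(toks)
        # Count (word, context) pairs within the sliding window.
        for i, w in enumerate(toks):
            for j in range(max(0, i - window), min(len(toks), i + window + 1)):
                if i != j:
                    pair_counts[(w, toks[j])] += 1
    total_pairs = sum(pair_counts.values())
    total_words = sum(word_counts.values())
    vocab = sorted(word_counts)
    index = {w: k for k, w in enumerate(vocab)}
    vectors = {w: [0.0] * len(vocab) for w in vocab}
    for (w, c), n in pair_counts.items():
        p_wc = n / total_pairs
        p_w = word_counts[w] / total_words
        p_c = word_counts[c] / total_words
        # PPMI: clamp negative PMI to zero.
        vectors[w][index[c]] = max(0.0, math.log(p_wc / (p_w * p_c)))
    return vectors

def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0
```

Because the vectors are built only from your corpus, a word gets the meaning your ___domain gives it, which is exactly the effect you're after; once this baseline works, swapping in word2vec or a fine-tuned transformer trained on the same corpus is the usual next step.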