Will, Jeff, I am a BIG Exa fan. Congrats on finally doing your HN Launch.
I think NewsCatcher (my YC startup) and Exa aren't direct competitors, but we definitely share the same insight: SERP is not the right way to let LLMs interact with the web, because it's literally optimized for humans, who can open 10 pages at most.
What we found is that LLMs can sift through 10k+ web pages if you pre-extract all the signals from them.
But we took a bit of a different angle. Even though we have over 1.5 billion news stories in our index alone, we don't have a way to sift through them the way your Websets do (saw your impressive GPU cluster :))
So instead, we build bespoke pipelines for our customers (who are mostly large enterprises/F1000): we fine-tune LLMs for specific information-extraction tasks with very high accuracy.
Our insight: for many enterprises, the solution has to be either a perfect fit or nothing. That's where they're willing to pay 10-100x for the last-mile effort.
P.S. Will, loved your comment on a podcast where you said Exa can be used to find a dating partner.
Thanks Artem! It makes sense to specialize for the biggest customers. Yes, a lot of problems in the world would be improved by better search, including dating.
It just breaks my head. We've built LLMs that can process millions of pages at a time, but what we give them is a search engine optimized for humans.
It's like giving a humanoid robot a keyboard and mouse to chat with another humanoid robot.
Disclaimer: I might be biased, since we're kind of building a fact search engine for LLMs.
I'm a YC founder who went from $0 to $2M ARR through founder-led sales with absolutely zero sales background. I'm basically a self-taught coder who had to take on the CEO role, and therefore sales.
I find this video about enterprise sales from Pete Koomen (YC Partner) to be the best summary:
I open-sourced pyGoogleNews and wrote a quick blog post about how you can reverse-engineer Google News RSS to turn it into an RSS feed of any website supported by Google News.
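A minimal sketch of the idea, assuming the publicly visible Google News RSS search URL format (`/rss/search` with `q`, `hl`, `gl`, `ceid` parameters) and a `site:` query to restrict results to one domain. This is not pyGoogleNews's own code. The helper names `google_news_site_feed`, `parse_feed_items` and `fetch` are hypothetical, and Google may change or rate-limit this endpoint.

```python
import urllib.request
import xml.etree.ElementTree as ET
from urllib.parse import urlencode


def google_news_site_feed(domain: str, hl: str = "en-US", gl: str = "US") -> str:
    """Build a Google News RSS search URL limited to one site (assumed URL format)."""
    lang = hl.split("-")[0]
    params = {
        "q": f"site:{domain}",  # restrict results to a single publisher
        "hl": hl,               # interface language
        "gl": gl,               # country edition
        "ceid": f"{gl}:{lang}", # edition id, e.g. "US:en"
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


def parse_feed_items(rss_xml: str) -> list[tuple[str, str]]:
    """Return (title, link) pairs for every <item> in an RSS document."""
    root = ET.fromstring(rss_xml)
    return [
        (item.findtext("title", default=""), item.findtext("link", default=""))
        for item in root.iter("item")
    ]


def fetch(url: str) -> str:
    """Download the feed (network access required)."""
    with urllib.request.urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8")
```

Usage would be something like `parse_feed_items(fetch(google_news_site_feed("example.com")))`. That call returns whatever items Google News currently indexes for that domain.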
Wow, that's one of the most orange tag-rich posts I've ever seen.
We're running a lot of tests with GPT-4o at NewsCatcher. We have to crawl 100k+ news websites and then parse the news content. Our rule-based model for extracting data from any article works pretty well, and we could never find a way to improve it with GPT.
"Crawling" is much more interesting. We need to know all the places on a site where news articles can be published: sometimes 50+ sub-sections.
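To make the section-discovery problem concrete, here is a hypothetical heuristic. It is not NewsCatcher's method. It collects same-host links from a homepage and keeps short paths (at most `max_depth` segments) whose last segment doesn't look like an article slug. The thresholds are illustrative assumptions.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse


class LinkCollector(HTMLParser):
    """Collect absolute href values from all <a> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.base_url, href))


def candidate_sections(base_url: str, html: str, max_depth: int = 2) -> list[str]:
    """Guess section URLs: same host, shallow path, last segment not slug-like (assumed heuristic)."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    host = urlparse(base_url).netloc
    sections = set()
    for link in collector.links:
        parsed = urlparse(link)
        if parsed.netloc != host:
            continue  # external link
        segments = [s for s in parsed.path.split("/") if s]
        if not 1 <= len(segments) <= max_depth:
            continue  # homepage itself, or too deep to be a section
        last = segments[-1]
        if any(ch.isdigit() for ch in last) or last.count("-") >= 3:
            continue  # looks like a dated or slugged article URL
        sections.add(f"{parsed.scheme}://{host}/" + "/".join(segments) + "/")
    return sorted(sections)
```

In practice a list like this would only seed a crawl. Finding all 50+ sub-sections would likely require recursing into each candidate and checking which pages actually link to articles.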
Interesting hack: I think many projects (including ours) can get away with generating the extraction code, since each website's structure rarely changes.
So we're looking at having an LLM generate the code to parse HTML.
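The "generate once per site, reuse many times" idea could look roughly like the sketch below. The cache pays the generation cost once per domain and then reuses the same extractor for every page from that domain. `generate_extractor_stub` is a hypothetical placeholder for the LLM call. Here it just returns a fixed regex extractor for `<h1>` and `<p>` instead of real generated code.

```python
import re
from typing import Callable, Dict
from urllib.parse import urlparse

Extractor = Callable[[str], Dict[str, str]]


def generate_extractor_stub(sample_html: str) -> Extractor:
    """Hypothetical stand-in for an LLM that writes a site-specific parser."""
    def extract(html: str) -> Dict[str, str]:
        title = re.search(r"<h1(?:\s[^>]*)?>(.*?)</h1>", html, re.S)
        paragraphs = re.findall(r"<p(?:\s[^>]*)?>(.*?)</p>", html, re.S)
        return {
            "title": title.group(1).strip() if title else "",
            "body": "\n".join(p.strip() for p in paragraphs),
        }
    return extract


class ExtractorCache:
    """Generate one extractor per domain and reuse it for every later page."""

    def __init__(self, generator: Callable[[str], Extractor]):
        self.generator = generator
        self.by_domain: Dict[str, Extractor] = {}
        self.generations = 0  # how many (expensive) generator calls were made

    def extract(self, url: str, html: str) -> Dict[str, str]:
        domain = urlparse(url).netloc
        if domain not in self.by_domain:
            self.by_domain[domain] = self.generator(html)
            self.generations += 1
        return self.by_domain[domain](html)
```

Because per-site structure rarely changes, the per-domain key keeps LLM calls proportional to the number of sites rather than the number of articles. A production version would presumably also regenerate an extractor when its output starts coming back empty (an assumption, not something described above).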
Happy to chat/share our findings if anyone is interested: artem [at] newscatcherapi.com