artembugara's comments | Hacker News

Will, Jeff, I am a BIG Exa fan. Congrats on finally doing your HN Launch.

I think NewsCatcher (my YC startup) and Exa aren't direct competitors, but we definitely share the same insight: SERP is not the right way to let LLMs interact with the web, because it's literally optimized for humans, who can open 10 pages at most.

What we found is that LLMs can sift through 10k+ web pages if you pre-extract all the signals out of them.
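To make that concrete, the "signals" are just a compact record per page, something like the minimal sketch below (the field names are hypothetical, not our actual schema):

    # Illustrative only: a compact, pre-extracted record an LLM can scan by the
    # thousands, instead of raw HTML. Field names are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class PageSignals:
        url: str
        title: str
        published_at: str      # ISO 8601 date
        entities: list[str]    # companies, people, places mentioned
        summary: str           # one or two sentences

    def to_prompt_line(p: PageSignals) -> str:
        # One short line per page instead of full HTML, so thousands of pages
        # fit into a handful of LLM calls.
        return f"{p.published_at} | {p.title} | {', '.join(p.entities)} | {p.summary}"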

But we took a bit of a different angle. Even though we have over 1.5 billion news stories in our index, we don't have a solution to sift through them the way your Websets do (saw your impressive GPU cluster :))

So instead we build bespoke pipelines for our customers (mostly large enterprise/F1000), fine-tuning LLMs on specific information-extraction tasks with very high accuracy.

Our insight: for many enterprises the solution has to be either a perfect fit or nothing, and that's where they're willing to pay 10-100x for the last-mile effort.

P.S. Will, loved your comment on a podcast where you said Exa can be used to find a dating partner.


Thanks Artem! That makes sense to specialize for the biggest customers. Yes, a lot of problems in the world would be improved by better search, including dating.

"Search the web" is apparently using SERP.

It just breaks my brain. We've built LLMs that can process millions of pages at a time, but what we give them is a search engine optimized for humans.

It's like giving a humanoid robot a keyboard and a mouse to chat with another humanoid robot.

Disclaimer: I might be biased, as we're kind of building the fact search engine for LLMs.


No LLM can process millions of web pages. Maybe you're thinking of something else?


This is a problem I think about often. I’d be curious to know what kind of things you’ve learned / accomplished in that problem space so far.


What makes you think Claude is using a search engine optimized for humans?


Artem here, co-founder of NewsCatcher (YC S22). Our data has been used for this research.

Danny and the team are old friends who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email [email protected]

https://www.newscatcherapi.com/free-news-api


Hey Artem, NewsCatcher has been a great resource in our news pipelines!


Co-founder of NewsCatcher (YC S22) here. There are a few reasons the dataset isn't fully open-sourced.

But we have free/very low-cost tiers for academia.

So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api

Or feel free to email me directly at [email protected]


Yeah, the link you provided is for when it's already official. I'm more curious about cases where the president just says "I'm gonna tax 'em."


I'm a YC founder who went from 0 to $2M ARR through founder-led sales with absolutely zero sales background. I'm basically a self-taught coder who had to take on the CEO role, and therefore do sales.

I find this video about enterprise sales from Pete Koomen (YC Partner) to be the best summary:

https://youtu.be/0fKYVl12VTA?si=mkP3SIWHiv2Ha3iT


If you don't mind me asking, over what timeframe?


About 18 months after a very modest seed round. We were bootstrapped at first, and it took a year to get to $300k ARR after we started working full time.


~6.5x in ~18 months puts you pretty close to hyper-growth in my book (~350ish%+) - very good work. Keep it up, and good luck!


How did you reach out to your initial customers? Did you use any advertisement platform? Cold email? LinkedIn?


I open-sourced pyGoogleNews and wrote a quick blog post about how you can reverse-engineer Google News RSS to turn it into an RSS feed for any website that Google News supports:

https://news.ycombinator.com/item?id=42343182

https://github.com/kotartemiy/pygooglenews
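Roughly, the trick is that Google News exposes an RSS search endpoint, so a site: query gives you a feed for any site it indexes. A minimal sketch using the feedparser package (the feed_for_site helper and exact parameters are illustrative, not pyGoogleNews' API):

    # Sketch: build an RSS feed for any site Google News indexes by querying
    # its RSS search endpoint with a site: filter, then parsing the result.
    import urllib.parse
    import feedparser

    def feed_for_site(domain: str, lang: str = "en", country: str = "US"):
        query = urllib.parse.quote(f"site:{domain}")
        url = (
            f"https://news.google.com/rss/search?q={query}"
            f"&hl={lang}&gl={country}&ceid={country}:{lang}"
        )
        return feedparser.parse(url)

    for entry in feed_for_site("nytimes.com").entries[:5]:
        print(entry.title, entry.link)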


Wow, that's one of the most orange tag-rich posts I've ever seen.

We're doing a lot of tests with GPT-4o at NewsCatcher. We have to crawl 100k+ news websites and then parse the news content. Our rule-based model for extracting data from any article works pretty well, and we could never find a way to improve on it with GPT.

"Crawling" is much more interesting. We need to know all the places where news articles can be published: sometimes 50+ sub-sections.

Interesting hack: I think many projects (including ours) can get away with generating the extraction code, since the per-website structure rarely changes.

So we're looking at having an LLM generate the code to parse the HTML.
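A rough sketch of what that could look like, assuming the OpenAI Python client (the prompt, model choice, and helper names are illustrative, and the generated code would of course need review/sandboxing before it runs in production):

    # Sketch: ask an LLM once per domain to write an extraction function, then
    # reuse the generated code for every article from that site, since the
    # per-website structure rarely changes.
    from openai import OpenAI

    client = OpenAI()
    _extractors: dict[str, str] = {}  # domain -> generated Python source

    def get_extractor_code(domain: str, sample_html: str) -> str:
        if domain not in _extractors:
            prompt = (
                "Write a Python function extract(html) that uses BeautifulSoup "
                "and returns a dict with keys title, published_at, author, body "
                f"for articles from {domain}. Sample article HTML:\n\n"
                + sample_html[:20000]
            )
            resp = client.chat.completions.create(
                model="gpt-4o",
                messages=[{"role": "user", "content": prompt}],
            )
            _extractors[domain] = resp.choices[0].message.content
        return _extractors[domain]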

Happy to chat/share our findings if anyone is interested: artem [at] newscatcherapi.com


I’d love to look into this for a hobbyist project I’m working on. Wish you had self signup!


this! I've been following Kadoa since its very first days. Great team.


Wow, Kyle, you should have mentioned it earlier!

We've been working on this for quite a while. I'll contact you to show how far we've gotten.

