artembugara · 2025-05-06T16:53:34 1746550414

Will, Jeff, I am a BIG Exa fan. Congrats on finally doing your HN Launch.

I think NewsCatcher (my YC startup) and Exa aren’t direct competitors, but we definitely share the same insight: SERP is not the right way to let LLMs interact with the web, because it’s literally optimized for humans, who can open 10 pages at most.

What we found is that LLMs can sift through 10k+ web pages if you pre-extract all the signals out of them.

But we took a bit of a different angle. Even though we have over 1.5 billion news stories in our index alone, we don’t have a solution for sifting through them the way your Websets do (saw your impressive GPU cluster :))

So what we do instead is build bespoke pipelines for our customers (who are mostly large enterprise/F1000). We fine-tune LLMs on specific information extraction tasks with very high accuracy.

Our insight: for many enterprises the solution should be either a perfect fit or nothing. And that’s where they’re ok to pay 10-100x for the last mile effort.

P.S. Will, loved your comment on a podcast where you said Exa can be used to find a dating partner.

willbryk · 2025-05-06T21:20:58 1746566458

Thanks Artem! That makes sense to specialize for the biggest customers. Yes, a lot of problems in the world would be improved by better search, including dating.

artembugara · 2025-03-20T19:06:34 1742497594

Search the web is apparently using SERP.

It just breaks my head. We’ve built LLMs that can process millions of pages at a time. But what we give them is a search engine that is optimized for humans.

It’s like giving a humanoid robot access to a keyboard with a mouse to chat with another humanoid robot.

Disclaimer: I might be biased as we’re kind of building the fact search engine for LLMs.

sadeshmukh · 2025-03-20T19:26:22 1742498782

No LLM can process millions of web pages. Maybe you're thinking of something else?

braebo · 2025-03-20T19:27:47 1742498867

This is a problem I think about often. I’d be curious to know what kind of things you’ve learned / accomplished in that problem space so far.

ordersofmag · 2025-03-21T00:16:55 1742516215

What makes you think Claude is using a search engine optimized for humans?

artembugara · 2025-02-11T20:10:34 1739304634

Artem here, co-founder of NewsCatcher (YC S22). Our data has been used for research.

Danny and team are old friends of ours who are using our free/super-low pricing for academia and researchers.

AMA, or feel free to email [email protected]

https://www.newscatcherapi.com/free-news-api

dantheman252 · 2025-02-11T20:36:14 1739306174

Hey Artem, NewsCatcher has been a great resource in our news pipelines!

artembugara · 2025-02-11T20:07:30 1739304450

Co-founder of NewsCatcher (YC S22). There are some reasons for not having a dataset fully open sourced.

But we have free/very very low tiers for academia.

So in case you need access for your research, go to https://www.newscatcherapi.com/free-news-api

Or feel free to email me directly at [email protected]

artembugara · 2025-01-29T17:33:32 1738172012

Yeah, the link you provided covers cases where it’s already official. I’m more curious about cases when the president just says “I’m gonna tax ‘em”

artembugara · 2025-01-03T14:55:44 1735916144

I’m a YC founder who went from 0 to 2M ARR through founder-led sales with absolutely 0 sales background. I’m basically a self-taught coder who had to take the CEO role and therefore do sales.

I find this video about enterprise sales from Pete Koomen (YC Partner) to be the best summary:

https://youtu.be/0fKYVl12VTA?si=mkP3SIWHiv2Ha3iT

neom · 2025-01-03T15:49:09 1735919349

If you don't mind me asking, over what timeframe?

artembugara · 2025-01-03T17:50:30 1735926630

About 18 months after a very modest seed round. We were bootstrapped at first, and it took a year to get to 300k ARR after we started working full time.

neom · 2025-01-03T17:58:34 1735927114

~6.5X in ~18 months puts you pretty near, getting into hyper growth, in my book (~350ish%>) - very good work. Keep it up, and good luck!

cmdtab · 2025-01-03T19:36:22 1735932982

How did you reach out to your initial customers? Did you use any advertisement platform? Cold email? LinkedIn?

artembugara · 2024-12-06T23:33:28 1733528008

I open-sourced pyGoogleNews and wrote a quick blog post about how you can reverse engineer Google News RSS to turn it into an RSS feed of any website that is supported by Google News.

https://news.ycombinator.com/item?id=42343182

https://github.com/kotartemiy/pygooglenews
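
For anyone curious what the per-website feed trick can look like, here is a minimal sketch in Python. It only builds the feed URL with the standard library and does not call pyGoogleNews itself. The function name `google_news_site_feed` is made up for illustration, and the URL pattern (`/rss/search` with `q`, `hl`, `gl`, `ceid`) is an assumption based on the publicly observable Google News RSS format, which is undocumented and may change.

```python
from urllib.parse import urlencode


def google_news_site_feed(domain: str, lang: str = "en", country: str = "US") -> str:
    """Build a Google News RSS search URL restricted to one website.

    Hypothetical helper: the endpoint and parameter names are assumed
    from the observed feed format, not from an official API reference.
    """
    params = {
        "q": f"site:{domain}",        # restrict results to a single site
        "hl": f"{lang}-{country}",    # interface language
        "gl": country,                # country edition
        "ceid": f"{country}:{lang}",  # country:language edition id
    }
    return "https://news.google.com/rss/search?" + urlencode(params)


feed_url = google_news_site_feed("example.com")
print(feed_url)
```

The resulting URL can be handed to any RSS reader or feed parser; whether a given site returns results depends on it being indexed by Google News.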

artembugara · 2024-09-03T09:05:05 1725354305

Wow, that's one of the most orange tag-rich posts I've ever seen.

We're doing a lot of tests with GPT-4o at NewsCatcher. We have to crawl 100k+ news websites and then parse news content. Our rule-based model for extracting data from any article works pretty well, and we never could find a way to improve it with GPT.

"Crawling" is much more interesting. We need to know all the places where news articles can be published: sometimes 50+ sub-sections.

Interesting hack: I think many projects (including us) can get away with generating the code for extraction since the per-website structure rarely changes.

So, we're looking for an LLM to generate code to parse HTML.

Happy to chat/share our findings if anyone is interested: artem [at] newscatcherapi.com

AbstractH24 · 2024-09-03T11:04:00 1725361440

I’d love to look into this for a hobbyist project I’m working on. Wish you had self signup!

artembugara · 2024-09-03T08:49:48 1725353388

this! I've been following Kadoa since its very first days. Great team.

artembugara · 2024-09-03T08:29:21 1725352161

Wow, Kyle, you should have mentioned it earlier!

We've been working on this for quite a while. I'll contact you to show how far we've gotten