Hacker News new | past | comments | ask | show | jobs | submit login

I figured it out mostly from first principles. It's such a niche crawling method that was perfectly limited to my use-case, and there's alot to say. But the main idea is that you can inject a crawling script in the html of the site via a proxy you control. E.g proxy.yoursite.com?url=<SITE_YOU_WANT_TO_CRAWL>. Then once you've got the data you can call window.postMessage(data) to communicate with the main window.

It's somewhat similar to how browser proxies like: https://proxyium.com/ and https://www.proxysite.com/ fetch the html on your behalf.




Join us for AI Startup School this June 16-17 in San Francisco!

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: